danburzo / percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.
https://danburzo.ro/projects/percollate/
MIT License
4.32k stars 166 forks

Add option for a delay between requests #133

Closed XiangRongLin closed 2 years ago

XiangRongLin commented 2 years ago

Feature description

When multiple URLs are passed in, I want to be able to specify a delay between the requests to the website.

My use case is downloading all chapters from a table of contents, where I imagine I would quickly get blocked if hundreds of requests are sent as fast as possible.

Existing workarounds

Is there any way to obtain the desired effect with the current functionality?

Not that I know of, because I want the output to be combined into a single epub file.

danburzo commented 2 years ago

Hi @XiangRongLin, thank you for the report. In general, I've avoided implementing options for fetching pages, since that opens up a whole new dimension of configuration (do we support delays / parallelism? proxies? authentication headers? etc.). Instead, you can use a combination of the - operand and the --url option to offload the fetching to a separate program (e.g. curl), as below:

curl https://example.com | percollate pdf - --url=https://example.com

For bundling multiple pages into a single EPUB, the workaround is admittedly a bit convoluted:

  1. Fetch each page with curl and feed it to percollate html with the - operand and the --url option, using your desired parallelism and delay between requests.
  2. Feed all the resulting local HTML pages to percollate epub.
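
The two steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name bundle_with_delay, its argument order, and the temporary file names are made up for illustration, and it assumes the -o output option behaves as described in percollate --help for your installed version. Pages are fetched one at a time rather than in parallel.

```shell
# Hypothetical helper: bundle_with_delay DELAY_SECONDS OUTPUT.epub URL...
# The body runs in a subshell so that `set -eu` and the variables stay local.
bundle_with_delay() (
  set -eu
  delay="$1"
  output="$2"
  shift 2
  [ "$#" -gt 0 ] || { echo "no URLs given" >&2; exit 1; }

  workdir="$(mktemp -d)"
  i=0
  for url in "$@"; do
    # Pause before every request except the first.
    if [ "$i" -gt 0 ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
    n="$(printf '%04d' "$i")"

    # Step 1: curl does the fetching; percollate reads the markup from
    # stdin (-) and is told the page's original address via --url.
    curl -fsSL "$url" > "$workdir/raw-$n.html"
    percollate html - --url="$url" -o "$workdir/page-$n.html" < "$workdir/raw-$n.html"
  done

  # Step 2: bundle the local pages; zero-padded names keep the glob in order.
  percollate epub -o "$output" "$workdir"/page-*.html
)
```

For example, bundle_with_delay 5 book.epub url1 url2 would wait five seconds between the two fetches before building book.epub. Because of set -e, a failed download stops the run instead of producing an incomplete book.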

It might make sense to introduce an option to control parallelism and delay, such as:

percollate epub --wait=N url1 url2 ...

When --wait is supplied, percollate could switch from fetching in parallel to fetching sequentially, with a delay of N seconds between requests.

danburzo commented 2 years ago

The --wait option has been published in percollate@2.2.0.