Open gjaekel opened 2 years ago
By temporarily inserting a continue 2 before L120 ...
https://github.com/AlexanderMelde/dl_for_heise/blob/2e592419e5c0c9635c7e554428670f914559c92e/download.sh#L117-L121
... I made a POC version that only triggers the backend to prepare the documents.
I ran this to prepare five issues. I started the unmodified version immediately afterwards, but this was too fast: it seems I triggered a DoS protection by sending too many requests at a time, because I got an HTTP RC 500. After about half an hour I tried again, and this run downloaded the five issues one by one without any delay.
Another approach: I just set max_tries_per_download=1 (L.9) and ran the script multiple times.
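Maybe a good concept would be to "pull out" the download retries from the innermost loop into an outermost loop, so that each pass makes one attempt per issue and only the failed issues are retried in the next pass.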
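A minimal sketch of that idea, under stated assumptions: try_download is a hypothetical stand-in for the real request in download.sh (here it simply fails on the first request per issue and succeeds afterwards), and download_in_passes, max_passes and wait_between_passes are names invented for this illustration, not part of the script.

```shell
#!/bin/sh
# Sketch only: retries moved from the innermost loop to an outer "pass" loop.

# Hypothetical stand-in for the real request in download.sh: it pretends the
# server needs one earlier request before the PDF of an issue is ready.
seen=""
try_download() {
  case " $seen " in
    *" $1 "*) return 0 ;;   # requested before: PDF is ready now
  esac
  seen="$seen $1"           # first request only triggers the generation
  return 1
}

# One attempt per issue and pass; only failed issues enter the next pass.
# Leaves the issues that never succeeded in the global variable "remaining".
download_in_passes() {
  max_passes=$1
  wait_between_passes=$2
  shift 2
  pending="$*"
  pass=1
  while [ "$pass" -le "$max_passes" ] && [ -n "$pending" ]; do
    still_pending=""
    for issue in $pending; do
      if try_download "$issue"; then
        echo "pass $pass: downloaded $issue"
      else
        still_pending="$still_pending $issue"
      fi
    done
    pending=${still_pending# }
    # Wait only if something is left and another pass will follow.
    if [ -n "$pending" ] && [ "$pass" -lt "$max_passes" ]; then
      sleep "$wait_between_passes"
    fi
    pass=$((pass + 1))
  done
  remaining=$pending
}

# Demo: at most three passes, no waiting, three made-up issue names.
download_in_passes 3 0 2023-01 2023-02 2023-03
```

With the stand-in above, the first pass only triggers the generation and the second pass downloads all three issues. In a real run the wait between passes would have to be long enough to stay clear of the request limit described above.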
Hi Guido, you're welcome! Thank you for documenting your experiments. Indeed, I followed a very similar approach and just ran the script twice: the first run with no repetitions or wait times to just "trigger" the server-side PDF generation, and a second run to finally download them. That worked well and was especially fast for most PDFs, but it wasn't really reliable (e.g. due to the DDoS protection). For this script, I decided to keep it "safe, but slow", with a high number of repetitions and long wait times, to ensure everything is downloaded, e.g. overnight. If, however, you want a quicker run, with the downside of manually monitoring the progress, feel free to adapt the parameters (as the two of us did :) ). Maybe we could introduce some kind of config file, or example sets of parameters to include in the README 😄
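One way such example parameter sets might look, as a sketch only: max_tries_per_download is the variable mentioned in this thread, while load_preset, wait_between_tries, the PRESET variable and all numbers are hypothetical placeholders, not the script's actual names or defaults.

```shell
#!/bin/sh
# Sketch only: named parameter presets for the download script.
# Only max_tries_per_download is taken from the thread; everything
# else (names and values) is a hypothetical placeholder.
load_preset() {
  case "$1" in
    safe)     # "safe, but slow": many retries, long waits, unattended
      max_tries_per_download=20
      wait_between_tries=60
      ;;
    trigger)  # one attempt per issue, meant to be run several times
      max_tries_per_download=1
      wait_between_tries=0
      ;;
    *)
      echo "unknown preset: $1" >&2
      return 1
      ;;
  esac
}

# Pick the preset from the environment, defaulting to the safe one.
load_preset "${PRESET:-safe}"
```

Under these assumptions, something like `PRESET=trigger ./download.sh` would select the quick variant, provided the script called load_preset before its download loop.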
Dear Alexander, first, a big thank you to you and the other contributors mentioned in the README.
To speed up archiving a whole year, I wonder whether it would be possible to trigger the generation of all (or some number of) the documents in a first step, and to start downloading them all in a second step.
This might avoid busy-waiting for the generation time of every document.
I'll start a POC for this now.