[BUG] JSONDecodeError

Thanks for your bug report!

After running several CDX queries, I ran into the following two scenarios:

1. I ended up with about 7,000,000 snapshots and a CDX file of ~1 GB, which crashed the system when it was used.
2. requests.get somehow did not receive the full JSON response (1 GB!), so the response was not valid JSON.

So for 1., one solution would be to limit the number of snapshots returned by the server (no problem, since the CDX server supports this kind of limiting). Alternatively, waybackup itself could wait for user input if the number of snapshots exceeds a threshold (e.g. 1,000,000).
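A minimal sketch of both ideas for 1., assuming the public CDX endpoint `https://web.archive.org/cdx/search/cdx` with its `limit` and `output=json` query parameters. The helper names `build_cdx_url` and `confirm_large_result` are hypothetical and not part of waybackup; the snapshot count passed to the confirmation step is assumed to be known already.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target: str, limit: Optional[int] = None) -> str:
    """Build a CDX query URL, optionally capped at `limit` snapshots (hypothetical helper)."""
    params = {"url": target, "output": "json"}
    if limit is not None:
        # Server-side cap: the CDX server returns at most `limit` rows.
        params["limit"] = limit
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def confirm_large_result(
    count: int,
    threshold: int = 1_000_000,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the download should proceed (hypothetical helper).

    Counts at or below the threshold proceed without asking;
    larger counts require the user to answer "y" or "yes".
    """
    if count <= threshold:
        return True
    answer = ask(f"{count:,} snapshots found (more than {threshold:,}). Continue? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing `ask` as a parameter keeps the prompt testable; in an interactive run it simply defaults to `input`.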

For 2., the problem could either be avoided with a limit (see 1.) or worked around by converting the partial result into valid JSON and then using that.
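A minimal sketch of the second idea, recovering the complete rows from a truncated response. It assumes the body uses the CDX server's `output=json` layout (one outer JSON array whose elements are row arrays, header row first). `salvage_rows` is a hypothetical helper, not part of waybackup. Instead of guessing which closing brackets to append, it walks the outer array with the standard library's `json.JSONDecoder.raw_decode` and stops at the first row that fails to parse.

```python
import json


def salvage_rows(text: str) -> list:
    """Return every complete row from a possibly truncated JSON array of arrays."""
    decoder = json.JSONDecoder()
    rows = []
    pos = text.find("[")  # opening bracket of the outer array
    if pos == -1:
        return rows
    pos += 1
    n = len(text)
    while pos < n:
        # Skip whitespace and the commas separating rows.
        while pos < n and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= n or text[pos] == "]":
            break  # end of input or properly closed outer array
        try:
            # raw_decode parses one value starting at `pos` and returns the end index.
            row, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # the last row was cut off mid-transfer; drop it
        rows.append(row)
    return rows
```

The salvaged list is still incomplete: every snapshot after the cut-off point is lost.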

These solutions always result in an incomplete download. The only way around this is to set a shorter range and split the query into several smaller jobs.
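A minimal sketch of splitting one large query into several smaller jobs by year. It assumes the CDX server's `from` and `to` parameters, which are documented as inclusive and accept partial timestamps such as a bare year. `split_into_jobs` is a hypothetical helper, not part of waybackup.

```python
from typing import List
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def split_into_jobs(target: str, start_year: int, end_year: int, step: int = 1) -> List[str]:
    """Return one CDX query URL per window of `step` years, covering start_year..end_year."""
    if step < 1:
        raise ValueError("step must be at least 1")
    urls = []
    year = start_year
    while year <= end_year:
        stop = min(year + step - 1, end_year)  # last window may be shorter
        params = {"url": target, "output": "json", "from": str(year), "to": str(stop)}
        urls.append(f"{CDX_ENDPOINT}?{urlencode(params)}")
        year = stop + 1
    return urls
```

For example, `split_into_jobs("example.com", 2010, 2014, step=2)` yields three queries covering 2010 to 2011, 2012 to 2013, and 2014 alone. Each one returns a smaller response than a single query over the whole range.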

Conclusion: I will try to implement the best trade-off of these ideas. In the meantime, to work around your bug, heavily reduce the range and run waybackup several times over smaller ranges instead.

bitdruid / python-wayback-machine-downloader, issue #20