alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0

Check file after downloading and redownload if corrupt? #19

Open jaintj95 opened 4 years ago

jaintj95 commented 4 years ago

I used the script to download 14GB worth of files, and more than 60% of them turned out to be corrupt due to incomplete downloads.
Would be great if somehow we could check that the downloaded file is not corrupt.
If corrupt: reinitiate download.

emmaKts commented 4 years ago

I tried this out of curiosity and that happened to me as well, most of the files were corrupted. I have created my own script anyway, but it's worth mentioning as a lot of people are using it.

Thanks for this though @alexgand, much appreciated!

VikashKothary commented 4 years ago

@emmaKts How does your script differ from this one so that it avoids the corruption issue?

alexgand commented 4 years ago

All downloads were OK here; perhaps the issue is related to the quality of the connection.

I'll leave the issue open. If anyone knows how to do this (check whether a file is corrupted and restart the download), feel free to open a pull request!

pjungermann commented 4 years ago

@jaintj95 it might be that a lot of those were ePub files which were actually PDF files (as no ePub existed). This has been fixed since PR #25, so maybe that already helps. (I'd check whether it is part of your local clone; see also PR #26).

A check for PDF files could be done using e.g. PyPDF2, and ePub files could be checked using zipfile (as an ePub is ultimately a zip archive). This would make the script slower, of course. And there is the question of how often you want to retry and how to react once all retries are used up.
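
A minimal sketch of that idea, not taken from the repository: the function names, the `download` callable and the default of three tries are all hypothetical. The PDF check assumes PyPDF2 2.0 or later, which provides `PdfReader`. The ePub check only uses the standard library.

```python
import zipfile
import zlib
from pathlib import Path


def is_valid_epub(path):
    """Return True if the file opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False


def is_valid_pdf(path):
    """Return True if PyPDF2 can parse the file and finds at least one page."""
    # Imported lazily so the ePub check works without PyPDF2 installed.
    # Assumption: PyPDF2 >= 2.0 (older releases expose PdfFileReader instead).
    from PyPDF2 import PdfReader
    try:
        return len(PdfReader(str(path)).pages) > 0
    except Exception:
        return False


def is_valid(path):
    """Pick a check based on the file extension; unknown types are not checked."""
    suffix = Path(path).suffix.lower()
    if suffix == ".epub":
        return is_valid_epub(path)
    if suffix == ".pdf":
        return is_valid_pdf(path)
    return True


def download_with_retry(download, path, max_tries=3):
    """Call download(path) until the file validates or max_tries is reached.

    `download` is a hypothetical callable that writes the file to `path`.
    Returns True on success, False if every try produced an invalid file.
    """
    for _ in range(max_tries):
        download(path)
        if is_valid(path):
            return True
    return False
```

When all tries are used up, the caller could log the book title and move on, so one bad file does not stop the whole run. Note that a PDF saved with an `.epub` extension would fail the zip check here.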