alexgand / springer_free_books

Python script to download all Springer books released for free during the 2020 COVID-19 quarantine
GNU General Public License v3.0

Smaller docker base image #66

Closed (lecafard closed this 4 years ago)

lecafard commented 4 years ago

I've changed the Docker base image to the slim variant, as the previous image was quite big. It should function the same, though. I also added multithreading to download many books at the same time.

chaosAD commented 4 years ago

Overall, I think the idea of multithreading support is great. However, there are a couple of problems:

  1. Ethical issue: I am not sure it would be good to flood Springer with concurrent downloads. Given that they are generous enough to give away free e-books, I think we should do our part not to cause them trouble (this is just my personal opinion).

  2. I ran a test: the cumulative download rate peaked at around 45 Mbps (23 Mbps on average). However, the Springer host forcibly closed some connections midway. I also noticed that some files were corrupted (or incomplete?); maybe the host just ended the transmission by sending a FIN packet, which our end would treat as a completed download, so no error was raised.

  3. A temporary file naming conflict occurred at least once, so some download sessions were dropped. This can be fixed, however.
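
For points 2 and 3, here is a rough sketch of one possible fix, not the code in this PR. It writes each download to a uniquely named temporary file (so parallel workers cannot collide) and compares the number of bytes written against an expected size before moving the file into place. The function name `save_verified` and its parameters are hypothetical; the expected size is assumed to come from something like the response's `Content-Length` header.

```python
import os
import tempfile


def save_verified(chunks, dest_path, expected_size=None):
    """Write chunks to a unique temp file, then move it to dest_path.

    Hypothetical helper: raises IOError if expected_size is given and the
    number of bytes written differs from it.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # mkstemp creates a new file with a name nothing else is using,
    # so concurrent downloads never share a temporary file.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                written += len(chunk)
        if expected_size is not None and written != expected_size:
            raise IOError(
                "incomplete download: got %d of %d bytes"
                % (written, expected_size)
            )
        # The temp file lives in the destination directory, so this is a
        # rename within one filesystem (atomic on POSIX).
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Remove the partial file so it is never mistaken for a finished book.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

With `requests`, `chunks` could be `response.iter_content(chunk_size=8192)` from a `requests.get(url, stream=True)` call, and `expected_size` could be `int(response.headers["Content-Length"])` when that header is present. This is an assumption about how the script would be wired up; the check should be skipped when the header is missing or the body is content-encoded, since the byte counts would not be comparable.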

@lecafard great work nonetheless.

alexgand commented 4 years ago

@lecafard @chaosAD I don't think it is worth the effort to make the download multithreaded; let's keep things as simple as possible.

lecafard commented 4 years ago

Makes sense, I reverted the multithreading changes. Could you take a look at the changes regarding the smaller Docker base image?