andyjsmith / SmugMug-Downloader

Download all the images from a SmugMug user
47 stars, 19 forks

CPU bound when files are already downloaded #5

Closed obadz closed 3 years ago

obadz commented 4 years ago

First off, thank you for this tool. It's amazing that with ~100 lines of code you put together a tool that gets the job done, with no faff, and a nice UI to boot!

The only thing I find odd is that if you restart it against a partially downloaded collection, it uses 100% CPU for a long time until it gets back to the point where it left off. At first I thought it was checking hashes, but reading the code I can see that's not the case.

I'm thinking maybe get_json is slow, since that's pretty much the only thing that happens in the inner loop.
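
One way to test that guess rather than eyeball it would be to run the suspect call under Python's built-in profiler. The following is only a minimal sketch: `profile_call` is a hypothetical helper, not part of smdl.py, and the function you pass in (get_json or anything else) is up to you.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text).

    profile_call is a hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the ten entries with the highest cumulative time as text.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

Printing the returned report for a single pass of the inner loop should show whether the time goes to parsing, to network waits, or somewhere else.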

andyjsmith commented 3 years ago

I think the issue you're having is that there isn't any proper resume functionality. The script starts downloading from the beginning and skips any image that already exists on disk: https://github.com/andyjsmith/SmugMug-Downloader/blob/fc445e7d2433a26a7e5b0a96e414d5298dd416a9/smdl.py#L101-L103 A proper implementation might read the files on disk or save a progress file to determine where to pick up from, but right now it loops through API requests and file checks until it reaches an image that isn't on disk yet. You're welcome to submit a PR if you have an improvement, but I don't have the time to work on this change right now.
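
For anyone picking this up, the progress-file idea could look roughly like the sketch below. It is not how smdl.py works today, and it assumes nothing about the script's internals: `load_progress`, `save_progress`, `albums_to_process`, and the JSON file layout are all hypothetical names chosen for illustration.

```python
import json
import os


def load_progress(path):
    """Return the set of album names recorded as finished (empty if none)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return set(json.load(f))
    except (FileNotFoundError, json.JSONDecodeError):
        # No progress file yet, or an unreadable one: start from scratch.
        return set()


def save_progress(path, finished):
    """Write the finished set via a temp file so a crash cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(finished), f)
    os.replace(tmp, path)


def albums_to_process(all_albums, finished):
    """Keep the original order, dropping albums already recorded as finished."""
    return [album for album in all_albums if album not in finished]
```

The idea would be to call `save_progress` each time an album completes, so a restart can skip whole albums without issuing API requests or file checks for them. Recording whole albums rather than individual images keeps the file small, at the cost of re-checking the one album that was in progress when the run stopped.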