easlice / bandcamp-downloader

Download your bandcamp collection using this python script.
MIT License

Fix download list for collections with hidden items, add `--include-hidden` #28

Closed: cubicvoid closed this 4 months ago

cubicvoid commented 7 months ago

When requesting a collection, Bandcamp returns the first page of items in item_cache.collection and the first page of hidden items in item_cache.hidden. The total item counts for the two categories (across all pages) are collection_data.item_count and hidden_data.item_count, respectively.

We might hope that there would then be len(item_cache.collection) entries in collection_data.redownload_urls, and len(item_cache.hidden) entries in hidden_data.redownload_urls. Unfortunately, hidden_data.redownload_urls doesn't exist, and instead there are len(item_cache.collection) + len(item_cache.hidden) items in collection_data.redownload_urls, combining both hidden and unhidden results.

bandcamp-downloader calculates the number of items still to request as the total size of the (visible) library minus the number of entries in redownload_urls:

    'count' : _user_info['collection_count'] - len(_user_info['download_urls']),

Because the download URLs include a page of hidden items, this count comes up short for collections with hidden items: one page of hidden items ends up in the URL list, and an equal number of unhidden items is truncated from the end of the list.
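
To make the effect concrete, here is a small simulation with made-up numbers (50 visible items, 10 hidden, a page size of 20). The helpers `first_page` and `fetch_more_visible` are hypothetical stand-ins for Bandcamp's responses, and the page size is an assumption; only the `count` formula comes from the script.

```python
PAGE_SIZE = 20  # ASSUMPTION: page size chosen for illustration only

visible = [f"v{i}" for i in range(50)]  # unhidden collection items
hidden = [f"h{i}" for i in range(10)]   # hidden collection items


def first_page():
    """Mimic the initial page data: one page per category, URLs combined."""
    page_visible = visible[:PAGE_SIZE]
    page_hidden = hidden[:PAGE_SIZE]
    return {
        "collection_count": len(visible),             # collection_data.item_count
        "download_urls": page_visible + page_hidden,  # combined redownload_urls
        "visible_seen": len(page_visible),
    }


def fetch_more_visible(offset, count):
    """Mimic .../collection_items: return up to `count` further visible items."""
    return visible[offset:offset + count]


info = first_page()

# Formula from the script: the hidden URLs inflate len(download_urls).
buggy_count = info["collection_count"] - len(info["download_urls"])
buggy_result = info["download_urls"] + fetch_more_visible(
    info["visible_seen"], buggy_count
)

missing_visible = [v for v in visible if v not in buggy_result]
included_hidden = [h for h in hidden if h in buggy_result]

print(buggy_count)           # 20, although 30 visible items remain
print(len(missing_visible))  # 10 visible items dropped from the end
print(len(included_hidden))  # 10 hidden items included unintentionally
```

In this toy case the request asks for 20 more items when 30 visible ones are still outstanding, so the last 10 visible items never make it into the list.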

This PR adds the ability to intentionally include hidden downloads with the --include-hidden flag. When the flag is not set, the script skips URLs for hidden items in the initial query, which fixes the issue where some visible items were missing from the resulting download. When the flag is set, the script also sends a request to the POST endpoint for hidden items, .../hidden_items, to fetch the remaining pages (previously only .../collection_items was used).
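
A rough sketch of the selection logic described above, not the code from this PR. It assumes that item_cache.collection and item_cache.hidden are dicts of items, and that each entry in redownload_urls is keyed by the item's `sale_item_type` followed by its `sale_item_id`; that key scheme and the function name `select_initial_urls` are assumptions made for illustration.

```python
def select_initial_urls(data, include_hidden=False):
    """Return (urls, remaining_visible, remaining_hidden) for the first page."""
    cache = data["item_cache"]
    redownload_urls = data["collection_data"]["redownload_urls"]

    def url_key(item):
        # ASSUMPTION: keys look like "<sale_item_type><sale_item_id>".
        return f"{item['sale_item_type']}{item['sale_item_id']}"

    # Only look at hidden items when the caller asked for them.
    categories = ["collection"] + (["hidden"] if include_hidden else [])
    urls = [
        redownload_urls[url_key(item)]
        for category in categories
        for item in cache[category].values()
        if url_key(item) in redownload_urls
    ]

    # Count what is left per category rather than subtracting
    # len(redownload_urls), which mixes both categories.
    remaining_visible = (
        data["collection_data"]["item_count"] - len(cache["collection"])
    )
    remaining_hidden = (
        data["hidden_data"]["item_count"] - len(cache["hidden"])
        if include_hidden
        else 0
    )
    return urls, remaining_visible, remaining_hidden


# Made-up page data: one visible and one hidden item on the first page.
example = {
    "item_cache": {
        "collection": {"a1": {"sale_item_type": "a", "sale_item_id": 1}},
        "hidden": {"a2": {"sale_item_type": "a", "sale_item_id": 2}},
    },
    "collection_data": {
        "item_count": 3,
        "redownload_urls": {
            "a1": "https://example.invalid/1",
            "a2": "https://example.invalid/2",
        },
    },
    "hidden_data": {"item_count": 1},
}

print(select_initial_urls(example))
print(select_initial_urls(example, include_hidden=True))
```

The two remaining counts would then drive the follow-up requests to .../collection_items and, when hidden items are wanted, .../hidden_items.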

Fixes https://github.com/easlice/bandcamp-downloader/issues/5.

cubicvoid commented 4 months ago

Thanks! I also have a followup I've been using locally that improves the performance of repeated downloads by comparing existing files against the reported size in the metadata instead of actually starting every download stream first, but I was holding off until this one got merged... I can submit that one soon, I will try syncing with the other recent changes this weekend :-)
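
For illustration only (this is not the follow-up patch itself), a size check along those lines might look like the sketch below. The function name `needs_download` and the idea that the expected size arrives as a byte count are assumptions.

```python
import os


def needs_download(path, expected_size):
    """Return True unless a file of exactly `expected_size` bytes is at `path`."""
    try:
        return os.path.getsize(path) != expected_size
    except OSError:
        # Missing or unreadable file: treat it as not yet downloaded.
        return True
```

A caller would skip opening the download stream whenever this returns False, which is what avoids the repeated requests.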

easlice commented 4 months ago

That would be great. I tried using HEAD requests at one point, but Bandcamp never quite replied to those consistently, so a way to stop redoing so many requests would be wonderful.

And I promise it won't take several months to get this new one merged this time. ;)