Closed yoshiyoshyosh closed 4 months ago
I think the current downloaded.txt format is actually really silly. We should be storing downloaded albums by their IDs, not their URLs: as you mentioned, custom domains may stop working, and an entry will also stop matching if someone changes their bandcamp.com subdomain. I think we should change it to something like downloaded.csv with a comma-separated list of album IDs. For migration, we can check whether downloaded.txt exists and, if it does, create the new file based on the old one and then delete the old one. However, I'm not sure there is an efficient way to get album IDs from URLs, so it may take a while. Maybe we could do some kind of 'lazy migration' that only migrates n URLs on each launch, or add a migration script and issue a warning if downloaded.txt still exists.
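To make the 'lazy migration' idea concrete, here is a rough sketch. It assumes downloaded.txt holds one URL per line and that some resolve_id callable (hypothetical, not something the script has today) can turn a URL into an ID, returning None on failure. The function name and the batch_size default are also just placeholders.

```python
from pathlib import Path
from typing import Callable, Optional


def migrate_some(old_file: Path, new_file: Path,
                 resolve_id: Callable[[str], Optional[str]],
                 batch_size: int = 5) -> int:
    """Move up to batch_size URLs from old_file into new_file as IDs.

    Returns how many URLs are still waiting in old_file.
    """
    if not old_file.exists():
        return 0  # nothing left to migrate

    # Assumption: the old file has one URL per line.
    urls = [ln.strip() for ln in old_file.read_text().splitlines() if ln.strip()]
    batch, rest = urls[:batch_size], urls[batch_size:]

    # Load IDs that earlier launches already migrated.
    known = set()
    if new_file.exists():
        known = {i for i in new_file.read_text().strip().split(",") if i}

    unresolved = []
    for url in batch:
        item_id = resolve_id(url)  # hypothetical lookup, e.g. a page fetch
        if item_id is None:
            unresolved.append(url)  # retry on a later launch
        else:
            known.add(item_id)

    new_file.write_text(",".join(sorted(known)))

    remaining = rest + unresolved
    if remaining:
        old_file.write_text("\n".join(remaining) + "\n")
    else:
        old_file.unlink()  # migration finished, drop the old file
    return len(remaining)
```

Calling this once per launch keeps startup cost bounded by batch_size lookups, and a non-zero return value is a natural place to print the "downloaded.txt still exists" warning.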
Similarly, the mail_album_data dict should use tralbum IDs as keys instead of URLs...
A straightforward way to get the track/album ID is to look at the bc-page-properties meta tag on the track/album page. An example for the album you linked:
<meta name="bc-page-properties" content="{'item_type':'a','item_id':1020132016,'tralbum_page_version':0}">
We can use a combination of the item_type (t for track, a for album) and the item_id (just concatenate them) as our item ID, since I'm not sure whether item_ids are unique regardless of item_type.
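As a sketch of how that could look, assuming we already have the page HTML as a string: the parser below pulls the bc-page-properties content and concatenates item_type and item_id. The function and class names are made up for illustration. I'm also assuming the content may arrive either as JSON (with escaped quotes in the raw HTML) or in the single-quoted form shown above, so it tries both.

```python
import ast
import json
from html.parser import HTMLParser
from typing import Optional


class _PagePropsParser(HTMLParser):
    """Collects the content of the first bc-page-properties meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.content is None:
            attr = dict(attrs)
            if attr.get("name") == "bc-page-properties":
                self.content = attr.get("content")


def item_key_from_html(html: str) -> Optional[str]:
    """Return item_type + item_id (e.g. 'a1020132016'), or None if absent."""
    parser = _PagePropsParser()
    parser.feed(html)
    if parser.content is None:
        return None
    try:
        # HTMLParser has already unescaped entities such as &quot;
        props = json.loads(parser.content)
    except json.JSONDecodeError:
        # Assumption: fall back to the single-quoted form from the example.
        props = ast.literal_eval(parser.content)
    return f"{props['item_type']}{props['item_id']}"
```

For the meta tag quoted above, this would produce the key a1020132016.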
However, it would be nice if we could get these values without having to load the whole webpage.
Should be fixed in v0.2.1. I realized we can actually just keep the current downloaded.txt file, add album IDs instead of URLs from now on, and check both the URL and the ID to see whether an album is already in the file.
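Roughly, the check could look like the sketch below. This is not the actual v0.2.1 code: the function names are placeholders, and it assumes downloaded.txt keeps one entry per line, where an entry is either an old URL or a new ID.

```python
from pathlib import Path


def already_downloaded(downloaded_file: Path, url: str, item_id: str) -> bool:
    """True if either the legacy URL or the new ID is listed in the file."""
    if not downloaded_file.exists():
        return False
    # Assumption: one entry per line, URLs and IDs mixed.
    entries = {ln.strip() for ln in downloaded_file.read_text().splitlines()}
    return url in entries or item_id in entries


def mark_downloaded(downloaded_file: Path, item_id: str) -> None:
    """Append the ID (never the URL) so new entries survive domain changes."""
    text = downloaded_file.read_text() if downloaded_file.exists() else ""
    # Avoid gluing the ID onto a last line that lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with downloaded_file.open("a") as f:
        f.write(prefix + item_id + "\n")
```

Old URL entries keep working as long as the URL still matches, and nothing needs to be migrated up front.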
so, bandcamp allows people to have a custom domain that redirects to their bandcamp. it's not just a simple 301 redirect from the domain to their site with everything else staying the same. it's mostly like that, except that artwork links on album download pages point to the custom domain, not the *.bandcamp.com domain.

to see how this affects the script, consider this album: https://generalmumble.bandcamp.com/album/blimp-fortress

despite providing the *.bandcamp.com link, the album_url that gets stored in downloaded.txt is https://mumbleetc.com/* rather than the bandcamp link. if, in the future, the custom domain stops working or gets disabled, this will cause the album to get re-downloaded and a "stale link" to be left in the downloaded.txt file, which could result in something especially bad if a different bandcamp account snags the same custom domain (while unlikely, it is possible).

for albums that require email, it just causes the script to crash if you provide it the *.bandcamp.com link, which is what one would usually provide since it's what is redirected to automatically.

there's a few ways this can be resolved, which is why I made an issue to discuss rather than immediately starting with a pr:

1. use the *.bandcamp.com domain no matter if you give it a custom domain or not, and on album download pages, trade out the custom domain for said *.bandcamp.com domain before doing anything with it. [...] downloaded.txt as well as future-proof the script in case a custom domain stops working. however, it also would cause current links in downloaded.txt to be stale and have redundancy, which I guess isn't too bad if it means keeping stuff for the future
2. [...] *.bandcamp.com and custom domains in downloaded.txt, and somehow resolve issues like that
3. [...] *.bandcamp.com link. the problem with this approach is that while it's easy to get the *.bandcamp.com domain from a custom domain, the only place I can see to get the custom domain from the bandcamp one is on the download page, so it'd probably cause some weird stuff to fix

even if I like the first approach the most, in any case, I'd like to hear your input