Ezwen / bandcamp-collection-downloader

A command-line tool to automatically download all releases purchased with a Bandcamp account. The official page of the project is https://framagit.org/Ezwen/bandcamp-collection-downloader; this is just a mirror hosted on GitHub.
GNU Affero General Public License v3.0
247 stars, 25 forks

org.zeroturnaround.zip.ZipException: java.util.zip.ZipException: zip file is empty & org.jsoup.HttpStatusException: HTTP error fetching URL #20

Open federicocandiago opened 3 years ago

federicocandiago commented 3 years ago

Hello Ezwen, and thanks for the amazing application. After being able to download part of my Bandcamp collection over a few days with the latest bandcamp-collection-downloader.jar you provided on GitLab, every time I try to resume I face these errors, with no file being downloaded:

Connecting to Bandcamp…
Found "____'s collection | Bandcamp" with 1253 items.

[pool-1-thread-4] Error while trying: "org.zeroturnaround.zip.ZipException: java.util.zip.ZipException: zip file is empty".
...
[pool-1-thread-4] Error while trying: "org.jsoup.HttpStatusException: HTTP error fetching URL".

My command is always the same: oracleJava -jar bandcamp-collection-downloar.jar -f mp3-320 profileName -d /Volumes/Esterna/Bandcamp3 -c cookies.json (oracleJava is an alias)

What I tried:

My system:

Could I ask for any help on how to be able to finish downloading my library? Thanks in advance

Ezwen commented 3 years ago

Hi there! Thanks for the report. That's a mysterious one. If possible, could you try manually downloading the release using your favorite browser, and check whether (1) the download goes well, and (2) the zip file is OK?

federicocandiago commented 3 years ago

Hi! Thanks for the quick reply.

I tried to re-download the executable and re-create the cookies.json file with the minimum security settings on Firefox 86, which correctly logged me into my profile from the browser, but the problem with bandcamp-collection-downloader persisted.

I also created an Ubuntu 18.04 AWS EC2 (Intel*) machine, installed default-jre (not the Oracle Java I was using), and launched a freshly downloaded executable with java -jar bandcamp-collection-downloader.jar -f mp3-320 profileName -d ~/Downloads/Bandcamp -c cookies.json. The bandcamp-collection-downloader.jar file was downloaded with wget from the v2020-12-15 release link and renamed, and cookies.json was the freshly created cookies file, containing 9 Bandcamp cookies and a reCAPTCHA one. Unfortunately, the same error messages came out.

I really hope it's not Bandcamp blacklisting us, at this point.

Ezwen commented 3 years ago

Thank you for your experiments! … However, it is not exactly what I was looking for. My apologies, maybe I was not very clear with what I was asking :)

Here it is rephrased: without using bandcamp-collection-downloader at all, could you manually check that the official download page of the problematic release works properly, and gives you a fully working zip?

> I really hope it's not Bandcamp blacklisting us, at this point.

I really don't think they do; we would get many more errors :)

federicocandiago commented 3 years ago

Lol sorry for the misunderstanding, now that's very clear 😃

Trying to manually download the affected releases, a "download expired" message appears for some albums, while for other albums the .zip file is downloaded correctly (I already contacted Bandcamp's assistance for that).

To make sure that was not the problem, I deleted the bandcamp-collection-downloader.cache file and tried to redownload my whole library, but everything failed with the same error, and a new bandcamp-collection-downloader.cache file was not written. This is the end of my log:

[pool-1-thread-1] Could not download item: No URL found for item (maybe the release has no digital item, or the provided download format is invalid)
[pool-1-thread-3] Found release "永遠に真夜中" (2018) by Zadig The Jasp (Bandcamp ID: r85484762).
[pool-1-thread-3] Could not download item: No URL found for item (maybe the release has no digital item, or the provided download format is invalid)
[pool-1-thread-4] Found release "プラネットネオ東京" (2018) by Ohm-N-I (Bandcamp ID: r85484761).
[pool-1-thread-4] Could not download item: No URL found for item (maybe the release has no digital item, or the provided download format is invalid)
[pool-1-thread-2] Found release "KPD Revision" (2017) by Sonnig 991 (Bandcamp ID: r85484760).
[pool-1-thread-2] Could not download item: No URL found for item (maybe the release has no digital item, or the provided download format is invalid)
192:Downloads federico$ cd /Volumes/Esterna/Bandcamp
192:Bandcamp federico$ ls -lah
total 16
drwxr-xr-x   7 federico  staff   238B 26 Feb 14:30 .
drwxrwxr-x  40 federico  staff   1,4K 25 Feb 19:40 ..
-rw-r--r--@  1 federico  staff   6,0K 26 Feb 12:58 .DS_Store
drwxr-xr-x   3 federico  staff   102B 26 Feb 13:01 Hunter
drwxr-xr-x   3 federico  staff   102B 26 Feb 13:01 Various Artists
drwxr-xr-x   3 federico  staff   102B 26 Feb 13:01 Vercetti
drwxr-xr-x   3 federico  staff   102B 26 Feb 13:01 desert sand feels warm at night
192:Bandcamp federico$ cat bandcamp-collection-downloader.cache
cat: bandcamp-collection-downloader.cache: No such file or directory
192:Bandcamp federico$

sixty4k commented 3 years ago

I'm seeing the same thing, but if I wait a short while and try again it works for albums it failed for previously. I suspect there's some rate limiting happening, but don't know if I'll have time for poking at it anytime soon.

sixty4k commented 3 years ago

After rerunning a few times, I'm only seeing issues with one release: https://stereophonk.bandcamp.com/track/enta-humpty. There is no download link if I browse to the page, so maybe it's a legitimate Bandcamp issue?

sixty4k commented 3 years ago

Edit: no, I'm an idiot. Download links were there.

My failing download was not a zip file: a single-track 'album' will download as just an mp3.
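The case described above, where a single-track release arrives as a bare audio file instead of a zip archive, can be guarded against by inspecting the downloaded file before trying to unzip it. The Python sketch below only illustrates the idea and is not the tool's actual Kotlin code; the function name classify_download and its return labels are hypothetical.

```python
import zipfile
from pathlib import Path


def classify_download(path):
    """Classify a downloaded file before deciding how to process it.

    Returns "empty" for a zero-byte file, "zip" for a readable zip
    archive, and "single-file" for anything else (for example a lone
    mp3 or m4a from a single-track release).
    """
    p = Path(path)
    if p.stat().st_size == 0:
        # A zero-byte file is what produces "zip file is empty" errors.
        return "empty"
    if zipfile.is_zipfile(p):
        return "zip"
    # Not an archive: keep the file as-is instead of unzipping it.
    return "single-file"
```

A caller would unzip only when the result is "zip", move the file into place when it is "single-file", and retry the download later when it is "empty".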

federicocandiago commented 3 years ago

I've done a dozen or more tests and noticed something unexpected: oracleJava -jar bandcamp-collection-downloader.jar -f mp3-0 -d Bandcamp -c cookies.json myProfile doesn't work, but following your order, oracleJava -jar bandcamp-collection-downloader.jar -c cookies.json -d Bandcamp -f mp3-v0 myProfile works correctly all the time 😄

Now my knowledge of Java is extremely limited, but I didn't really know (or expect) named parameters needed to follow an order in these cases. Thanks for the amazing job and for your assistance sixty4k and Ezwen!
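Note that the two commands quoted above differ in more than option order: the first passes -f mp3-0 and the second -f mp3-v0. The Python sketch below shows, under stated assumptions, that a typical option parser treats named options as order-independent while an unrecognized format value can still be rejected. It is not the tool's real parser; the long option names, the default value, and the KNOWN_FORMATS set are illustrative assumptions, and the tool's own --help output is the authority on valid format names.

```python
import argparse

# Assumed set of format names, for illustration only.
KNOWN_FORMATS = {
    "flac", "wav", "aac-hi", "mp3-320",
    "aiff-lossless", "vorbis", "mp3-v0", "alac",
}


def parse_args(argv):
    """Parse a downloader-style command line and validate the format."""
    parser = argparse.ArgumentParser(prog="downloader-sketch")
    parser.add_argument("-f", "--audio-format", default="vorbis")
    parser.add_argument("-d", "--download-folder", default=".")
    parser.add_argument("-c", "--cookies-file")
    parser.add_argument("profile")
    args = parser.parse_args(argv)
    if args.audio_format not in KNOWN_FORMATS:
        # Fail early with a clear message instead of per-item errors later.
        raise ValueError(f"unknown audio format: {args.audio_format!r}")
    return args
```

With this sketch, both orderings of -c, -d and -f parse to the same result, whereas -f mp3-0 raises an error regardless of where it appears.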

tylerstraub commented 3 years ago

> I'm seeing the same thing, but if I wait a short while and try again it works for albums it failed for previously. I suspect there's some rate limiting happening, but don't know if I'll have time for poking at it anytime soon.

Also seeing similar behavior here. I waited about 10 minutes, came back and tried again, and it started resuming as expected.

Maybe we can ask for another argument which slows down the poll rate or something, if we are hitting the API too hard?

jzb commented 3 years ago

Tried it this morning and I also ran into what seems to be a rate-limiting issue. My collection is about 850 items, it worked for ~16 or so and then I got a ton of:

[pool-1-thread-1] Found release "Kai" (2017) by te' (Bandcamp ID: p146327731).
[pool-1-thread-4] Error while trying: "org.jsoup.HttpStatusException: HTTP error fetching URL".

Went over to the site in Firefox and also got an error trying to download an item. Waited a few minutes and it was fine.

I've run into similar errors just trying to grab things quickly from the browser via my purchases page or whatever. Bandcamp seems set up to throttle download attempts from the same account by any method. It'd be great to have a downloader I could just fire up and say "download one item every five minutes until complete" that didn't run into this.
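The "download one item every five minutes until complete" idea can be sketched as a sequential loop with a fixed pause between items and a growing wait after each failure. This is a minimal Python illustration, not a feature of the tool; download_all, download_one, and every parameter name here are hypothetical, and the default delays are arbitrary.

```python
import time


def download_all(items, download_one, delay_seconds=300.0,
                 max_retries=3, sleep=time.sleep):
    """Download items one at a time, throttled, with retry and backoff.

    download_one(item) is expected to raise an exception on failure.
    Returns the list of items that still failed after max_retries tries.
    """
    items = list(items)
    failed = []
    for index, item in enumerate(items):
        wait = delay_seconds
        for attempt in range(1, max_retries + 1):
            try:
                download_one(item)
                break
            except Exception:  # broad on purpose: this is only a sketch
                if attempt == max_retries:
                    failed.append(item)
                else:
                    sleep(wait)   # back off before retrying this item
                    wait *= 2     # double the wait after each failure
        if index < len(items) - 1:
            sleep(delay_seconds)  # fixed pause between consecutive items
    return failed
```

Passing sleep as a parameter keeps the loop easy to test without real waiting; in normal use the default time.sleep applies.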

tylerstraub commented 3 years ago

hey, I don't know if this is helpful for anyone.. but what worked for me was limiting the maximum thread count

--jobs=1

seems to be an API rate limiting issue idk

Ezwen commented 3 years ago

There does seem to be a rate limit in place, and it would indeed be rather nice to have a smart way to handle that. I'll keep thinking!

octplane commented 2 years ago

I actually have the same issue, even with --jobs=1 😢

I quickly patched the app to see what's going on. The downloadUrl at https://github.com/Ezwen/bandcamp-collection-downloader/blob/master/src/main/kotlin/bandcampcollectiondownloader/core/BandcampCollectionDownloader.kt#L239 is not a zip but an ISO Media, Apple iTunes ALAC/AAC-LC (.M4A) Audio file. Also, isSingleTrack is set to false in my case, which is wrong.

The release is "Souvenirs d'Hayu Marca VI - P0010" (0000) by FB-1 (Bandcamp ID: c461186051). (track, album)

I have no idea why the app tries to download only the track and not the whole album (I do own the whole album), and I did not have more time to investigate further.

octplane commented 2 years ago

(dupe of #22 ?)