Sibusten / derpibooru-downloader

A downloader for imageboards running Philomena, such as Derpibooru.
MIT License

Image corruption #22

Open Ricardo50 opened 5 years ago

Ricardo50 commented 5 years ago

The program has an issue where downloaded images can become corrupted. When this happens, the image is only partially downloaded and the rest of it is black. I don't have the technical details as to why this happens, though.

Sometimes re-downloading fixes it, but this is really inconvenient when archiving (parts of) the site.

In some cases, though, it gets corrupted the same way the second time.
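The symptom here is an image that is only partly present, with the rest rendered black. One plausible cause is a truncated transfer, and a cheap way to flag that is to check whether the file ends with its format's end marker: every PNG ends with an `IEND` chunk, and a JPEG ends with the `FF D9` end-of-image marker. The following is a minimal sketch under that assumption; `looks_truncated` is a hypothetical helper, not part of derpibooru-downloader, and it cannot detect corruption that already exists in the file on the server.

```python
# Standard end-of-file markers: a PNG ends with a zero-length IEND chunk
# (length, type "IEND", fixed CRC AE 42 60 82); a JPEG ends with EOI (FF D9).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_IEND = b"\x00\x00\x00\x00IEND\xaeB`\x82"
JPEG_SOI = b"\xff\xd8"
JPEG_EOI = b"\xff\xd9"


def looks_truncated(data: bytes) -> bool:
    """Return True if a PNG or JPEG is missing its end marker.

    Other formats return False, meaning "no opinion", not "valid".
    """
    if data.startswith(PNG_SIGNATURE):
        # Strict check: any bytes after IEND also count as suspicious.
        return not data.endswith(PNG_IEND)
    if data.startswith(JPEG_SOI):
        # Look at the last 32 bytes so a little trailing padding after
        # EOI is tolerated.
        return JPEG_EOI not in data[-32:]
    return False
```

A file flagged this way could be deleted and queued for another download; a file that passes may still be damaged in ways this check cannot see.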

Ricardo50 commented 5 years ago

By the way, 2034521 and 2034987 are image IDs that always get corrupted on my PC.

Twi-Hard commented 5 years ago

For me, 2034521.png won't download no matter how many times or ways I try. It's not in the archive either (which usually gets 100% of the images that are uploaded). [screenshot] The second image you mentioned (2034987.png) downloaded fine without corruption. I even downloaded it a few extra times, and it worked every time. I'm using the up-to-date SSL files (although I got the same results before updating).

Twi-Hard commented 5 years ago

Oh no! I just checked how many images there are per thousand IDs (counting both the PNG and the SVG for SVG files). Here's how it used to be: [screenshot] And here's how it is right now: [screenshot] The problem seems to have started between IDs 2016000 and 2017000, which would be 22 days ago.
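A per-thousand count like the one described can be reproduced from a download folder. This is a sketch that assumes files are named `<id>.<ext>` (as in 2034521.png); `count_per_thousand` is a hypothetical helper, and a real check would pass it the names from the actual download directory.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_per_thousand(names: Iterable[str]) -> Counter:
    """Count files per block of 1,000 image IDs.

    Assumes names look like "<id>.<ext>". An SVG upload saved as both
    .svg and .png is counted twice, like the check described above.
    Names whose stem is not a number are ignored.
    """
    counts: Counter = Counter()
    for name in names:
        stem = Path(name).stem
        if stem.isdigit():
            # Bucket key is the first ID of the block, e.g. 2016000.
            counts[int(stem) // 1000 * 1000] += 1
    return counts
```

For example, `count_per_thousand(p.name for p in Path("downloads").iterdir())` would count a hypothetical `downloads` folder; a block whose count drops sharply compared with an earlier run points to where files went missing.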

Sibusten commented 5 years ago

That's strange. Both the images (2034521 and 2034987) download without corruption for me.

I believe image corruption could come either from something on Derpibooru's end (I've seen some images whose full-size files are corrupted but whose thumbnails are not) or from an interrupted internet connection. I could look into using the hash from the JSON file to verify downloaded images and retry if there is corruption.
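Verify-and-retry could look roughly like this. It assumes the image JSON exposes a hex SHA-512 digest of the original file (Philomena's image JSON appears to have an `orig_sha512_hash` field for this, but treat the field name as an assumption), and `fetch` is a hypothetical stand-in for whatever function actually downloads the image.

```python
import hashlib
from typing import Callable


class HashMismatchError(Exception):
    """Raised when every attempt produced bytes with the wrong hash."""


def download_verified(
    fetch: Callable[[], bytes],
    expected_sha512: str,
    max_attempts: int = 3,
) -> bytes:
    """Call fetch() until the bytes hash to expected_sha512.

    fetch: hypothetical callable returning the full image bytes.
    expected_sha512: assumed hex SHA-512 digest from the image JSON.
    """
    expected = expected_sha512.lower()
    for _ in range(max_attempts):
        data = fetch()
        if hashlib.sha512(data).hexdigest() == expected:
            return data
        # Mismatch: possibly a broken transfer, so try again.
    raise HashMismatchError(f"hash mismatch after {max_attempts} attempts")
```

The attempt cap matters: if the hash in the metadata is itself stale, every download will mismatch, and retrying forever would not help.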

Twi-Hard commented 5 years ago

I was thinking about suggesting that (I even wrote it out, but I guess I decided not to post it). That would be great 😄 I don't get why it's not downloading certain images, though.

Sibusten commented 5 years ago

Derpibooru currently has an issue with image hashes not being updated properly, so they can't be used to verify at the moment.

Also, if you are using an API key, make sure that account's filter on Derpibooru is set to Everything.
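One way to avoid depending on the account's current filter is to name a filter explicitly in each API request. The sketch below assumes Philomena's `/api/v1/json/search/images` endpoint with `q`, `key`, and `filter_id` query parameters, and assumes 56 is the ID of Derpibooru's Everything filter; verify both against the site's API documentation. It only builds a URL and does not reflect how derpibooru-downloader itself issues requests.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: Philomena's image search endpoint on Derpibooru.
SEARCH_URL = "https://derpibooru.org/api/v1/json/search/images"

# Assumption: 56 is the ID of Derpibooru's "Everything" filter.
EVERYTHING_FILTER_ID = 56


def build_search_url(
    query: str,
    api_key: Optional[str] = None,
    filter_id: Optional[int] = EVERYTHING_FILTER_ID,
) -> str:
    """Build a search URL that names its filter explicitly."""
    params = {"q": query}
    if api_key:
        params["key"] = api_key
    if filter_id is not None:
        # An explicit filter_id is intended to override the filter
        # currently selected on the API key's account.
        params["filter_id"] = str(filter_id)
    return f"{SEARCH_URL}?{urlencode(params)}"
```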

Twi-Hard commented 5 years ago

Thanks for mentioning that, lol. I have an account specifically for use with downloaders, but apparently I wasn't using it. I had recently hidden a few things so I wouldn't get spoiled.

Sibusten commented 5 years ago

Ah, good to hear you figured that out.

I'm going to keep this in mind for when they fix hashes, but in the meantime I can add another step to the download where it saves to a temporary file first, then renames to the correct filename. That should hopefully help with network corruption issues.
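The temp-file step described here can be sketched as follows, assuming the full image is already in memory (a streaming version would write chunks to the temporary file instead). The `.part` suffix and the `save_atomically` name are illustrative choices, not necessarily what v1.4.4 uses.

```python
import os
import tempfile
from pathlib import Path


def save_atomically(data: bytes, dest: Path) -> None:
    """Write data to a temp file beside dest, then rename it into place.

    If writing fails, the temp file is removed and dest is left untouched,
    so a half-written image never appears under the final filename.
    """
    dest = Path(dest)
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        # os.replace overwrites dest if it exists; on POSIX it is atomic.
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

This only guards against errors raised mid-download; a transfer that ends early without raising would still be renamed into place, which is why a size or hash check remains useful.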

Ricardo50 commented 5 years ago

I double-checked, and the API key and the Everything filter were set correctly. Too bad their hashing doesn't work. Are they aware?

Sibusten commented 5 years ago

Yes, they're aware of the problem but say that it's an obscure bug. I can't say when it will be fixed.

Sibusten commented 5 years ago

Released a new version, v1.4.4, which adds the temp-file step.

Hash checking will have to wait for a fix on Derpibooru's end.

Ricardo50 commented 5 years ago

ID 2073050 is another image that always corrupts when downloading. Does anyone know why it happens? If I manually download it, it works fine.

Sibusten commented 5 years ago

It downloaded without corruption when I tried just now. It might be an issue with downloading images that have just been uploaded. You are using v1.4.4, correct?

Ricardo50 commented 5 years ago

Yes, I'm using 1.4.4; I forgot to mention that. It's so weird that the result was consistent even though I tried many times, including after the upload was no longer new.

I even added "created_at.lt:1 minutes ago" to my download search string to avoid that, but it doesn't stop it from happening every now and then.
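The age restriction can also be attached to any query programmatically. This is a small sketch assuming Philomena's search syntax, which supports `AND` and parentheses for grouping; `exclude_recent` is a hypothetical helper, and the default age mirrors the string used above.

```python
def exclude_recent(query: str, age: str = "1 minutes ago") -> str:
    """Restrict query to images created before `age`.

    The user's query is wrapped in parentheses so the AND applies to
    all of it, even if it contains OR terms.
    """
    return f"({query}) AND created_at.lt:{age}"
```

A wider window can be passed through `age` (for example `"1 day ago"`); whether that avoids the corruption described here is untested.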

Sibusten commented 5 years ago

If you try to download it now, does it download without corruption?

Ricardo50 commented 5 years ago

Weirdly enough, yes. It does work now.

It must be the website acting strangely sometimes, but when it happens I can always reproduce it, and I can still download the image from the site manually just fine.

Hopefully automatic hash checking or file-size checking will be possible at some point in the future.
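A file-size check does not depend on working hashes: the HTTP `Content-Length` header states how many bytes the server intends to send. A minimal sketch using only Python's standard library, assuming the image URL serves the file directly with that header set; `verify_size` and `download_with_size_check` are hypothetical helpers, not part of derpibooru-downloader.

```python
import urllib.request
from typing import Optional


def verify_size(data: bytes, content_length: Optional[str]) -> None:
    """Raise if the body length differs from the declared Content-Length.

    With no header there is nothing to compare against, so the check is
    skipped.
    """
    if content_length is None:
        return
    expected = int(content_length)
    if len(data) != expected:
        raise IOError(f"size mismatch: expected {expected} bytes, got {len(data)}")


def download_with_size_check(url: str, timeout: float = 60.0) -> bytes:
    """Fetch url and compare the raw body size with Content-Length."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        declared = resp.headers.get("Content-Length")
        data = resp.read()
    verify_size(data, declared)
    return data
```

This catches transfers that end early, but not a file that is already corrupted on the server; that case would still need a hash check.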