ccloli / E-Hentai-Downloader

Download E-Hentai archive as zip file
GNU General Public License v3.0

[Suggestion] Workaround for archive file size limit #127

Open · typhoon71 opened this issue 5 years ago

typhoon71 commented 5 years ago

Well, a simple workaround would be adding an option to auto-split the gallery into a number of ranges and fetch them one after another. It's what one can already do manually right now, but I'm suggesting automating it inside the addon. A kind of queue should do, or maybe a modification of how the existing one is used. The resulting files would be filename_1, filename_2, ..., and since most readers handle sequential files correctly, it shouldn't be an issue. What do you think?
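For illustration, here's a minimal sketch of how the ranges could be computed. `splitIntoRanges` and the `gallery_N.zip` naming scheme are hypothetical, not anything the script currently exposes:

```js
// Split a gallery of `pageCount` images into fixed-size ranges,
// each mapped to a sequentially numbered part filename.
function splitIntoRanges(pageCount, imagesPerPart) {
  const ranges = [];
  for (let start = 1; start <= pageCount; start += imagesPerPart) {
    const end = Math.min(start + imagesPerPart - 1, pageCount);
    ranges.push({
      start,
      end,
      // Sequential names so comic readers sort the parts correctly
      filename: `gallery_${ranges.length + 1}.zip`,
    });
  }
  return ranges;
}

// splitIntoRanges(1000, 100) yields ten ranges:
// 1-100 -> gallery_1.zip, 101-200 -> gallery_2.zip, ...
```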

ccloli commented 5 years ago

Yes, that's how it should work. But it'll run into the problems mentioned in https://github.com/ccloli/E-Hentai-Downloader/issues/57#issuecomment-303384912.

In short, how should a still-downloading or failed image be handled when the other images have succeeded?

For example, say a gallery has 1000 images named 1.jpg, 2.jpg, 3.jpg, ..., and every 100 images are saved to one zip file. In theory, part1.zip should contain 1.jpg through 100.jpg, part2.zip should contain 101.jpg through 200.jpg, and so on.

But if one of the images fails, or the network is too slow to download it in time (say, 3.jpg in part1.zip), should the entire zip file (part1.zip) wait for that image (3.jpg) until it's available, or should the next image (101.jpg) be saved instead so the zip file (part1.zip) can be finished immediately, with the failed image (3.jpg) ending up in a later zip file (maybe part2.zip or even part10.zip)?

The first solution takes more RAM, since later images are held in memory while waiting for the failed one, and worse, if every zip file has some failed images, they'll stall the whole process, making the auto-split useless. The second solution should be fine, but the files are out of order, and if you open a zip file directly with a comic reader, you'll be missing some images.
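To make that trade-off concrete, here's a rough sketch of the second approach (all names hypothetical): images are packed in completion order, so a finished part's memory can be released right away, but a slow 3.jpg can end up in part10.zip:

```js
// Finalize a part as soon as it holds `partSize` downloaded images,
// regardless of page order. Failed or slow images simply land in
// whichever part is open when they eventually finish.
class PartPacker {
  constructor(partSize, onPartReady) {
    this.partSize = partSize;
    this.onPartReady = onPartReady;
    this.current = [];
    this.partIndex = 1;
  }

  // Called whenever any image finishes downloading, in completion order.
  addImage(name, blob) {
    this.current.push({ name, blob });
    if (this.current.length >= this.partSize) this.flush();
  }

  flush() {
    if (this.current.length === 0) return;
    // Hand the batch off to be zipped; the RAM for these blobs can be
    // released instead of waiting on stragglers (the upside noted above).
    this.onPartReady(this.partIndex++, this.current);
    this.current = [];
  }
}
```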

BTW, Chrome has increased its Blob size limit, so maybe most galleries don't need to be auto-split into smaller parts anymore?
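If the Blob limit is the deciding factor, the check could be as simple as comparing the estimated total size against whatever ceiling the browser enforces. The 2 GiB constant below is only a placeholder assumption, not a verified figure for any browser:

```js
// Assumed ceiling; actual per-browser Blob limits vary and should be
// checked against real documentation before relying on this.
const ASSUMED_BLOB_LIMIT = 2 * 1024 ** 3; // 2 GiB

function needsAutoSplit(imageSizes) {
  // imageSizes: array of byte counts parsed from the gallery page
  const total = imageSizes.reduce((sum, n) => sum + n, 0);
  return total > ASSUMED_BLOB_LIMIT;
}
```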

bluefiberbread commented 5 years ago

It's pretty much impossible to download a large gallery with more than 1200 images and a large file size. You'll get temporarily banned, for anywhere from 1 or 24 hours up to 3, 24, or 365 days, and the only way to avoid it is to add a massive delay between each download.
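The delay workaround is easy to sketch: fetch strictly one image at a time and pause between requests. `fetchImage` and `delayMs` here are stand-ins, not the script's real options:

```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Download URLs sequentially with a fixed pause between requests,
// to stay under the site's rate limits.
async function downloadWithDelay(urls, delayMs, fetchImage) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchImage(url)); // strictly one request at a time
    await sleep(delayMs);                // e.g. several seconds per request
  }
  return results;
}
```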

On Firefox ESR 52, I can download any gallery until my RAM gives up. On the latest version of Firefox it depends: for example, it'll eat all of my RAM if I try to download a 30-page gallery with a file size of 1 GB, but a 1000-page gallery at 500 MB will work.

> But if one of the images fails, or the network is too slow to download it in time (say, 3.jpg in part1.zip), should the entire zip file (part1.zip) wait for that image (3.jpg) until it's available, or should the next image (101.jpg) be saved instead so the zip file (part1.zip) can be finished immediately, with the failed image (3.jpg) ending up in a later zip file (maybe part2.zip or even part10.zip)?

If part1.zip fails to grab one image, it shouldn't continue to part2.zip. But honestly, I don't think implementing auto-split is that big a deal; that's just my opinion. I'm perfectly fine with 16 GB of RAM, and if I run out, I'll just split the download with page ranges.
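A rough sketch of that stricter policy: retry each failed page before the part is sealed, and never advance to the next part with holes in the current one. All names here are hypothetical, assuming `pages` is an array of page URLs:

```js
// Build one part's image list; a page is retried with backoff, and the
// whole part aborts rather than skipping ahead, so part2 never starts
// before part1 is complete.
async function buildPart(pages, fetchImage, maxRetries = 3) {
  const images = [];
  for (const page of pages) {
    let blob = null;
    for (let attempt = 0; attempt <= maxRetries && !blob; attempt++) {
      try {
        blob = await fetchImage(page);
      } catch (err) {
        // back off a little longer on each retry of the same page
        await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      }
    }
    if (!blob) {
      // abort instead of moving on to the next part
      throw new Error(`giving up on ${page}`);
    }
    images.push(blob);
  }
  return images; // returned only when every page in the range succeeded
}
```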