4pr0n / ripme

Downloads albums in bulk
MIT License

e-hentai ripper unnecessary delays #177

Open metaprime opened 9 years ago

metaprime commented 9 years ago

The ripper has delays to prevent the website from temporarily blocking downloads. However, when re-ripping an album, images that have already been downloaded should not trigger this delay. When re-ripping a couple of images that were deleted by mistake or corrupted, or when resuming a stopped rip, skipping the delay would greatly speed up the productive part of the download.

4pr0n commented 9 years ago

I think the e-hentai ripper is one of those rippers that needs to download each image from a specific page.

  1. Fetch the main gallery (e.g. gallery.html)
    • We have a list of "images" in the gallery (but these are just links to more .html pages)
  2. Fetch first "image page" at image1.html
    • Now we can extract the actual image (*.jpg) from the page
  3. Download image (or skip if it's a duplicate)
  4. Repeat from Step 2 for the next image.

The ripper does not know whether an image is a duplicate until Step 3, when we have the image URL and are about to download it. By that point, we have already fetched the image's HTML page, and those page requests are likely what needs throttling.

But I think there could be extra logic to detect duplicates before Step 2.
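
One possible shape for that extra logic, as a rough sketch only: if saved files carry the gallery index as a prefix (an assumption here, e.g. `007_name.jpg`), the ripper could list the album directory once and skip both the page fetch and the delay for any index that already has a file. The class `ResumeCheck`, its method names, and the `NNN_` naming scheme are hypothetical and not taken from the ripme code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of ripme.
class ResumeCheck {
    // Assumption: image N of the gallery is saved as "NNN_<name>", e.g. "007_cover.jpg".
    static String indexPrefix(int index) {
        return String.format("%03d_", index);
    }

    // True if any of the given file names was saved for this gallery index.
    static boolean alreadySaved(List<String> fileNames, int index) {
        String prefix = indexPrefix(index);
        for (String name : fileNames) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Lists the album directory once; a missing directory means nothing was saved yet.
    static List<String> savedFileNames(File albumDir) {
        String[] names = albumDir.list(); // null when albumDir is not a directory
        return names == null ? Collections.<String>emptyList() : Arrays.asList(names);
    }

    // Gallery indices (1-based) whose image page still has to be fetched.
    static List<Integer> indicesToFetch(List<String> fileNames, int imageCount) {
        List<Integer> todo = new ArrayList<>();
        for (int i = 1; i <= imageCount; i++) {
            if (!alreadySaved(fileNames, i)) {
                todo.add(i);
            }
        }
        return todo;
    }
}
```

With something like this, only the indices returned by `indicesToFetch` would go through Step 2 and its throttle delay. A prefix match cannot tell whether an existing file is complete or corrupted, so a truncated download would still be skipped unless it is deleted first.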