4pr0n / ripme

Downloads albums in bulk
MIT License

Sometimes photos aren't renamed after download completes #37

Closed (metaprime closed 10 years ago)

metaprime commented 10 years ago

Most of the album downloads correctly, but a couple of images end up named like 013_image.php (which seems an odd choice for a default extension for downloads anyway). Changing the extension to .jpg shows that the picture was downloaded properly, just not renamed.

Possibly a thread locking or concurrency issue?

I'm on Windows 7 btw

4pr0n commented 10 years ago

This seems ripper-specific (not all sites have .php files).

Can you provide links to albums that sometimes rip .php files? Or that have ripped .php files in the past? Which ripper was it?

metaprime commented 10 years ago

Sadly it was the E-Hentai ripper I just implemented, but I never wrote anything about .php files in it, so I'm not even sure how that's happening unless it's coming from the code which calls it.

metaprime commented 10 years ago

Usually it's just one or two of them for every couple dozen photos downloaded. It seems non-deterministic like a race condition or something. One album that was downloaded particularly badly (about 25% of files weren't named properly) was: http://g.e-hentai.org/g/288442/34e34ff52d/

Here's the directory listing post-download.

000_09.12.05A.jpg    025_image.php
001_DSC05023.jpg     026_DSC05143.jpg
002_DSC05026.jpg     027_DSC05146.jpg
003_DSC05029.jpg     028_DSC05148.jpg
004_DSC05035.jpg     029_DSC05153.jpg
005_DSC05039.jpg     030_DSC05155.jpg
006_DSC05041.jpg     031_image.php
007_DSC05043.jpg     032_DSC05158.jpg
008_image.php        033_DSC05163.jpg
009_DSC05046.jpg     034_DSC05166.jpg
010_DSC05052.jpg     035_DSC05167.jpg
011_DSC05059.jpg     036_image.php
012_DSC05061.jpg     037_DSC05170.jpg
013_DSC05063.jpg     038_DSC05176.jpg
014_DSC05066.jpg     039_DSC05194.jpg
015_DSC05087.jpg     040_image.php
016_image.php        041_DSC05206.jpg
017_DSC05097.jpg     042_DSC05208.jpg
018_DSC05100.jpg     043_image.php
019_image.php        044_DSC05218.jpg
020_DSC05118.jpg     045_DSC05222.jpg
021_DSC05121.jpg     046_image.php
022_DSC05124.jpg     047_DSC05230.jpg
023_image.php        048_image.php
024_image.php        [LiTU100] ?????? Lan Yi #1 (2010.01.18) - E-Hentai Galleries.url

All of the files have the correct ordinal prefix, which as far as I can tell is set inside the ripper class, but every file that ends in .php comes out as "%03d_image.php".

metaprime commented 10 years ago

By the way, those .php files are actually JPEGs. Renaming them with a .jpg extension works, and the images open correctly, so everything was downloaded properly.

metaprime commented 10 years ago

Here's another album that downloaded with a few files named incorrectly.

http://g.e-hentai.org/g/293294/af8f042c87/

It may simply be a race condition that is somehow not code-specific but rather site-specific? I have no idea, I'm just shooting in the dark.

I know that running rename *.php *.jpg after the download fixes all the files in a hurry, so maybe something similar could be done automatically once the rip finishes?
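A post-download cleanup along those lines could look roughly like the sketch below. It is not ripme code: the class and method names (PhpToJpgFixer, looksLikeJpeg, fixExtensions) are made up for illustration, and it assumes Java 9+ for InputStream.readNBytes. Instead of renaming every .php file blindly, it only renames files whose first three bytes are the JPEG start-of-image signature (FF D8 FF).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ripme.
class PhpToJpgFixer {

    // True if the file starts with the JPEG signature FF D8 FF.
    static boolean looksLikeJpeg(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3); // Java 9+
            return n == 3
                    && (head[0] & 0xFF) == 0xFF
                    && (head[1] & 0xFF) == 0xD8
                    && (head[2] & 0xFF) == 0xFF;
        }
    }

    // Renames *.php files in dir that are really JPEGs to *.jpg.
    // Returns the number of files renamed.
    static int fixExtensions(Path dir) throws IOException {
        // Collect candidates first so we don't rename while iterating.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.php")) {
            for (Path p : stream) {
                candidates.add(p);
            }
        }
        int renamed = 0;
        for (Path p : candidates) {
            if (!looksLikeJpeg(p)) {
                continue; // leave anything that is not a JPEG alone
            }
            String name = p.getFileName().toString();
            String base = name.substring(0, name.length() - ".php".length());
            Path target = p.resolveSibling(base + ".jpg");
            if (Files.exists(target)) {
                continue; // never overwrite an existing file
            }
            Files.move(p, target);
            renamed++;
        }
        return renamed;
    }
}
```

Calling PhpToJpgFixer.fixExtensions(albumDir) after a rip would turn 013_image.php into 013_image.jpg when the content is a JPEG. This only treats the symptom; the files would still all be called image, so fixing the filename guess itself is probably the better route.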

4pr0n commented 10 years ago

The overloaded addURLToDownload() method will try to guess the filename when given a URL & prefix: https://github.com/4pr0n/ripme/blob/master/src/main/java/com/rarchives/ripme/ripper/AbstractRipper.java#L94-L110

Where it's invoked from Ehentai: https://github.com/4pr0n/ripme/commit/f9e8dd33060167a10c572495dc9d0ec8886d06ad#diff-3ed19e2df23533ecd444859cc81df81bR96
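To illustrate how a filename guess of this kind can produce image.php, here is a minimal sketch. It is an assumption about the general approach (take the last path segment of the URL and prepend the prefix), not a copy of the linked AbstractRipper code; the class name, the use of java.net.URI, and both example URLs are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration, not the actual ripme implementation.
class FilenameGuess {

    // Guess a local filename from the URL's last path segment.
    // The query string is not part of getPath(), so it is ignored.
    static String guessFilename(URI url, String prefix) {
        String path = url.getPath();
        if (path == null) {
            path = ""; // opaque URIs have no path
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "unnamed"; // placeholder when the path ends in '/'
        }
        return prefix + name;
    }

    public static void main(String[] args) {
        // Direct link to an image file: the guess is what you would expect.
        System.out.println(guessFilename(
                URI.create("http://example.com/h/abc/DSC05043.jpg"), "007_"));
        // Image served through a script: the last path segment is the script.
        System.out.println(guessFilename(
                URI.create("http://example.com/image.php?f=abc&n=DSC05045.jpg"), "008_"));
    }
}
```

With these inputs the first call yields 007_DSC05043.jpg and the second yields 008_image.php. If some image URLs on the site point at a script rather than directly at a file, that would explain the correct prefix combined with the wrong name, and it would not require a race condition. Whether that is what happens here still needs to be checked against the URLs the ripper actually collects.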