Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.
https://www.bionus.org/imgbrd-grabber/
Apache License 2.0

[Bug] Mislabeled Files On NHentai Saved With Wrong Extension #3158

Open Keyeson opened 2 months ago

Keyeson commented 2 months ago

Bug description

Some galleries on NHentai, and I assume on any other site that deals with translated works, have their files mislabeled with the wrong extension. Specifically, they have the extension .png but are actually encoded as JPEG. This causes issues with Exiftool being able to tag them, as I posted in issue #3130. When checking one of these mislabeled files with Exiftool, this is what I get:

exiftool 505041-1-f5d6db40ac458d9e9383f62621ab33c3.png
ExifTool Version Number         : 12.70
File Name                       : 505041-1-f5d6db40ac458d9e9383f62621ab33c3.png
Directory                       : .
File Size                       : 602 kB
File Modification Date/Time     : 2024:04:12 01:54:18-04:00
File Access Date/Time           : 2024:05:17 14:53:54-04:00
File Inode Change Date/Time     : 2024:05:17 14:53:54-04:00
File Permissions                : -rw-r--r--
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
JFIF Version                    : 1.01
Resolution Unit                 : cm
X Resolution                    : 28
Y Resolution                    : 28
Image Width                     : 1280
Image Height                    : 1807
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 1
Image Size                      : 1280x1807
Megapixels                      : 2.3

Which makes me wonder what this setting is for, since it doesn't seem to be correcting the mislabeled extension. I tried with it both enabled and disabled and did not see any difference between the two.

[Screenshot: Screenshot_20240517_162844_Highlighted]

This setting would make me THINK that it is supposed to correct files with mislabeled extensions, but maybe I have misunderstood it.

Steps to reproduce

Note that the ID provided here as an example is NOT SFW. So far I have not encountered this on any SFW content that Grabber supports, which is not to say that there isn't any, just that I don't know of it. If Grabber supported Mangadex, I could probably provide hundreds of SFW examples, because plenty of chapters uploaded there suffer from the same issue of mislabeled extensions. It seems to be a common issue with some translators/scanlation groups for whatever reason. If someone could point me to a SFW example supported by Grabber, I would love to swap it in for the one listed here.

  1. Download from an affected gallery (example: 505041) with metadata tagging via Exiftool enabled
  2. Check the log and you should see something like this for every file that is affected.
    [15:16:57.350][Error] [Exiftool] Error: Not a valid PNG (looks more like a JPEG) - /YourFilePathHere/505041-1-35fccb648845c4319e9bd876f2255269.png
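
To collect every affected file from a long log, a small script can help. This is only a sketch: the pattern is modeled on the single log line quoted above, and the function name `find_mislabeled` is made up for illustration. Other Grabber log lines may well be formatted differently.

```python
import re

# Pattern modeled on the one Exiftool error line quoted in this issue.
# Assumption: other affected files produce lines with the same wording.
MISMATCH_RE = re.compile(
    r"\[Exiftool\] Error: Not a valid (?P<expected>\w+) "
    r"\(looks more like an? (?P<actual>\w+)\) - (?P<path>.+)$"
)


def find_mislabeled(log_text):
    """Return (path, expected_type, actual_type) for each matching log line."""
    results = []
    for line in log_text.splitlines():
        match = MISMATCH_RE.search(line)
        if match:
            results.append(
                (match["path"].strip(), match["expected"], match["actual"])
            )
    return results
```

Passing the log contents to `find_mislabeled` should give a list of the files whose extension and actual type disagree, which can then be checked or renamed by hand.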

Expected behavior

The files to be saved with the correct extension and tagged properly.

Additional Context

Also messed about with the image conversion settings to see whether they would get converted/tagged successfully.

ffmpeg

  - PNG -> JPG: Works and tagged correctly
  - PNG -> JXL: Works, but Exiftool has a minor error so not tagged
  - PNG -> WebP: Errors out, most likely due to the file not matching its extension, if I am understanding this error message correctly.
    [Error] [FFmpeg] [webp @ 0x5612339ca080] Only WebP is supported
    [out#0/webp @ 0x5612339c9b80] Could not write header (incorrect codec parameters ?): Invalid argument
    Error opening output file /InsertFilePathHere.
    Error opening output files: Invalid argument
  - PNG -> PNG: Errors out due to existing file, so can't just run a conversion to make sure all PNGs are actually PNG.

ImageMagick

  - PNG -> JPG: Works and tagged correctly
  - PNG -> JXL: Works, but Exiftool has a minor error so not tagged
  - PNG -> WebP: Works, but no tags and no error in the log for some reason.
  - PNG -> PNG: Errors out due to existing file, same as ffmpeg

The JXL error seems to be a general issue with the version of Exiftool I have rather than anything specific to these files. I am not sure about the WebP one, because I haven't sat down to try tagging those files directly with Exiftool and would need to do more testing.

Oh and some programs will of course have issues opening the file if they don't check the content of the file and simply go by extension. Most modern programs won't have any issues, but some older or niche programs may have issues.

Have probably spent longer testing and writing all of this than I should have.

System information

Bionus commented 1 month ago

> This setting would make me THINK that it is supposed to correct files with mislabeled extensions, but maybe I have misunderstood it.

No, you're exactly right. It's supposed to read the first few bytes of the file, check for known image headers, and if there's a match use that instead of the other extension. Not sure why it's not working for you, I'll investigate.
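
For readers unfamiliar with the technique, the header check described above can be sketched roughly as follows. This is not Grabber's actual code (Grabber is not written in Python). The signature table and the names `sniff_extension` and `corrected_path` are assumptions for illustration, using well-known file signatures only.

```python
from pathlib import Path

# Well-known file signatures (magic bytes). Illustrative list only;
# it is not taken from Grabber's source code.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def sniff_extension(header):
    """Guess an extension from the first bytes of a file, or return None."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def corrected_path(path):
    """Return the path with its extension replaced if the header disagrees."""
    path = Path(path)
    with path.open("rb") as f:
        header = f.read(12)
    detected = sniff_extension(header)
    current = path.suffix.lower().lstrip(".")
    if current == "jpeg":
        current = "jpg"  # treat .jpeg and .jpg as the same type
    if detected is None or detected == current:
        return path  # unknown header or extension already matches
    return path.with_suffix("." + detected)
```

With a check like this, the file from the report (a JFIF header under a .png name) would be detected as `jpg`, so `corrected_path` would return the same name with a .jpg extension. The function only computes the new path; actually renaming the file is left to the caller.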

negiworx commented 1 week ago

I am also having this same issue.

Ubuntu 22.04.4 LTS
Grabber 7.12.2