buda-base / asset-manager

Asset Manager and audit tool
The Unlicense

error / warning on images with an embedded thumbnail #125

Closed: eroux closed this issue 2 years ago

eroux commented 3 years ago

We do have some JPG images that embed thumbnails, for instance https://iiif.bdrc.io/bdr:I1NLM2232_001::I1NLM2232_0010001.jpg/full/full/0/default.jpg (this is a byte-for-byte copy of the S3 image, with no processing by the IIIF server). You can extract the thumbnail from it using (under Linux at least):

wget https://iiif.bdrc.io/bdr:I1NLM2232_001::I1NLM2232_0010001.jpg/full/full/0/default.jpg -O I1NLM2232_0010001.jpg 
exiftool -b -ThumbnailImage I1NLM2232_0010001.jpg > thumbnail.jpg

Here it is attached for reference:

[attached image: thumbnail]

It's about 13 kB in size, which means that 13 kB near the beginning of the image file are taken up by the thumbnail but are not useful for our purpose (unless I'm missing something?). I think the asset manager should at least issue a warning when a JPG contains an embedded thumbnail.
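For illustration, here is a minimal sketch of how such a check could detect an embedded thumbnail without any external library. It is not the audit tool's actual implementation: the class name `ThumbnailCheck` and the method `findThumbnail` are hypothetical. It assumes the usual Exif layout, where the thumbnail is described in IFD1 by tags 0x0201 (offset, relative to the TIFF header) and 0x0202 (length), and it only looks at the first Exif APP1 segment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ThumbnailCheck {

    /** Returns {offsetInFile, length} of the IFD1 thumbnail, or null if none is found. */
    public static int[] findThumbnail(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return null; // no SOI marker: not a JPEG
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                return null; // image data or end of image reached without an Exif segment
            }
            // Segment length is big-endian and includes its own two bytes.
            int segLen = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (segLen < 2 || pos + 2 + segLen > jpeg.length) {
                return null; // truncated or malformed segment
            }
            if (marker == 0xE1 && isExif(jpeg, pos + 4, segLen - 2)) {
                // The TIFF header starts right after the 6-byte "Exif\0\0" identifier.
                return readIfd1(jpeg, pos + 10, pos + 2 + segLen);
            }
            pos += 2 + segLen;
        }
        return null;
    }

    private static boolean isExif(byte[] b, int start, int payloadLen) {
        byte[] id = { 'E', 'x', 'i', 'f', 0, 0 };
        if (payloadLen < id.length + 8) {
            return false; // too short to hold the identifier plus a TIFF header
        }
        for (int i = 0; i < id.length; i++) {
            if (b[start + i] != id[i]) return false;
        }
        return true;
    }

    private static int[] readIfd1(byte[] jpeg, int tiffStart, int end) {
        // All offsets inside the Exif block are relative to the TIFF header.
        ByteBuffer tiff = ByteBuffer.wrap(jpeg, tiffStart, end - tiffStart).slice();
        try {
            char b0 = (char) tiff.get(0), b1 = (char) tiff.get(1);
            if (b0 == 'I' && b1 == 'I') tiff.order(ByteOrder.LITTLE_ENDIAN);
            else if (b0 == 'M' && b1 == 'M') tiff.order(ByteOrder.BIG_ENDIAN);
            else return null;
            if (tiff.getShort(2) != 42) return null; // TIFF magic number
            int ifd0 = tiff.getInt(4);
            int count0 = tiff.getShort(ifd0) & 0xFFFF;
            int ifd1 = tiff.getInt(ifd0 + 2 + 12 * count0); // link to the next IFD
            if (ifd1 == 0) return null; // no IFD1, hence no thumbnail
            int count1 = tiff.getShort(ifd1) & 0xFFFF;
            long offset = -1, length = -1;
            for (int i = 0; i < count1; i++) {
                int entry = ifd1 + 2 + 12 * i;
                int tag = tiff.getShort(entry) & 0xFFFF;
                int type = tiff.getShort(entry + 2) & 0xFFFF;
                // Type 3 is SHORT; anything else is read as a 4-byte LONG.
                long value = (type == 3)
                        ? (tiff.getShort(entry + 8) & 0xFFFF)
                        : (tiff.getInt(entry + 8) & 0xFFFFFFFFL);
                if (tag == 0x0201) offset = value;
                if (tag == 0x0202) length = value;
            }
            if (offset <= 0 || length <= 0) return null;
            return new int[] { (int) (tiffStart + offset), (int) length };
        } catch (IndexOutOfBoundsException e) {
            return null; // offsets point outside the Exif segment
        }
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        int[] t = findThumbnail(data);
        System.out.println(t == null ? "no embedded thumbnail found"
                : "WARNING: embedded thumbnail, " + t[1] + " bytes at file offset " + t[0]);
    }
}
```

Running it as `java ThumbnailCheck I1NLM2232_0010001.jpg` would print either the warning or the "not found" line. A real check would probably rely on an Exif library instead of hand-parsing, but the sketch shows that the test itself is cheap: it only reads the header segments, never the image data.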

jimk-bdrc commented 3 years ago

It would actually be really helpful to you to have this information extracted and preserved somehow. I've been resisting having the audit tool actually do anything, but there's certainly a case for starting to build a processing suite that extracts the thumbnail and ICC profile for each object, so that the IIIF server doesn't have to.

The problem with errors/warnings is that, in general, nobody sees them, and someone then has to post-process the files (do what your shell script does).

If we had a workflow that read errors & warnings as part of a toolchain, and decided what to do with them, that workflow could pick up the notifications and extract + persist the ICC profile, thumbnail, or whatever else IIIF needs.

jimk-bdrc commented 3 years ago

@TBRC-Travis Another question is why NLM processing is doing this when (possibly) nobody else is. This gets back to the discussion Karma and I were having, and will continue to have, next week.

eroux commented 3 years ago

Oh, actually the embedded thumbnails are just ignored, so ideally they wouldn't be there at all... If at some point I create thumbnails, they will be separate objects in separate files, not embedded. So my request is rather to issue a warning when there's an embedded thumbnail and ask the user to remove it.

TBRC-Travis commented 3 years ago

@eroux agreed. In the case of NLM, the inclusion of embedded thumbnails was not intentional. It's likely an artifact of using Adobe Lightroom for processing at NLM, which is probably adding some of these extra bits under the hood. I can adjust the NLM process to stop generating the thumbnails.

eroux commented 3 years ago

ah great, thanks! Note that this could possibly be part of a processing script:

$ ls -al I1NLM2232_0010001.jpg 
-rw-r--r-- 1 eroux eroux 473078 juin   3 16:53 'I1NLM2232_0010001.jpg'
$ exiftool -ifd1:all= I1NLM2232_0010001.jpg
    1 image files updated
$ ls -al I1NLM2232_0010001.jpg 
-rw-r--r-- 1 eroux eroux 459512 juin   3 16:55 'I1NLM2232_0010001.jpg'

Note the size reduction (473078 to 459512 bytes, i.e. 13566 bytes saved). Also, this doesn't re-encode the JPG, so it's fine (or at least I don't see why it would re-encode it).

jimk-bdrc commented 2 years ago

Why we need a platform

jimk-bdrc commented 2 years ago

@eroux Would it help if this info were encoded into dimensions.json? (It won't help the past, but might help the future.) v_m_b can extract the 0x201 fields from a number of dictionaries (see https://www.exiftool.org/TagNames/EXIF.html 0x201 0x202) and derive where the image thumbnail is (and not create the node in dimensions.json if there is none).

It might just help to have the warning in a place where IIIFPRES can use it, as well as where the creator sees it.

eroux commented 2 years ago

I don't think it will be useful, no. My intention was to make sure that audittool complains if there's an embedded thumbnail.

jimk-bdrc commented 2 years ago

From metadata-extractor-issue-262

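    // Quoted fragment: assumes inStream supports mark/reset and that logger is in scope.
    // For a JPEG's IFD1 thumbnail, these tags are typically exposed via
    // ExifThumbnailDirectory rather than the sub-IFD read here.
    Metadata metadata = ImageMetadataReader.readMetadata(inStream);

    ExifSubIFDDirectory directory = metadata.getFirstDirectoryOfType(ExifSubIFDDirectory.class);
    if (directory == null || !directory.containsTag(0x0201) || !directory.containsTag(0x0202)) {
        return null; // no embedded JPEG recorded in this directory
    }

    // 0x0201 = offset of the embedded JPEG, 0x0202 = its length in bytes
    int offset = directory.getInt(0x0201);
    int length = directory.getInt(0x0202);

    logger.info("Embedded jpeg offset: " + offset);
    logger.info("Embedded jpeg length: " + length);

    // Rewind the stream and read the embedded JPEG bytes
    inStream.reset();
    inStream.skip(offset);

    byte[] jpegData = new byte[length];
    inStream.read(jpegData, 0, length);

    return jpegData;

jimk-bdrc commented 2 years ago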

Closed in PR #161