eroux closed this issue 2 years ago
It would actually be really helpful to you to have this information extracted and preserved somehow. I've been resisting having audit tool actually do anything, but there's certainly a case for starting to build a processing suite that extracts thumbnails and ICC profiles for each object, so that the IIIF server doesn't have to.
The problem with errors and warnings is that, in general, nobody sees them, and someone then has to post-process them (i.e. do what your shell script does).
If we had a workflow that read errors and warnings as part of a toolchain and decided what to do with them, that workflow could pick up the notifications and extract and persist the ICC profile, the thumbnail, or whatever else IIIF needs.
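A minimal sketch of that idea in Java. Every name here (WarningWorkflow, Kind, Warning and its kinds) is hypothetical and not audit-tool's actual output format: each kind of warning can have an automated action registered (for example "extract and persist the ICC profile"), and anything without a handler is handed back for a person to review, so nothing depends on someone happening to read the log.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class WarningWorkflow {
    // Hypothetical warning kinds; audit-tool's real test identifiers may differ.
    enum Kind { EMBEDDED_THUMBNAIL, ICC_PROFILE_PRESENT, OTHER }

    static final class Warning {
        final Kind kind;
        final String file;

        Warning(Kind kind, String file) {
            this.kind = kind;
            this.file = file;
        }
    }

    private final Map<Kind, Consumer<Warning>> handlers = new EnumMap<>(Kind.class);

    /** Registers an automated action for one kind of warning; returns this for chaining. */
    WarningWorkflow on(Kind kind, Consumer<Warning> action) {
        handlers.put(kind, action);
        return this;
    }

    /** Runs the registered action for each warning and returns the ones nobody handled. */
    List<Warning> process(List<Warning> warnings) {
        List<Warning> needsHuman = new ArrayList<>();
        for (Warning w : warnings) {
            Consumer<Warning> action = handlers.get(w.kind);
            if (action != null) {
                action.accept(w);      // e.g. extract + persist the ICC profile
            } else {
                needsHuman.add(w);     // still needs a person (or a future handler)
            }
        }
        return needsHuman;
    }
}
```

The design choice is simply that unhandled warnings are returned rather than only logged, so a pipeline can fail or notify when something has no automated follow-up.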
@TBRC-Travis Another question is why NLM processing is doing this when (possibly) nobody else is. This gets back to the discussion Karma and I were having, and will continue to have, next week.
Oh, actually I just ignore the embedded thumbnails, so ideally they wouldn't be there at all. If at some point I create thumbnails, they will be separate objects in separate files, not embedded. So my request is rather to issue a warning when there's an embedded thumbnail and ask the user to remove it.
@eroux Agreed. In the case of NLM, the inclusion of embedded thumbnails was not intentional. It's likely an artifact of using Adobe Lightroom for processing at NLM, which probably adds some of these extra bits under the hood. I can adjust the NLM process to stop generating the thumbnails.
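A minimal sketch of such a check in plain Java (no metadata-extractor dependency), assuming the thumbnail is stored the usual way: an Exif APP1 segment whose IFD1 carries tags 0x0201 (thumbnail offset) and 0x0202 (thumbnail length). The class name EmbeddedThumbnailCheck and the warning text are hypothetical, and it does not look for other kinds of previews (e.g. in XMP or maker notes).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class EmbeddedThumbnailCheck {
    private static final byte[] EXIF_HEADER = {'E', 'x', 'i', 'f', 0, 0};

    /** Length in bytes of the IFD1 (embedded) JPEG thumbnail, or -1 if the file has none. */
    static long embeddedThumbnailLength(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return -1; // no SOI marker: not a JPEG
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { pos++; continue; }     // fill byte before a marker
            if (marker == 0xDA || marker == 0xD9) break; // SOS/EOI: no metadata segments after this
            int segLen = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF); // includes its own 2 bytes
            int end = pos + 2 + segLen;
            if (segLen < 2 || end > jpeg.length) return -1; // truncated or malformed segment
            int payload = pos + 4;
            if (marker == 0xE1 && segLen >= 8
                    && Arrays.equals(jpeg, payload, payload + 6, EXIF_HEADER, 0, 6)) {
                // The TIFF structure starts right after "Exif\0\0"; IFD offsets are relative to it.
                return ifd1ThumbnailLength(ByteBuffer.wrap(jpeg, payload + 6, segLen - 8).slice());
            }
            pos = end;
        }
        return -1;
    }

    private static long ifd1ThumbnailLength(ByteBuffer tiff) {
        try {
            byte b0 = tiff.get(0), b1 = tiff.get(1);
            if (b0 == 'I' && b1 == 'I') tiff.order(ByteOrder.LITTLE_ENDIAN);
            else if (b0 == 'M' && b1 == 'M') tiff.order(ByteOrder.BIG_ENDIAN);
            else return -1;
            int ifd0 = tiff.getInt(4);
            int count0 = tiff.getShort(ifd0) & 0xFFFF;
            int ifd1 = tiff.getInt(ifd0 + 2 + 12 * count0); // "next IFD" pointer that follows IFD0
            if (ifd1 == 0) return -1;                        // no IFD1, hence no thumbnail
            int count1 = tiff.getShort(ifd1) & 0xFFFF;
            boolean hasOffset = false;
            long length = 0;
            for (int i = 0; i < count1; i++) {
                int entry = ifd1 + 2 + 12 * i;
                int tag = tiff.getShort(entry) & 0xFFFF;
                int type = tiff.getShort(entry + 2) & 0xFFFF;
                // These tags are LONG (type 4) per the Exif spec; SHORT (type 3) is tolerated.
                long value = type == 3 ? tiff.getShort(entry + 8) & 0xFFFF : tiff.getInt(entry + 8) & 0xFFFFFFFFL;
                if (tag == 0x0201) hasOffset = true;
                if (tag == 0x0202) length = value;
            }
            return hasOffset ? length : -1;
        } catch (IndexOutOfBoundsException e) {
            return -1; // offsets point outside the Exif data: treat as malformed
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            long length = embeddedThumbnailLength(Files.readAllBytes(Path.of(name)));
            if (length >= 0) {
                System.out.println("WARNING: " + name + " embeds a " + length
                        + "-byte Exif thumbnail (IFD1); please remove it, e.g. with exiftool -ifd1:all=");
            }
        }
    }
}
```

Run over a volume's files, it prints one warning per JPEG that still embeds a thumbnail; how this would plug into audittool's own test framework is left open.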
Ah great, thanks! Note that stripping the thumbnail could possibly be part of a processing script:
$ ls -al I1NLM2232_0010001.jpg
-rw-r--r-- 1 eroux eroux 473078 Jun 3 16:53 'I1NLM2232_0010001.jpg'
$ exiftool -ifd1:all= I1NLM2232_0010001.jpg
1 image files updated
$ ls -al I1NLM2232_0010001.jpg
-rw-r--r-- 1 eroux eroux 459512 Jun 3 16:55 'I1NLM2232_0010001.jpg'
Note the size reduction (473078 - 459512 = 13566 bytes, about 13 kB). Also, this doesn't re-encode the JPEG, so it's fine (or at least I don't see why it would re-encode it).
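One way to check the "doesn't re-encode" part, sketched in Java under the assumption that only metadata segments (APPn and COM) should differ: drop those segments from both files and compare what's left, i.e. the quantization and Huffman tables, the frame header and the entropy-coded scan data. JpegPayloadCompare is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class JpegPayloadCompare {

    /** Returns the JPEG bytes with every APPn (0xE0-0xEF) and COM (0xFE) segment removed. */
    static byte[] withoutMetadataSegments(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG (missing SOI marker)");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(jpeg, 0, 2); // keep SOI
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) throw new IllegalArgumentException("expected a marker at " + pos);
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { pos++; continue; } // fill byte: ignored in both files
            if (marker == 0xDA) break;               // SOS: scan data follows, copied as-is below
            int segLen = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int end = pos + 2 + segLen;
            if (segLen < 2 || end > jpeg.length) throw new IllegalArgumentException("truncated segment at " + pos);
            boolean metadata = (marker >= 0xE0 && marker <= 0xEF) || marker == 0xFE;
            if (!metadata) out.write(jpeg, pos, end - pos); // keep DQT, DHT, SOFn, etc.
            pos = end;
        }
        out.write(jpeg, pos, jpeg.length - pos); // SOS header, entropy-coded data, EOI
        return out.toByteArray();
    }

    /** True if two JPEGs differ only in APPn/COM metadata, i.e. the image itself was not re-encoded. */
    static boolean sameImageData(byte[] a, byte[] b) {
        return Arrays.equals(withoutMetadataSegments(a), withoutMetadataSegments(b));
    }

    public static void main(String[] args) throws IOException {
        byte[] before = Files.readAllBytes(Path.of(args[0]));
        byte[] after = Files.readAllBytes(Path.of(args[1]));
        System.out.println(sameImageData(before, after) ? "image data unchanged" : "image data differs");
    }
}
```

Unless -overwrite_original is given, exiftool keeps the untouched file next to the edited one with _original appended to its name, so the two can be passed directly as the before/after arguments.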
@eroux Would it help if this info were encoded into dimensions.json? (It won't help with the past, but might help in the future.) v_m_b can extract the 0x201 and 0x202 fields (ThumbnailOffset and ThumbnailLength, see https://www.exiftool.org/TagNames/EXIF.html) from a number of directories and derive where the image thumbnail is (and not create the node in dimensions.json if there is none).
It might just help to have the warning in a place where IIIFPRES can use it, as well as where the creator sees it.
No, I don't think it would be useful; my intention was just to make sure that audittool complains if there's an embedded thumbnail.
From metadata-extractor issue 262 (reading the embedded thumbnail bytes in Java):
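import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Metadata;
import com.drew.metadata.exif.ExifThumbnailDirectory;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;

// Returns the embedded (IFD1) JPEG thumbnail bytes, or null if there is none.
static byte[] readEmbeddedThumbnail(InputStream in) throws Exception {
    byte[] file = in.readAllBytes(); // buffer the whole file so it can be sliced after parsing
    Metadata metadata = ImageMetadataReader.readMetadata(new ByteArrayInputStream(file));
    // Tags 0x0201/0x0202 live in IFD1, which metadata-extractor exposes as ExifThumbnailDirectory
    ExifThumbnailDirectory directory = metadata.getFirstDirectoryOfType(ExifThumbnailDirectory.class);
    if (directory == null || !directory.containsTag(ExifThumbnailDirectory.TAG_THUMBNAIL_OFFSET)) return null;
    int offset = directory.getInt(ExifThumbnailDirectory.TAG_THUMBNAIL_OFFSET);
    int length = directory.getInt(ExifThumbnailDirectory.TAG_THUMBNAIL_LENGTH);
    // Like the original snippet, this assumes the reported offset is relative to the start of the file
    return Arrays.copyOfRange(file, offset, offset + length);
}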
Closed in PR #161
We do have some JPEG images that embed thumbnails, for instance https://iiif.bdrc.io/bdr:I1NLM2232_001::I1NLM2232_0010001.jpg/full/full/0/default.jpg (this is a byte-for-byte copy of the S3 image, with no processing by the IIIF server). You can extract the thumbnail from it using (under Linux at least):
Here it is attached for reference:
It's about 13 kB in size, which means that 13 kB at the beginning of the image are taken up by the thumbnail but are not useful for our purposes (unless I'm missing something?). I think we could at least issue a warning in the asset manager if there are thumbnails in JPGs.