drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0

Support extracting thumbnail data #149

Open guigri opened 8 years ago

guigri commented 8 years ago

Dear Drew,

it is possible to extract thumbnail images from a Canon EOS 450D raw image file (CR2) using the older version 2.6.2 of metadata-extractor:

Metadata localMetadata = ImageMetadataReader.readMetadata(new File("<path to CR2 file>"));
ExifThumbnailDirectory myExifThumbnailDirectory;
myExifThumbnailDirectory = localMetadata.getDirectory(ExifThumbnailDirectory.class);
myExifThumbnailDirectory.hasThumbnailData(); // ---> returns true

Apparently, this does not work any more with version 2.8.1 of metadata-extractor.

Metadata localMetadata = ImageMetadataReader.readMetadata(new File("<path to CR2 file>"));
ExifThumbnailDirectory myExifThumbnailDirectory;
myExifThumbnailDirectory = localMetadata.getFirstDirectoryOfType(ExifThumbnailDirectory.class);
myExifThumbnailDirectory.hasThumbnailData(); // ---> returns false !!!

For example, myExifThumbnailDirectory.getString(ExifThumbnailDirectory.TAG_THUMBNAIL_COMPRESSION) does work on the same file (it returns 6).

Am I making a mistake?

I can provide an example image file if necessary.

Thank you.

Best wishes, Guido

kwhopper commented 8 years ago

CR2 is read using TiffMetadataReader, which ends up using ExifTiffHandler. For some reason, the second argument to the ExifTiffHandler constructor (storeThumbnailBytes) is explicitly set to false. ExifReader also uses ExifTiffHandler, but has a property (StoreThumbnailBytes) that is passed through.

Drew will have to comment. Perhaps TiffMetadataReader does this on purpose, or it was accidentally left incomplete. If it was accidental, it is easily fixed.

drewnoakes commented 8 years ago

@kwhopper is correct.

If you know you're decoding a TIFF file (i.e. CR2) then you could just use the TiffMetadataReader class directly and pass 'true' for storeThumbnailBytes.

I'm happy for the default to be changed. I expect most users don't use thumbnail images so it seems wasteful to allocate them in the default case. Perhaps a better idea is to allow this setting to be passed in (optionally) to ImageMetadataReader and propagated to sub-readers that support thumbnail extraction.


drewnoakes commented 8 years ago

Given your sample code, would it be suitable to have code such as this?

File file = new File("<path to CR2 file>");
Metadata meta = ImageMetadataReader.readMetadata(file);
ExifThumbnailDirectory dir = meta.getDirectory(ExifThumbnailDirectory.class);
if (dir.hasThumbnail()) {
    byte[] thumbnailData = dir.getThumbnail(file); // pass 'file' again (or a seekable stream)
    // use bytes
}

If you're working with files, this should be fine. If you were consuming data from a non-seekable stream (maybe over a network), it wouldn't be suitable, because the thumbnail bytes could not be re-read after the metadata pass. Still, I expect it'd cover the majority of use cases and be simple to implement.
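To make the deferred-read idea concrete, here is a minimal sketch that uses only the JDK. It assumes the metadata pass records nothing but the thumbnail's absolute offset and length, and that the bytes are fetched later from a seekable file. The class ThumbnailRef and its read method are hypothetical names for illustration; they are not part of metadata-extractor.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical holder for a thumbnail's location; not a metadata-extractor class. */
public final class ThumbnailRef {
    private final long offset; // assumed to be an absolute position in the file
    private final int length;  // number of thumbnail bytes

    public ThumbnailRef(long offset, int length) {
        if (offset < 0 || length < 0) {
            throw new IllegalArgumentException("negative offset or length");
        }
        this.offset = offset;
        this.length = length;
    }

    /** Re-opens the file and reads the recorded byte range. Needs a seekable source. */
    public byte[] read(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            if (offset + length > in.length()) {
                throw new EOFException("thumbnail range exceeds file size");
            }
            byte[] data = new byte[length];
            in.seek(offset);    // jump straight to the thumbnail
            in.readFully(data); // read exactly 'length' bytes or fail
            return data;
        }
    }
}
```

The trade-off is the one described above: no thumbnail bytes are allocated unless the caller asks for them, but the caller must still have the file (or another seekable source) at hand.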

guigri commented 8 years ago

...thanks guys for your comments.

@Drew: I am working with files and your code is suitable. There is no getDirectory(...) method in Metadata today, so I guess your code is a suggestion for changing the library?

I have no idea how many people need to access thumbnails, or whether it is worth introducing specific handling. However, a clean approach is probably the suggestion to pass some kind of flag "considerThumbnailData" through the methods. In this case the classes ImageMetadataReader, Metadata and TiffMetadataReader might have alternative readMetadata(...) methods with an additional boolean parameter, for example.

sschloen commented 8 years ago

Thumbnail extraction from CR2 files is one of the main reasons we are using this library, and we are eager to see this fixed. The suggestion above, which you expect would cover the majority of cases and be simple to implement, would certainly work for us. We've had to revert to 2.6.2 in the meantime. Thanks!

haumacher commented 5 years ago

The second thing I tried with this library was getting the thumbnail data, and I cannot find a solution. It is perfectly OK not to store the thumbnail data in the metadata; offset and length are great. Unfortunately, the following does not work. As I understand it, this is because the offset is not (or not always) the offset from the beginning of the file:

File file = ...;
Metadata metadata = ImageMetadataReader.readMetadata(file);

ExifThumbnailDirectory tn = metadata.getFirstDirectoryOfType(ExifThumbnailDirectory.class);
long offset = tn.getLong(ExifThumbnailDirectory.TAG_THUMBNAIL_OFFSET);
int length = tn.getInt(ExifThumbnailDirectory.TAG_THUMBNAIL_LENGTH);
try (RandomAccessFile handle = new RandomAccessFile(file, "r")) {
   handle.seek(offset);
   byte[] buffer = new byte[length];
   handle.readFully(buffer);

   try (FileOutputStream out = new FileOutputStream(new File("tn.jpg"))) {
      out.write(buffer);
   }
}

How do I compute the correct offset?
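One possible explanation, offered as an assumption rather than a confirmed diagnosis: in a JPEG file the Exif block sits inside an APP1 segment, and the Exif specification defines IFD offsets relative to the start of the embedded TIFF header, not the start of the file. In a plain TIFF container such as CR2 the TIFF header is at position 0, so the two coincide. The sketch below uses only the JDK to locate the TIFF header. TiffHeaderLocator and findTiffHeaderOffset are hypothetical names, and the code ignores edge cases such as fill bytes between segments.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Hypothetical helper; not part of metadata-extractor. */
public final class TiffHeaderLocator {
    private static final byte[] EXIF_ID = {'E', 'x', 'i', 'f', 0, 0};

    /** Returns the file position of the TIFF header, or -1 if none is found. */
    public static long findTiffHeaderOffset(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            if (in.length() < 4) return -1;
            int b0 = in.readUnsignedByte();
            int b1 = in.readUnsignedByte();
            // TIFF-based files start with the byte-order mark "II" or "MM".
            if ((b0 == 'I' && b1 == 'I') || (b0 == 'M' && b1 == 'M')) return 0;
            if (b0 != 0xFF || b1 != 0xD8) return -1; // not a JPEG start-of-image marker

            long pos = 2; // first segment follows the SOI marker
            while (pos + 4 <= in.length()) {
                in.seek(pos);
                int prefix = in.readUnsignedByte();
                int marker = in.readUnsignedByte();
                if (prefix != 0xFF) return -1;                       // lost segment sync
                if (marker == 0xDA || marker == 0xD9) return -1;     // image data or end reached
                int segLen = in.readUnsignedShort(); // big-endian, counts its own two bytes
                if (segLen < 2) return -1;
                if (marker == 0xE1 && segLen >= 8 && pos + 10 <= in.length()) {
                    byte[] id = new byte[6];
                    in.readFully(id);
                    // Skip marker (2) + length (2) + "Exif\0\0" (6) to reach the TIFF header.
                    if (Arrays.equals(id, EXIF_ID)) return pos + 4 + 6;
                }
                pos += 2 + segLen; // marker bytes plus segment length
            }
            return -1;
        }
    }
}
```

If this assumption holds, and the library reports the raw tag value, the seek in the snippet above would become handle.seek(tiffStart + offset), where tiffStart is the value returned by findTiffHeaderOffset(file).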

haumacher commented 4 years ago

There is a workaround that dynamically patches the library to store thumbnail data in the ExifThumbnailDirectory as explained in https://github.com/drewnoakes/metadata-extractor/issues/276#issuecomment-677767368.