drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0
2.55k stars 479 forks source link

Deep recursion / stackoverflow with tiff image #559

Open eriken opened 2 years ago

eriken commented 2 years ago

Hello.

Tiff image can be downloaded here: https://github.com/cgohlke/imagecodecs/blob/master/tests/tiff/gray.movie.u2.tif

Then try to extract metadata by using this code:

InputStream inputStream = new ByteArrayInputStream(imageBlob);

Metadata metadata = ImageMetadataReader.readMetadata(inputStream);

Directory mimeTypeDir = metadata.getFirstDirectoryOfType(FileTypeDirectory.class);

String mimeType = mimeTypeDir.getString(3);
String extension = mimeTypeDir.getString(4);

This should throw a java.lang.StackOverflowError

I also tried using a a tool like ExifTool and it threw a similar warning: Deep recursion on subroutine "Image::ExifTool::ProcessDirectory" Deep recursion on subroutine "Image::ExifTool::Exif::ProcessExif"

Let me know if you need more details.

drewnoakes commented 2 years ago

Sounds like maybe there's some kind of circular reference. We have seen this in Exif before (for IFD offsets) and addressed it by remembering the set of visited offsets to prevent runaway execution. This may be similar, or different :)

cgohlke commented 2 years ago

Not sure this is helpful: the file gray.movie.u2.tif contains more than 2^16 IFDs. Such files are not uncommon in bio-imaging. Some libraries (e.g. libtiff) can not access all IFDs because they use an uint16 counter.

kaiwinter commented 2 years ago

These are the tagIds which get processed:

258
259
262
273
277
278
279
282
283
296
256
257
258 <- recursion
259
...

Is adding a variable processedTagIds the way to go (following the naming of processedIfdOffsets)?

Hm, we could also check the tagNumber, this does not need to be calculated first. What makes me a little puzzled is that the tagOffset is always different for the same tagNumer/tagId. Shouldn't it stay the same? If it would be the same the current processedIfdOffsets-mechanism should prevent the recursion already. I refer to this code: https://github.com/drewnoakes/metadata-extractor/blob/master/Source/com/drew/imaging/tiff/TiffReader.java#L148

drewnoakes commented 2 years ago

@kaiwinter are those tags all in the same IFD?

tagOffset will change based on tagNumber. The offset then determines the tagId. It would be possible for an IFD to repeat the same tagId multiple times with different tagNumbers, but I'm not sure that's valid. We store these in a map, so any duplicate tags would overwrite previous ones.

Are you also experiencing this issue in other files?

kaiwinter commented 2 years ago

Those tags are all in different IFDs. Please ignore my previous comment as I wasn't aware of the exact TIFF structure but now I read some of the specification. I used a different tool to examine the structure of that tiff. It contains 65568 IFDs which is just too much for the default stack size. As the processing works in a recursive way this is just some kind of limitation.

@eriken You can work around this StackOverflowError by setting a bigger stack size for the VM e.g. -Xss1g. With that size I'm able to parse the TIFF using metadata-extractor.