drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0
2.55k stars 479 forks source link

OutOfMemoryError on corrupted png file #534

Closed Draczech closed 3 years ago

Draczech commented 3 years ago

Hi, I'm getting OOM error while loading a png file. It is a corrupted file, but this kind of error should be prevented from firing anyway.

File

File location: https://better-essay-service.com/favicon-96x96.png Info: Supposedly 96x96px png file, actual size 8556 bytes, content can be checked via GIMP. Thumbnail: favicon_better_essay_service_com

Stacktrace

    java.lang.OutOfMemoryError: Java heap space
        at com.drew.lang.StreamReader.getBytes(StreamReader.java:71)
        at com.drew.imaging.png.PngChunkReader.extract(PngChunkReader.java:96)
        at com.drew.imaging.png.PngMetadataReader.readMetadata(PngMetadataReader.java:102)

Debug info

Debugging the issue I found out that extracting chunk data works until reader position 8070 in the file. Then calling SequentialReader.getInt32() method results in following:

// Motorola - MSB first (big endian)
return (101 << 24 & 0xFF000000) |
(58 << 16 & 0xFF0000) |
(99 << 8  & 0xFF00) |
(114       & 0xFF);

Valid int value 1698325362 is returned. Then in StreamReader.getBytes(int count) allocation of new byte[1698325362] fails hard.

I guess some sanity check on those values returned should be in place? E.g. when reader tries to allocate more bytes than the actual length of the file is?

Possibly related to https://github.com/drewnoakes/metadata-extractor/issues/273 ?

drewnoakes commented 3 years ago

E.g. when reader tries to allocate more bytes than the actual length of the file is?

This is a good strategy. It's not always straightforward to know how large the file is, however. For example, we may be streaming data with unknown length over a network.

drewnoakes commented 3 years ago

@Draczech are you happy and able to publicly donate that sample image to this library for the purposes of regression testing?

Draczech commented 3 years ago

@Draczech are you happy and able to publicly donate that sample image to this library for the purposes of regression testing?

I would be very happy to donate the sample image for testing purposes. Unfortunately it is third party image I just randomly encountered using my company's image processing. I suppose nobody minds donating broken file which has no real usage, but I will leave that decision up to you.

Do you want me to implement the changes proposed in https://github.com/drewnoakes/metadata-extractor/pull/535 ? Or should that PR be closed?

drewnoakes commented 3 years ago

535 has been merged (thanks again).

Draczech commented 3 years ago

I can see this issue lead to a few suggestions for future improvement of the metadata-extractor, so I hope it will lead to better results overall.

Just one question remaining: Do you plan to release a new version of the library in upcoming days/weeks/months? Let me know, please. Thank you!

drewnoakes commented 3 years ago

@Draczech released in https://github.com/drewnoakes/metadata-extractor/releases/tag/2.16.0