drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0
2.57k stars 482 forks source link

Make sure the image with and height are positive numbers #591

Closed mveunen closed 9 months ago

mveunen commented 1 year ago

We found out that BMP files can have a header that specifies the image width and/or image height as a negative number.

drewnoakes commented 1 year ago

Can you provide an image that reproduces this please? I feel like using ABS should not be necessary, and I'd like to understand why a negative value is returned.

mveunen commented 1 year ago

negative_height_example.zip I have an example image attached. This one has a negative height (-128) and a positive width (128).

drewnoakes commented 1 year ago

@mveunen what's the origin of that image? I'm reluctant to modify the library to hide these kinds of errors unless there's a spec that explains what the significance of a negative value is.

mveunen commented 1 year ago

Hi @drewnoakes,

On https://en.wikipedia.org/wiki/BMP_file_format#DIB_header_(bitmap_information_header) it states:

"Unless BITMAPCOREHEADER is used, uncompressed Windows bitmaps also can be stored from the top to bottom, when the Image Height value is negative.".

The change I have done is a bit more drastic, since we cannot handle negative values in our code. I will update the code in this PR to do ABS in the minimal amount of places.

drewnoakes commented 1 year ago

@mveunen apologies for the long delay in responding here.

Thanks for the extra context. That's helpful. In general we try to keep the data in the Directory as "raw" as possible, and apply any special logic to the Descriptor classes. Some users may rely on knowing that the image is oriented top to bottom, and if we abs the height before storing it, that information is lost. I agree that most users won't want to see "-10 pixels" as a description of the height however. What do you think?

mveunen commented 1 year ago

I can understand the logic to keep the directories as clean/basic as possible. I agree that that is the best way for a library like this.

In this case a negative height/width just is not logic, since a width/height can only be a positive number. What we can do is add a property (orientedTopToBottom) that returns true (or 1) in case the height was negative, otherwise it will be false (or 0). This way the information will still be present and in a very low level way, so everyone can use it for their own interpretation.

drewnoakes commented 1 year ago

We generally try to store the raw value in directories. If we were to ever write the value back to disk, it helps keep things simpler. Similarly for strings we keep the original byte[] around in case the consumer wants to reinterpret the encoding of the underlying value.