drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Encoding Error extracting XMP data from PNG file #356

Closed Webreaper closed 7 months ago

Webreaper commented 7 months ago

Hi Drew,

I've got a user who's having an issue when Damselfly attempts to extract metadata from a PNG file.

I've attached the image here:

[Attached image: CaptionTest]

When I run the image through exiftool you can see the caption "This is the caption" in the 'Description' field.

[Screenshot: exiftool output showing the Description field]

However, when I load the metadata using metadata-extractor, the XmpDirectory doesn't have any tags, and just has an error saying "Error processing XMP data: Unsupported Encoding". Any ideas what's wrong here?

I'm on .NET 8, running on Linux, and using MetadataExtractor v2.8.1. Let me know if you need any other info.

drewnoakes commented 7 months ago

The debugger shows the embedded XMP is invalid. The inner exception is:

System.Xml.XmlException: 'Data at the root level is invalid. Line 177, position 20.'

Looking at the decoded bytes we see:

[Screenshot: decoded XMP bytes]

I assume a tool re-wrote the XMP, reducing the length of the segment, without zeroing out the overflow or shortening the segment.

Exiftool must have some logic for this case. We can look to do the same. I haven't thought about this very much, but it seems that scanning for the <?xpacket end="r"?> could help here. I don't know the significance of the r here though.
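As a rough illustration of the scanning idea, here is a minimal sketch in Python rather than the library's C#. It is not necessarily how Exiftool or metadata-extractor handles this. The function name `trim_to_first_trailer` is hypothetical. The pattern accepts both `end="r"` and `end="w"`, which are the two values the XMP packet wrapper defines (read-only and writable), with either quote style.

```python
import re

# Matches the XMP packet trailer, e.g. <?xpacket end="r"?> or <?xpacket end='w'?>.
_TRAILER = re.compile(rb"<\?xpacket\s+end=[\"'][rw][\"']\s*\?>")


def trim_to_first_trailer(xmp_bytes: bytes) -> bytes:
    """Return xmp_bytes cut just after the first packet trailer.

    Hypothetical helper: if no trailer is found, the buffer is returned
    unchanged so the caller can still try to parse it as-is.
    """
    match = _TRAILER.search(xmp_bytes)
    if match is None:
        return xmp_bytes
    return xmp_bytes[: match.end()]
```

Anything after the first trailer is treated as leftover from an older, longer packet and is dropped before parsing.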

Are you willing and able to donate this image to the regression test suite so that we can track this issue there?

drewnoakes commented 7 months ago

Any scanning here should ideally be performed:

Once found, the byte array length can be adjusted when handing off to XmpCore:

https://github.com/drewnoakes/metadata-extractor-dotnet/blob/dcf8f31b467a45c35e5ecbb5ec27516c033d3ca6/MetadataExtractor/Formats/Xmp/XmpReader.cs#L91
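To make the "adjust the length before handing off" step concrete, here is a hedged Python sketch. Python's `xml.etree.ElementTree` stands in for XmpCore, and `effective_length` and `parse_xmp` are hypothetical names, not part of either library. Trying the full buffer first and trimming only when parsing fails is an assumption of this sketch, chosen so that well-formed files pay no scanning cost.

```python
import xml.etree.ElementTree as ET

TRAILER_PREFIX = b"<?xpacket end="


def effective_length(buf: bytes) -> int:
    """Length of buf up to and including the first packet trailer.

    Falls back to len(buf) when no complete trailer is present.
    """
    start = buf.find(TRAILER_PREFIX)
    if start == -1:
        return len(buf)
    close = buf.find(b"?>", start)
    return len(buf) if close == -1 else close + 2


def parse_xmp(buf: bytes) -> ET.Element:
    """Parse the whole buffer; on failure, retry with the trimmed length."""
    try:
        return ET.fromstring(buf)
    except ET.ParseError:
        # Stand-in for passing (buffer, offset, adjusted length) to the real parser.
        return ET.fromstring(buf[: effective_length(buf)])
```

If the trimmed buffer is also invalid, the second `ET.ParseError` propagates to the caller, which roughly corresponds to the library recording an error on the directory.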

grainsoflight commented 7 months ago

Just going to add a bit of information here, as this is my image that I provided to Webreaper for debugging. If it helps you at all, the XMP was written with Lightroom Classic version 13.1. You may send this image to whomever you need to in order to work through the issue.

drewnoakes commented 7 months ago

@grainsoflight we maintain a repository of test images, and I think your case is interesting enough that I'd like to add it there: https://github.com/drewnoakes/metadata-extractor-images

It's a public repository, so please ensure you're happy with it being preserved in that way (though attaching the image to the post here means it's essentially already public).

grainsoflight commented 7 months ago

That's fine with me

drewnoakes commented 7 months ago

This problem exists in the Java library as well, though with a slightly different error message.

JAVA   [ERROR: XMP] Error processing XMP data: XML parsing failure
DOTNET [ERROR: XMP] Error processing XMP data: Unsupported Encoding

drewnoakes commented 7 months ago

"in reverse, to reduce overhead"

Reverse order won't work. It's likely that the marker still exists at the end (see above screenshot for an example).
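A small Python sketch of why the scan direction matters, using made-up packet contents and the hypothetical helpers `cut_forward` and `cut_reverse`. It assumes the stale tail of the older, longer packet still ends with its own trailer, as in the screenshot.

```python
TRAILER = b'<?xpacket end="r"?>'


def cut_forward(buf: bytes) -> bytes:
    """Cut after the first trailer (forward scan)."""
    i = buf.find(TRAILER)
    return buf if i == -1 else buf[: i + len(TRAILER)]


def cut_reverse(buf: bytes) -> bytes:
    """Cut after the last trailer (reverse scan)."""
    i = buf.rfind(TRAILER)
    return buf if i == -1 else buf[: i + len(TRAILER)]


# Illustrative data only: a rewritten, shorter packet followed by stale bytes.
valid = b"<x:xmpmeta>new, shorter packet</x:xmpmeta>" + TRAILER
buf = valid + b" tail of the old packet</x:xmpmeta>" + TRAILER

forward = cut_forward(buf)   # keeps only the rewritten packet
reverse = cut_reverse(buf)   # still includes the stale tail
```

Here the reverse scan stops at the leftover trailer and returns the whole buffer, so the invalid bytes would still reach the parser.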

Webreaper commented 7 months ago

How can I pick this up to test? Or will you be making a new release?

drewnoakes commented 7 months ago

I hope to get a build out soon. If you want to test before that, you can build your own version.

Webreaper commented 7 months ago

No prob. I can wait. Just wasn't sure if you had a dev pipeline / repo, similar to how Matt does it with SkiaSharp.

drewnoakes commented 7 months ago

I'd love to set up automatic releases from CI. We have it in NetMQ too and it's very handy. One of these days :)

Webreaper commented 7 months ago

Yeah, it took me ages to set it up with Damselfly - GitHub Actions are a bit of a PITA. But totally worth it.