evanoberholster / imagemeta

Image Metadata (Exif and XMP) extraction for JPEG, HEIC, AVIF, TIFF and Camera Raw in golang. Focus is on providing features and improved performance.
MIT License

`xmp: error no XMP Tag found` even though multiple XMP tags are found #45

Open mholt opened 1 year ago

mholt commented 1 year ago

I'm using the example from the readme on a JPEG file:

m, err := imagemeta.Parse(f)
if err != nil {
    return err // propagate the parse error instead of swallowing it
}
fmt.Println(m.Xmp())

but I get the error `xmp: error no XMP Tag found` (i.e. EOF), even though this file has two XMP sections, at these offsets from 0:

$ grep --binary --text --byte-offset --only-matching '<x:xmpmeta' PXL_20230104_032015182.MP.jpg 
8557:<x:xmpmeta
10480:<x:xmpmeta

The first one is for the JPEG, the second one I believe is for an embedded video (motion picture).
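The same offset check can be done from Go, which is handy when comparing against the offsets the parser reports. This is only a minimal sketch: `findAll` is a hypothetical helper, not part of imagemeta, and the sample bytes stand in for the real file contents.

```go
package main

import (
	"bytes"
	"fmt"
)

// findAll returns the byte offset (counted from 0) of every
// occurrence of marker in data. Hypothetical helper for debugging.
func findAll(data, marker []byte) []int {
	var offsets []int
	for pos := 0; ; {
		i := bytes.Index(data[pos:], marker)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, pos+i)
		pos += i + len(marker)
	}
}

func main() {
	// Stand-in bytes; in practice this would come from os.ReadFile.
	data := []byte("..<x:xmpmeta a>..</x:xmpmeta>....<x:xmpmeta b>")
	fmt.Println(findAll(data, []byte("<x:xmpmeta")))
}
```

Like the grep above, this reports offsets from the start of the file, so a parser offset that matches none of them is probably relative to something else.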

When I add some fmt.Printf() lines, I see that the XmpHeader after parsing is showing an offset of 75971, which... is either clearly wrong, or is not offset from 0.

I've cloned the repo and am trying to get a grasp for how the readers work, but there's lots of sectioning and ReadAt() and it's a little hard for me to follow.

Any ideas or troubleshooting tips?

Thanks for this package!

mholt commented 1 year ago

Additionally, I have a .heif file that has an XMP section:

$ grep --binary --text --byte-offset '<x:xmpmeta' 20220423_085935.heif 
974650:debuginfo<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">

But the same code yields an error: xmp: error no XMP Tag found

evanoberholster commented 1 year ago

@mholt, any chance you could share those images with me? That would assist in debugging.

This library is definitely not perfect and was written with performance as the goal, so there are likely cases where it needs to be tweaked.

mholt commented 1 year ago

@evanoberholster Absolutely! Sorry, that would have been the obvious helpful thing to do :)

Here's one: https://drive.google.com/file/d/1mmDRBPfxDenvCQz0wezygyGhcWdoNJHk/view?usp=sharing

One thing I have noted in my debugging is that there are XMP fields that are longer than 1.5 KB; notably, I think the MakerNotes on this picture are about 70-80 KB. I've seen some up to 100 KB.

When I raise maximum value and buffer sizes, I am also seeing that Peek() in the readAttributeValue loop somehow stops returning more than about 60 KB at a time, resulting in never finding the closing delimiter. This picture should have makernotes going from 9830 to 78430, but the read buffer will never be large enough to see the closing quotes at 78430.

One thing I also tried in my debugging is to change the signature of Xmp() from returning xmp.XMP to []xmp.XMP, since a file may have multiple XMP documents. (In this file's case, I suspect the second one is for the embedded mp4 file, i.e. motion photo. But my application can know to use only the first or the second depending on what it's doing.)

Thank you so much for the reply! I tried a few of the other libraries your readme mentions before I found this one, and this one looks the most maintained, so I really appreciate that.

Edit: Oh yeah, and here's a link to the comment with a HEIC image. I haven't had a chance to start debugging it yet.

One thing I appreciate about your lib is that it doesn't rely on <?xpacket as a start token; as I've found that some photo files don't have it. But I think they all use <x:xmpmeta at least.

evanoberholster commented 1 year ago

@mholt. Thanks for sending me that jpg file. It was eye opening. I had not come across a similar file layout as in this image. When I am debugging a file structure I normally use ./exiftool -htmldump image.jpg > image.jpg.html as it gives a good idea of the structure.

The JPEG that you posted appears to have APP0 JFIF tags, which I hadn't dealt with in my library before. c89196d

Most of the XMP files that I work with come from Lightroom, and therefore I have used a smaller buffer (1.5 KB). Thank you for the suggestion; I will look for more XMP files to test with.

This Go library is definitely undergoing some refactoring and rewriting, which you will see under the develop branch. Thank you for the suggestion of returning a []xmp.XMP instead of an xmp.XMP.

I will also take a look at the HEIC image in the next few days.

mholt commented 1 year ago

@evanoberholster That's awesome, thank you! I hadn't noticed the develop branch.

I'm kind of new to parsing metadata, so I've learned a ton this week. And wow, exiftool -htmldump is truly useful, thanks for the tip.

Of the three metadata libraries I've tried so far -- and all of them have struggled with the first few images I've tried -- this is the only one that appears to be maintained. So thank you, thank you again for that. It's a breath of fresh air.

Can I possibly sponsor you or donate?

Whatever I can do to test changes or help with development, please let me know!

evanoberholster commented 1 year ago

@mholt Thank you for being willing to support this library. The biggest contribution that you could make would be Issues, PRs, and suggestions. I will be pushing a large rewrite soon and would appreciate suggestions and testing. Thanks for your willingness.

Looking further at your jpg file shows the following.

The first XMP tag has standard headers with proprietary tags from "http://ns.google.com/photos/1.0/camera/"

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> 
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.5.0"> 
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> 
            <rdf:Description 
                rdf:about="" 
                xmlns:GCamera="http://ns.google.com/photos/1.0/camera/" 
                xmlns:GContainer="http://ns.google.com/photos/1.0/container/" 
                xmlns:GContainerItem="http://ns.google.com/photos/1.0/container/item/" 
                xmlns:xmpNote="http://ns.adobe.com/xmp/note/" 
                GCamera:MotionPhoto="1" GCamera:MotionPhotoVersion="1" GCamera:MotionPhotoPresentationTimestampUs="901378" 
                GCamera:shot_log_data="SERSUALvZDVtXnAeLOrjQ5je9sQSDxYeI6WFw+b24jWUOjuJIwNBymft1qWNdGIcTknjXN1nON2pLxaI/RIBi0jHkkTdui6TXLoSxBepOCPXJ096WepulMkoxom/IXi9BlzNvFBkCwWOsDvYpJW3qbfkh89NkYt6XoPrjd/YEu6W77AJyhMbo54w8oSw8YbGPjppnZ9rtG+x8wJzT8iuYOVITZcvzgRQWbqp2rJBI3zPIVm+aTiUsgV0SeIWLmRy5qu++LjnIPmCYmPmO6PbALNfTMctudf7DTYzvlxSQYauHBUte8vX/K/unY655KFFowOYb+sm6PSMh4/1Hq59Xq0UqEQ8xi8gghtuvw8yDjzBdet7ZrxxWMPOwgFrUFFoJj1jkbf0HnKJc3+QgaSWXSpbfeJ8D6/WhgccGkUjpOOiWCXXzdFpmdwkkFQDpYA2usHboQ4nUEFaP7o8UmAmuBOOfhrcvtGxHAosVDswt6GpZdnYG/1dQYCSJ0yhOzH2cbIagxywGyjGSjxAH3H/6AnWiGVTB+Y0TU+XxEkn+ERRdpcG1ktw/NNW8lf50M/2sqoR66Sw7yzujkMkqM3ETzkpMh9NUg/uRZk8aTSWA93DtWKU8PAgF5qao6G38UPzXFLBJ6tlh6EAnth+0NPh8aJWN+jgE/WqzqqMaRGXiB3q+AlYrsDriVS9ZKPPd7xfjtyVWVhAg024bBrK2q1atiuZztPSdwVFPQE=" 
                xmpNote:HasExtendedXMP="F482C5BBD3889E0BC7D8F325D346F3AE"> 
                <GContainer:Directory> 
                    <rdf:Seq> 
                        <rdf:li rdf:parseType="Resource"> 
                            <GContainer:Item GContainerItem:Mime="image/jpeg" GContainerItem:Semantic="Primary" GContainerItem:Length="0" GContainerItem:Padding="0"/> 
                        </rdf:li> 
                        <rdf:li rdf:parseType="Resource"> 
                            <GContainer:Item GContainerItem:Mime="video/mp4" GContainerItem:Semantic="MotionPhoto" GContainerItem:Length="2037777" GContainerItem:Padding="0"/> 
                        </rdf:li> 
                    </rdf:Seq> 
                </GContainer:Directory> 
            </rdf:Description> 
        </rdf:RDF> 
    </x:xmpmeta>
<?xpacket end="w"?>

The second XMP tag contains an XMP extension "F482C5BBD3889E0BC7D8F325D346F3AE" that appears to be mentioned in the first XMP tag. It contains what appears to be binary data from a maker note.

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.5.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:GCamera="http://ns.google.com/photos/1.0/camera/" GCamera:hdrp_makernote="SERSUALvZDVtXnAeLOrjLFZmiFHXojfpBfD.....

The third XMP tag contains another XMP extension "F482C5BBD3889E0BC7D8F325D346F3AE" that appears to be the continuing bytes from the second XMP tag.

.....jqYS8mX7cfnw070R4="/> </rdf:RDF> </x:xmpmeta>
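For anyone who only needs the raw bytes, here is a rough sketch of reassembling those extension chunks. It assumes the Extended XMP layout from the Adobe XMP specification (Part 3): each APP1 segment carries the signature `http://ns.adobe.com/xmp/extension/` plus a NUL byte, a 32-character GUID, a 4-byte total length, a 4-byte offset, and then the chunk. A JPEG segment holds at most 65,535 bytes including its length field, which is why a large value ends up split across tags. `extendedXMP` and `app1` are hypothetical names, not imagemeta APIs, and the JPEG in `main` is synthetic.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const extSig = "http://ns.adobe.com/xmp/extension/\x00"

// extendedXMP walks the marker segments before the image data and
// reassembles Extended XMP chunks, keyed by GUID. Sketch only: it does
// not handle fill bytes between segments or cap the declared length.
func extendedXMP(data []byte) (map[string][]byte, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, errors.New("not a JPEG: missing SOI marker")
	}
	out := make(map[string][]byte)
	for pos := 2; pos+4 <= len(data) && data[pos] == 0xFF; {
		marker := data[pos+1]
		if marker == 0xDA || marker == 0xD9 { // SOS or EOI: no more metadata
			break
		}
		segLen := int(binary.BigEndian.Uint16(data[pos+2:]))
		end := pos + 2 + segLen
		if segLen < 2 || end > len(data) {
			return nil, errors.New("truncated segment")
		}
		body := data[pos+4 : end]
		pos = end
		if marker != 0xE1 || !bytes.HasPrefix(body, []byte(extSig)) {
			continue // not an Extended XMP APP1 segment
		}
		body = body[len(extSig):]
		if len(body) < 40 {
			return nil, errors.New("short Extended XMP header")
		}
		guid := string(body[:32])
		total := int(binary.BigEndian.Uint32(body[32:36]))
		offset := int(binary.BigEndian.Uint32(body[36:40]))
		chunk := body[40:]
		buf, ok := out[guid]
		if !ok {
			buf = make([]byte, total)
			out[guid] = buf
		}
		if offset+len(chunk) > len(buf) {
			return nil, errors.New("chunk exceeds declared length")
		}
		copy(buf[offset:], chunk)
	}
	return out, nil
}

// app1 builds one synthetic Extended XMP segment for the demo below.
func app1(guid string, total, offset uint32, chunk []byte) []byte {
	var hdr [8]byte
	binary.BigEndian.PutUint32(hdr[0:], total)
	binary.BigEndian.PutUint32(hdr[4:], offset)
	body := append([]byte(extSig+guid), hdr[:]...)
	body = append(body, chunk...)
	n := len(body) + 2 // the length field counts itself
	return append([]byte{0xFF, 0xE1, byte(n >> 8), byte(n)}, body...)
}

func main() {
	guid := "F482C5BBD3889E0BC7D8F325D346F3AE"
	jpeg := []byte{0xFF, 0xD8}
	jpeg = append(jpeg, app1(guid, 11, 0, []byte("hello "))...)
	jpeg = append(jpeg, app1(guid, 11, 6, []byte("world"))...)
	jpeg = append(jpeg, 0xFF, 0xD9)

	parts, err := extendedXMP(jpeg)
	fmt.Printf("%q %v\n", parts[guid], err)
}
```

The GUID used as the map key is the same value the main packet advertises in `xmpNote:HasExtendedXMP`, so a caller can match the reassembled document back to the first XMP tag.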

Unfortunately, I don't have the expertise or a strong interest in supporting proprietary XMP. The goal of this library is to support the basic XMP namespaces as listed here

mholt commented 1 year ago

@evanoberholster Sounds good to me. Thanks for explaining that -- I totally understand not having the desire to support proprietary fields. Can your lib at least extract the bytes of the MakerNotes? It doesn't have to decode them. But if it's part of a standard xmp document then I hope we could at least access them for later.

> I will be pushing a large rewrite soon and would appreciate suggestions and testing.

I'm looking forward to trying it! Will it have the ability to detect the file type? Or will I need to call a type-specific function?