drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Keyword extraction for PNG files? #280

Closed Webreaper closed 3 years ago

Webreaper commented 3 years ago

I'm trying to pull the keywords from the attached image. If I run ExifTool, the keywords show up correctly (e.g., 'tree' and 'autumn'). But reading the file with the .NET MetadataExtractor doesn't seem to load the keywords: they're not in the IptcDirectory, nor anywhere else that I can find. I've trawled through the entire structure and tags in the debugger and can't see any keyword tags anywhere. What am I missing? Are PNG keywords supported?

[Attached image: tree]

drewnoakes commented 3 years ago

Looks like a bug in how we process IPTC data in PNG files. I won't have time to look at it for a while. Would you like to investigate and submit a pull request if you find a fix?

Webreaper commented 3 years ago

Might be able to, although wouldn't have the first clue where to start....

paperboyo commented 3 years ago

Hey. I’m only a user, so cannot help much! To my untrained eye, this is how the keywords are stored:

  1. Original image (exiftool -v2):

    PNG zTXt (113 bytes):
    + [Photoshop directory, 40 bytes]
    | IPTCData (SubDirectory) -->
    | - Tag 0x0404 (28 bytes)
    | + [IPTC directory, 28 bytes]
    | | -- IPTCApplication record --
    | | Keywords = Autumn, tree, le
    | | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
  2. Added two keywords using exiftool -use MWG -MWG:Keywords+="":

    PNG zTXt (132 bytes):
    + [Photoshop directory, 64 bytes]
    | IPTCData (SubDirectory) -->
    | - Tag 0x0404 (52 bytes)
    | + [IPTC directory, 52 bytes]
    | | -- IPTCApplication record --
    | | Keywords = Autumn, tree, le
    | | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
    | | ApplicationRecordVersion = 4
    | | - Tag 0x0000, IPTCApplication record (2 bytes, int16u)
    | | Keywords = atari
    | | - Tag 0x0019, IPTCApplication record (5 bytes, string[0,64])
    | | Keywords = commodore
    | | - Tag 0x0019, IPTCApplication record (9 bytes, string[0,64])
  3. The XMP dc:subject array after adding the new keywords in step 2:

    PNG iTXt (905 bytes):
    + [XMP directory, 883 bytes]
    | XMPToolkit = Image::ExifTool 12.08
    | Subject = atari
    | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 10'
    | Subject = commodore
    | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 11'
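
For anyone who wants to check the byte layout behind the first two dumps by hand, here is a minimal Python sketch. It is not MetadataExtractor code and not how the library does it. It assumes the standard Photoshop image resource block layout (`8BIM` signature, resource ID, padded Pascal-string name, 4-byte size) and the standard IPTC IIM dataset layout (`0x1C` marker, record, dataset, 2-byte length). All function names are hypothetical, and `build_sample_irb` only fabricates test bytes that mirror the sizes ExifTool reports above.

```python
import struct

IPTC_RESOURCE_ID = 0x0404      # Photoshop resource that holds IPTC data
KEYWORDS_DATASET = (2, 0x19)   # IPTC application record, Keywords dataset


def iter_photoshop_resources(data):
    """Yield (resource_id, payload) for each 8BIM block. No error recovery."""
    pos = 0
    while pos + 12 <= len(data) and data[pos:pos + 4] == b"8BIM":
        (resource_id,) = struct.unpack_from(">H", data, pos + 4)
        pos += 6
        name_len = data[pos]
        # Pascal string: length byte plus name, padded to an even total.
        pos += (1 + name_len + 1) & ~1
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield resource_id, data[pos:pos + size]
        pos += (size + 1) & ~1  # payload is padded to an even size


def iter_iptc_datasets(data):
    """Yield (record, dataset, value) for each IIM dataset."""
    pos = 0
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record, dataset, length = struct.unpack_from(">BBH", data, pos + 1)
        if length & 0x8000:
            break  # extended-length datasets are out of scope for this sketch
        pos += 5
        yield record, dataset, data[pos:pos + length]
        pos += length


def extract_keywords(irb):
    """Collect every Keywords value found in the IPTC resource."""
    keywords = []
    for resource_id, payload in iter_photoshop_resources(irb):
        if resource_id != IPTC_RESOURCE_ID:
            continue
        for record, dataset, value in iter_iptc_datasets(payload):
            if (record, dataset) == KEYWORDS_DATASET:
                keywords.append(value.decode("utf-8", errors="replace"))
    return keywords


def build_sample_irb(keywords):
    """Hypothetical helper: fabricate an 8BIM block like the one dumped above."""
    def dataset(record, number, value):
        return struct.pack(">BBBH", 0x1C, record, number, len(value)) + value

    iptc = dataset(2, 0x00, struct.pack(">H", 4))  # ApplicationRecordVersion = 4
    iptc += b"".join(dataset(2, 0x19, k) for k in keywords)
    irb = b"8BIM" + struct.pack(">H", IPTC_RESOURCE_ID) + b"\x00\x00"
    irb += struct.pack(">I", len(iptc)) + iptc
    return irb + (b"\x00" if len(iptc) % 2 else b"")


if __name__ == "__main__":
    sample = build_sample_irb([b"Autumn, tree, le"])
    print(len(sample), extract_keywords(sample))
```

With the single 16-byte keyword value from dump 1, the fabricated block comes to 40 bytes with a 28-byte IPTC payload, which matches the sizes ExifTool prints. Note that the whole string "Autumn, tree, le" is one keyword value here, not three.
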
drewnoakes commented 3 years ago

Thanks @paperboyo that's really helpful. This does seem like quite an exotic way of storing keywords in a PNG file, but clearly the data is in there and readable, so I'd like for MetadataExtractor to be able to pull it out.

> Might be able to, although wouldn't have the first clue where to start....

I'm happy to help where I can. At a high level:

Currently the tree of directories for your file looks like this:

- PNG-IHDR
- PNG-iCCP
    - ICC Profile
- Exif IFD0
    - Exif SubIFD
- PNG-pHYs
- XMP
- PNG-zTXt
- File Type
- File

I would expect to see in there something like:

- PNG-zTXt
    - IPTC

In other words, the PNG-zTXt directory produces a child IPTC directory.

There is no error message, so either we don't attempt this at all, or we attempt it but decide we cannot proceed for some reason.
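
One way to narrow this down outside the library is to look at the raw zTXt chunk directly. The Python sketch below is independent of MetadataExtractor. It walks the PNG chunks, inflates any zTXt text, and decodes an ImageMagick-style "raw profile" (profile name, declared length, then hex digits). Both the chunk keyword `Raw profile type iptc` and that text layout are assumptions about this particular file, since the thread does not show them. All function names are hypothetical, and `build_ztxt_png` only fabricates a tiny test file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(png):
    """Yield (chunk_type, chunk_data) for each chunk; CRCs are not verified."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack_from(">I4s", png, pos)
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC


def read_ztxt(chunk_data):
    """Split a zTXt chunk into (keyword, inflated text bytes)."""
    keyword, _, rest = chunk_data.partition(b"\x00")
    # rest[0] is the compression method; 0 (zlib) is the only defined value.
    return keyword.decode("latin-1"), zlib.decompress(rest[1:])


def decode_raw_profile(text):
    """Decode 'name / length / hex' text, assuming the ImageMagick layout."""
    parts = text.split()  # splits on any ASCII whitespace, including newlines
    name = parts[0].decode("ascii")
    declared = int(parts[1])
    payload = bytes.fromhex(b"".join(parts[2:]).decode("ascii"))
    if len(payload) != declared:
        raise ValueError("declared length does not match decoded payload")
    return name, payload


def build_ztxt_png(keyword, payload):
    """Hypothetical helper: fabricate a minimal PNG holding one zTXt chunk."""
    def chunk(chunk_type, data):
        crc = zlib.crc32(chunk_type + data)
        return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

    text = b"\niptc\n%8d\n" % len(payload) + payload.hex().encode("ascii") + b"\n"
    ztxt = keyword + b"\x00\x00" + zlib.compress(text)
    return PNG_SIGNATURE + chunk(b"zTXt", ztxt) + chunk(b"IEND", b"")


if __name__ == "__main__":
    png = build_ztxt_png(b"Raw profile type iptc", b"8BIM\x04\x04\x00\x00\x00\x00\x00\x00")
    for chunk_type, data in iter_png_chunks(png):
        if chunk_type == b"zTXt":
            keyword, text = read_ztxt(data)
            print(keyword, decode_raw_profile(text))
```

If the decoded payload of the real file starts with `8BIM`, the bytes are a Photoshop resource block rather than bare IPTC data, which would be one possible reason for a reader to give up on it.
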

drewnoakes commented 3 years ago

You can stick a breakpoint here:

https://github.com/drewnoakes/metadata-extractor-dotnet/blob/427ab46a9293013c418ad32ad4a9400f2edfa060/MetadataExtractor/Formats/Png/PngMetadataReader.cs#L409-L419

If that doesn't get hit for some reason, try here:

https://github.com/drewnoakes/metadata-extractor-dotnet/blob/427ab46a9293013c418ad32ad4a9400f2edfa060/MetadataExtractor/Formats/Png/PngMetadataReader.cs#L224-L255

Webreaper commented 3 years ago

Thanks!!