drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Other
936 stars 167 forks source link

Q. about XMP/RDF parsing / error handling in MetadataExtractor/XmpCore #418

Open Numpsy opened 6 months ago

Numpsy commented 6 months ago

Hi, I'm not sure if this is really a question for here, or for XmpCore, but:

I've been doing some tests with using https://github.com/Zeugma440/atldotnet for writing XMP data to .MP4 files, and then reading it back with various libraries. As I've described at https://github.com/Zeugma440/atldotnet/discussions/257#discussioncomment-8943030, I have a case where the Adobe C++ XMP SDK will read the XMP written by ATL, but Metadata Extractor fails with an error from XmpCore.

I'm not presently clear on if the file produced by ATL which has the rdf namespace definition attached to the <x:xmpmeta node rather than the rdf:RDF node is correct, but whilst looking at the difference between the C++ and C# versions of XmpCore I noticed that the error handling in that area seems to be different:

In the C# Case, we have this

private static void Rdf_RDF(XmpMeta xmp, XElement rdfRdfNode, ParseOptions options)
{
    if (!rdfRdfNode.Attributes().Any())
        throw new XmpException("Invalid attributes of rdf:RDF element", XmpErrorCode.BadRdf);

    Rdf_NodeElementList(xmp, xmp.GetRoot(), rdfRdfNode, options);
}

where it throws if the RDF node doesn't have any attributes (this is the where it fails with the ATL test file). However, in the C++ version we have this

// The top level rdf:RDF node. It can only have xmlns attributes, which have already been removed
// during construction of the XML tree.

void RDF_Parser::RDF ( XMP_Node * xmpTree, const XML_Node & xmlNode )
{

    if ( ! xmlNode.attrs.empty() ) {
        XMP_Error error ( kXMPErr_BadRDF, "Invalid attributes of rdf:RDF element" );
        this->errorCallback->NotifyClient ( kXMPErrSev_Recoverable, error );
    }
    this->NodeElementList ( xmpTree, xmlNode, kIsTopLevel );    // ! Attributes are ignored.

}   

which is throwing if the RDF node has any attributes left after the xmlns ones have been removed. I'm unclear why there is a difference between the two - as the XMP Specification does state that the RDF node shouldn't have attributse - it just made me wonder if the C# version should fail when there are none-namespace attributes present, rather than whenever no attributes are present?

Thanks.

ahsanaman92 commented 6 months ago

Hmm. Seems like you are onto something there.

!rdfRdfNode.Attributes().Any() -> if there are no attributes, throw an exception !xmlNode.attrs.empty() -> if there are attributes, throw an exception

kwhopper commented 6 months ago

xmp-core-dotnet is a port of the Java port of XmpCore 6.1.10 that Adobe used to provide. This appears to be a bug in the Java port, and it was carried over to C#. Here is the same code in Java:

/**
 * Each of these parsing methods is responsible for recognizing an RDF
 * syntax production and adding the appropriate structure to the XMP tree.
 * They simply return for success, failures will throw an exception.
 *
 * @param xmp the xmp metadata object that is generated
 * @param rdfRdfNode the top-level xml node
 * @param options ParseOptions to indicate the parse options provided by the client 
 * @throws XMPException thown on parsing errors
 */
static void rdf_RDF(XMPMetaImpl xmp, Node rdfRdfNode, ParseOptions options) throws XMPException
{
    if (rdfRdfNode.hasAttributes())
    {
        rdf_NodeElementList (xmp, xmp.getRoot(), rdfRdfNode, options);
    }
    else
    {
        throw new XMPException("Invalid attributes of rdf:RDF element", BADRDF);
    }
}

Note how something with attributes isn't an error condition.

Feel free to open an issue and one of us might be able to provide a fix. Even so, keep in mind that any fix will only be "compliant" with the 6.1.10 version of XmpCore. I believe the C++ version in the other repo is well beyond that version.

Numpsy commented 6 months ago

Feel free to open an issue and one of us might be able to provide a fix. Even so, keep in mind that any fix will only be "compliant" with the 6.1.10 version of XmpCore. I believe the C++ version in the other repo is well beyond that version.

As far as the versions of the C++ lib go, I have an old version of the source in source control at work that hasn't changed since 2018 which is the same, and the file in the Adobe repo doesn't seem to have changed for 3 years, so hopefully nothing much has changed in this area:

image