cessda / cessda.metadata.profiles

Contains DDI Profiles that are used by the CESSDA Metadata Validator Tool

Profiles are invalid when using XPath 1.0 implementations #177

Closed: matthew-morris-cessda closed this issue 4 weeks ago

matthew-morris-cessda commented 5 months ago

Many XPaths in the DDI 2.5 profiles don't use a prefix when referring to elements in the ddi:codebook:2_5 namespace. Instead, the profiles bind that namespace to an empty prefix:

<pr:XMLPrefixMap>
    <pr:XMLPrefix/>
    <pr:XMLNamespace>ddi:codebook:2_5</pr:XMLNamespace>
</pr:XMLPrefixMap>

Because XPath 1.0 specifies that an unprefixed element name always refers to an element in no namespace, an XPath such as /codeBook/@xsi:schemaLocation will not match when it is evaluated by a compliant XPath implementation. The correct way to write such an XPath is /ddi:codeBook/@xsi:schemaLocation, using the following prefix map:

<pr:XMLPrefixMap>
    <pr:XMLPrefix>ddi</pr:XMLPrefix>
    <pr:XMLNamespace>ddi:codebook:2_5</pr:XMLNamespace>
</pr:XMLPrefixMap>
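
The difference can be sketched with Python's standard-library xml.etree.ElementTree. Its find() supports only a limited subset of XPath, but when no empty-prefix entry is supplied it treats an unprefixed name the way XPath 1.0 does: as a name in no namespace. This is a minimal illustration, not how CMV evaluates profiles. The sample record, its schemaLocation value and the document_root helper are hypothetical; the synthetic container stands in for the XPath root node, so a child step from it approximates an absolute path.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical sample record using a default namespace (no prefix on the tags).
# The schemaLocation value is a placeholder for illustration only.
RECORD = (
    '<codeBook xmlns="ddi:codebook:2_5" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="ddi:codebook:2_5 codebook.xsd"/>'
)


def document_root(xml_text):
    """Hypothetical helper: wrap the parsed document element in a synthetic,
    namespace-free container that plays the role of the XPath root node,
    so that the relative path "x" here approximates the absolute path "/x"."""
    container = ET.Element("document-root")
    container.append(ET.fromstring(xml_text))
    return container


root = document_root(RECORD)

# Approximates /codeBook: no prefix map is passed, so "codeBook" means
# "codeBook in no namespace" and the namespaced element does not match.
print(root.find("codeBook"))  # None

# Approximates /ddi:codeBook with the prefix map ddi -> ddi:codebook:2_5.
code_book = root.find("ddi:codeBook", {"ddi": DDI_NS})

# Approximates /ddi:codeBook/@xsi:schemaLocation; ElementTree stores
# namespaced attributes under "{uri}local" keys.
print(code_book.get(f"{{{XSI_NS}}}schemaLocation"))  # ddi:codebook:2_5 codebook.xsd
```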

This means all of the profiles are broken when using XPath 1.0 implementations.

matthew-morris-cessda commented 5 months ago

The XPath version is declared with <pr:XPathVersion>, which is set to 1.0 in all profiles. Setting the severity to critical.

katja-moilanen commented 5 months ago

Attachment: cdc25_profile_short_test_NAMESPACE.zip

I'm not sure I understood the problem correctly, @matthew-morris-cessda. I made a short test profile document (attached) in which the "ddi" namespace prefix was added to the XPaths. I tested it with CMV against these metadata records, which are valid for CESSDA (no constraint violations when validated with the CDC DDI 2.5 profile 1.0.4):

https://services.fsd.tuni.fi/catalogue/FSD1000/DDI/FSD1000_fin.xml
https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=5fa8e0900b092f76550c887ae855e73f76afdc95da6ecfe302a45fb2cc3def39
https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=1286b4afac0f59c961cacd0e48bf0e7139c93fd0ddfb1d924dd1018d01028938

The result is that adding the namespace prefix to the XPaths in the profile made these records invalid.

The example document given in issue https://github.com/cessda/cessda.cmv.core/issues/111 is valid when validated with the attached test profile document.

If the profiles are changed to add the namespace prefix as in the test profile document, it will quite likely mean that every SP using Codebook 2.5 has to fix all of their metadata.

matthew-morris-cessda commented 5 months ago

The problem is that it shouldn't matter whether or not the ddi prefix is part of the element tags. Once XML namespaces are taken into account, <codeBook xmlns="ddi:codebook:2_5"/> is semantically the same as <ddi:codeBook xmlns:ddi="ddi:codebook:2_5"/>. CMV currently handles this incorrectly.
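
One way to see this equivalence, sketched with Python's xml.etree.ElementTree (an illustration only, unrelated to CMV's code): after parsing, both serializations carry the same expanded name, that is, the namespace URI plus the local name, and the prefix written in the source document is not kept at all.

```python
import xml.etree.ElementTree as ET

# The two serializations quoted above: a default namespace vs. an explicit "ddi" prefix.
default_ns = ET.fromstring('<codeBook xmlns="ddi:codebook:2_5"/>')
prefixed = ET.fromstring('<ddi:codeBook xmlns:ddi="ddi:codebook:2_5"/>')

# ElementTree drops the document's prefixes and records only the expanded
# name, in Clark notation: "{namespace-uri}local-name".
print(default_ns.tag)  # {ddi:codebook:2_5}codeBook
print(prefixed.tag)  # {ddi:codebook:2_5}codeBook
print(default_ns.tag == prefixed.tag)  # True
```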

Keep in mind that prefixes in the XPath are distinct from prefixes in the tested documents. For example, the XPath //ns1:codeBook, with the prefix ns1 bound to ddi:codebook:2_5, will match the element <ns35:codeBook xmlns:ns35="ddi:codebook:2_5"/>.
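
A sketch of the same point with xml.etree.ElementTree, whose find() and findall() take a prefix map that belongs to the query rather than to the document. The container element is a hypothetical stand-in for the XPath root node, and ElementTree supports only a subset of XPath, so this approximates rather than reproduces a full XPath 1.0 evaluation. As the second query shows, ElementTree raises SyntaxError when the query uses a prefix its own map does not bind, even if the document declares that prefix.

```python
import xml.etree.ElementTree as ET

# The document declares the prefix "ns35"; the query will use its own prefix "ns1".
DOC = '<ns35:codeBook xmlns:ns35="ddi:codebook:2_5"/>'

# Hypothetical synthetic container acting as the XPath root node, so that
# ".//ns1:codeBook" here approximates "//ns1:codeBook".
container = ET.Element("document-root")
container.append(ET.fromstring(DOC))

# The query's prefix map is independent of the document's namespace declarations.
xpath_prefixes = {"ns1": "ddi:codebook:2_5"}
matches = container.findall(".//ns1:codeBook", xpath_prefixes)
print(len(matches))  # 1

# The document's own prefix is not visible to the query unless the query binds it.
unbound_prefix_rejected = False
try:
    container.findall(".//ns35:codeBook", xpath_prefixes)
except SyntaxError:
    unbound_prefix_rejected = True
print(unbound_prefix_rejected)  # True
```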

The XPaths in the profiles are incorrect because, according to the XPath 1.0 specification, an element name without a namespace prefix refers to an element in no namespace. So //codeBook will never match <codeBook xmlns="ddi:codebook:2_5"/>, because that codeBook element is in the ddi:codebook:2_5 namespace. It will match <codeBook xmlns=""/>, because this element is in no namespace.
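
The two cases in the paragraph above can be checked with xml.etree.ElementTree when no prefix map is passed, so an unprefixed name means "no namespace". count_matches is a hypothetical helper, and its container element only approximates the XPath root node; this is a sketch, not CMV's behaviour.

```python
import xml.etree.ElementTree as ET


def count_matches(xml_text, path):
    """Hypothetical helper: count matches of an ElementTree path, using a
    synthetic namespace-free container as the root node so that ".//x" here
    approximates "//x" in XPath."""
    container = ET.Element("document-root")
    container.append(ET.fromstring(xml_text))
    return len(container.findall(path))


# No prefix map, so "codeBook" only matches codeBook elements in no namespace.
print(count_matches('<codeBook xmlns="ddi:codebook:2_5"/>', ".//codeBook"))  # 0
print(count_matches('<codeBook xmlns=""/>', ".//codeBook"))  # 1
```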

I can set up a meeting to discuss the problem further if there are any questions.

matthew-morris-cessda commented 5 months ago

To be clear, the only reason these profiles work at all is that CMV's handling of namespaces is broken.

katja-moilanen commented 5 months ago

I was not aware that CMV will be developed further this year. In that case, everything is OK.