DLR-SL / CPACS

CPACS - Common Parametric Aircraft Configuration Schema
http://dlr-sl.github.io/CPACS/
Apache License 2.0

Explicit CPACS namespace #809

Open MarAlder opened 1 year ago

MarAlder commented 1 year ago

Refers to #806: This is a highly experimental issue, as I'm aware of the implications for XPath and thus for TiXI/TiGL. Nevertheless, I would like to open the discussion.

I want to identify the pros and cons of using a namespace for CPACS, implemented as shown below:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns="http://www.cpacs.de/cpacs_schema_v4.0.0"
    targetNamespace="http://www.cpacs.de/cpacs_schema_v4.0.0" 
    elementFormDefault="qualified">

Pro:

Contra:

ArthurZamfir commented 1 year ago

I wrote a converter from XML to RDF, and using namespaces for CPACS would make things more explicit and well-formed. Otherwise, a resource in RDF cannot get a well-formed URI without guessing or inventing a namespace. In the semantic web world, namespaces would have many advantages for linking results across models as well as across different versions of CPACS. Having the version number inside the namespace is maybe not the best idea, though: with every new version, all the XML types (tags) would semantically be considered new things, since they get a new URI. Maybe there is a different way of versioning in XML? 🤔
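To make the versioning concern concrete, here is a minimal sketch of how a type URI changes when the namespace carries a version number. The helper `type_uri`, the `#` separator, and the `v4.1.0` namespace are assumptions made for illustration; only the `v4.0.0` namespace comes from the proposal above.

```python
def type_uri(namespace: str, tag: str) -> str:
    """Join a namespace and a tag name into one identifier.

    Assumption: a '#' is inserted when the namespace does not already
    end in '/' or '#'. A real converter may use a different rule.
    """
    separator = "" if namespace.endswith(("/", "#")) else "#"
    return f"{namespace}{separator}{tag}"


NS_V400 = "http://www.cpacs.de/cpacs_schema_v4.0.0"  # from the proposal above
NS_V410 = "http://www.cpacs.de/cpacs_schema_v4.1.0"  # hypothetical later release

uri_old = type_uri(NS_V400, "wing")
uri_new = type_uri(NS_V410, "wing")

print(uri_old)
print(uri_new)
# The same tag ends up with two different identifiers
print(uri_old == uri_new)
```

With a versioned namespace, the two identifiers for the same tag differ, so an RDF store would treat them as unrelated types unless told otherwise.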

MarAlder commented 1 year ago

Hmm, schema versioning is a difficult topic: see this stackoverflow post

ArthurZamfir commented 1 year ago

I remember the discussions about this topic and that it's not an easy problem with a simple solution. For the use case of transforming the data to an RDF model, it would simply be useful to have proper URIs that uniquely identify the rdf:type (XML tags) of elements. If the namespace includes a version number that changes with every minor release, these types are not necessarily the same anymore from one release to the next. In order to integrate and explore such data across different CPACS versions, one would have to define OWL inference statements on these types, such as owl:equivalentClass, or alternatively run a script that transforms the data into a common namespace. From my perspective that is fine as well, and it can definitely be considered correct to treat all tags in differing versions as potentially different by default. Including a version number in the namespace can be semantically correct and works with RDF, but it might not be convenient for processing data with RDF. Since that perspective is not the only one, and other domains might well benefit from a version number within the namespace, I would support the proposed versioning approach.
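The "script that transforms the data into a common namespace" could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The common namespace `http://www.cpacs.de/schema/`, the version pattern, the `v4.1.0` namespace, and the function name `to_common_namespace` are assumptions for illustration, not part of any existing CPACS tooling.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: versioned namespaces follow the pattern from the proposal above
VERSIONED_NS = re.compile(r"^\{http://www\.cpacs\.de/cpacs_schema_v[\d.]+\}")
# Assumption: a version-independent target namespace
COMMON_NS = "http://www.cpacs.de/schema/"


def to_common_namespace(root):
    """Rewrite every element tag from a versioned namespace to the common one."""
    for elem in root.iter():
        elem.tag = VERSIONED_NS.sub("{" + COMMON_NS + "}", elem.tag)
    return root


doc_a = ET.fromstring('<cpacs xmlns="http://www.cpacs.de/cpacs_schema_v4.0.0"><header/></cpacs>')
doc_b = ET.fromstring('<cpacs xmlns="http://www.cpacs.de/cpacs_schema_v4.1.0"><header/></cpacs>')

tags_a = [e.tag for e in to_common_namespace(doc_a).iter()]
tags_b = [e.tag for e in to_common_namespace(doc_b).iter()]

print(tags_a)
print(tags_a == tags_b)
```

After the rewrite, both documents use identical element names, which is the effect an owl:equivalentClass statement would otherwise have to express per type.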

ArthurZamfir commented 5 months ago

I am currently continuing work on the XML/XSD import to the RDF/SHACL world and ran into similar namespace issues again. After playing around with a CPACS XSD/XML example and changing the attributes on the <xsd:schema> element as well as on the root <cpacs> element, I now believe we can declare a proper namespace without having to modify the XPaths.

Suggestion

I have made the following changes to the XSD and XML files:

XSD

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns="http://www.cpacs.de/schema/"
    targetNamespace="http://www.cpacs.de/schema/" 
    elementFormDefault="qualified">

XML

<cpacs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xmlns="http://www.cpacs.de/schema/"
       xsi:schemaLocation="http://www.cpacs.de/schema/ https://www.cpacs.de/schema/v3_5_0/cpacs_schema.xsd">

I have used the features of my IDE (IntelliJ IDEA) to verify that the current simple XPaths like /cpacs/header/name/text() still evaluate to the correct value. Additionally, I have used this online tool to confirm that the IDE is not doing any magic in the background. The schema validation also works with these changes. I intentionally set the targetNamespace to something that doesn't change across versions, since the meaning of the declared types remains the same and only their structure might change between versions. That is also why xsi:schemaLocation asks for a mapping from namespace to schema location. Most programming languages take the same approach: when we write import numpy, we usually don't specify the version; which one we get depends on what is available in our environment. Further, the OWL Specification recommends the same importing mechanism. I would recommend a namespace declaration that ends with a / or a #, since that aligns it with the RDF world, where everything is a resource defined in its namespace, e.g. the cpacsType would be https://www.cpacs.de/schema/cpacsType or https://www.cpacs.de/schema#cpacsType.
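To illustrate the namespace-to-location mapping carried by xsi:schemaLocation, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It only reads the attribute from the root element shown above and pairs up its whitespace-separated tokens; it does not download or validate anything. The helper name `schema_locations` is made up for this example.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Root element from the suggestion above, closed so that it parses on its own
DOC = """<cpacs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://www.cpacs.de/schema/"
       xsi:schemaLocation="http://www.cpacs.de/schema/ https://www.cpacs.de/schema/v3_5_0/cpacs_schema.xsd"/>"""


def schema_locations(root):
    """Return {namespace: schema location} from the xsi:schemaLocation attribute."""
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = raw.split()
    # The attribute value is a list of (namespace, location) pairs
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(DOC)
locations = schema_locations(root)
print(locations)
```

The namespace stays fixed while the location on the right-hand side of the pair is what would change from one CPACS release to the next.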

@MarAlder: Could you please check whether my suggestions and assumptions work? If they work for you as well, we could define the CPACS schema more explicitly with its own namespace and make it easier to translate to other representations, without the need to adapt any tools to these changes. Since this would then not be a breaking change, we could introduce it already in the next minor version of CPACS.

MarAlder commented 5 months ago

Hmm, first, thanks for the intensive testing!

Back then I actually tried the same approach. However, I only tested with TiXI, and the only way I got it working was like this:

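from tixi3 import tixi3wrapper

tixi_h = tixi3wrapper.Tixi3()
tixi_h.openDocument("test.xml")
# Bind the CPACS namespace URI to the prefix "cp" so that XPath can address the elements
tixi_h.registerNamespace("http://www.cpacs.de/schema/", "cp")
name = tixi_h.getTextElement("/cp:cpacs/cp:header/cp:name")
print(name)
tixi_h.close()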

So, is this down to the TiXI implementation? If Binder magically works, you can test it in the browser: Binder > Notebooks > Developer > Namespaces > test_namespaces.jpynb

I can confirm that it is working in the online tool. VSCode with the XML extension (by Red Hat) also manages to connect the nodes to the XSD without using a prefix. So what am I missing when using TiXI? @joergbrech and @rainman110, do you remember the discussion we had back then that such a change would require major changes behind the scenes of tools such as TiXI and TiGL?
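For simple paths, the prefixed form used in the TiXI test above can be derived mechanically from the unprefixed one. The sketch below is purely illustrative: the helper `add_prefix` is hypothetical, is not part of TiXI or TiGL, and only handles plain location steps. It would break on predicates that themselves contain a slash or a colon.

```python
def add_prefix(xpath: str, prefix: str = "cp") -> str:
    """Prefix every plain element step of a simple XPath.

    Left untouched: empty steps (leading slash), attribute steps,
    function-like steps such as text(), '.' and '..', and steps that
    already contain a colon.
    """
    steps = []
    for step in xpath.split("/"):
        is_plain_element = (
            step
            and step not in (".", "..")
            and not step.startswith("@")
            and ":" not in step
            and "(" not in step
        )
        steps.append(f"{prefix}:{step}" if is_plain_element else step)
    return "/".join(steps)


prefixed_path = add_prefix("/cpacs/header/name")
print(prefixed_path)
print(add_prefix("/cpacs/header/name/text()"))
```

Whether such a rewrite is acceptable inside a tool is a separate question; the sketch only shows that the mapping between the two path styles is regular for the simple case.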


Example XML used for the Python test above:

<cpacs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xmlns="http://www.cpacs.de/schema/"
       xsi:schemaLocation="http://www.cpacs.de/schema/ cpacs_schema.xsd">

    <header>
        <name>Test</name>
        <version>1.0</version>
        <versionInfos>
            <versionInfo version="1.0">
                <creator>M.A.</creator>
                <cpacsVersion>3.5</cpacsVersion>
                <description>Testing the usage of a CPACS namespace</description>
                <timestamp>2024-03-11T20:17:00</timestamp>
            </versionInfo>
        </versionInfos>
    </header>
</cpacs>

joergbrech commented 5 months ago

I don't remember the discussion. TiXI uses libxml2 under the hood and if I understand correctly, default namespaces are an XPath 2.0 feature which is not supported by libxml2: https://gitlab.gnome.org/GNOME/libxml2/-/issues/585.
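The behaviour can be reproduced without TiXI. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in, which is an assumption made for illustration: it is not libxml2 and implements only a subset of XPath, but its default handling of unprefixed names shows the same effect. An unprefixed step matches only elements in no namespace, so it finds nothing once the document declares a default namespace.

```python
import xml.etree.ElementTree as ET

CPACS_NS = "http://www.cpacs.de/schema/"

DOC = f"""<cpacs xmlns="{CPACS_NS}">
    <header><name>Test</name></header>
</cpacs>"""

root = ET.fromstring(DOC)

# The parser stores the namespace as part of the element name
print(root.tag)

# Unprefixed steps refer to elements in no namespace, so nothing matches
unprefixed = root.find("header/name")
print(unprefixed)

# With an explicit prefix-to-namespace mapping the lookup succeeds
prefixed = root.find("cp:header/cp:name", {"cp": CPACS_NS})
print(prefixed.text)
```

This mirrors the TiXI test earlier in the thread, where the path only resolved after a prefix had been registered for the namespace.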

ArthurZamfir commented 5 months ago

@joergbrech I came across the issue you linked as well this morning while investigating a bit more. I wonder why newer versions of XPath are not supported by libxml2. XPath 2.0 became a W3C Recommendation in 2007 (second edition in 2010) and XPath 3.0 in 2014, so both are already quite mature and seemingly supported by most libraries. Maybe there is a way to use a different XPath evaluation library instead while still using libxml2?

joergbrech commented 5 months ago

> Maybe there is a way to use a different XPath evaluation library instead while still using libxml2?

Maybe, I haven't dug deeper yet.

But I have the feeling that supporting default namespaces boils down to either

I hope there is a simpler solution...