fbergmann / libSEDML

SED-ML library based on libSBML
BSD 2-Clause "Simplified" License
9 stars 12 forks source link

Nodes of NewXML do not inherit namespaces from parent document #92

Open jonrkarr opened 3 years ago

jonrkarr commented 3 years ago

Example document:

<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3" level="1" version="3"
  xmlns:sbml="http://www.sbml.org/sbml/level2/version3">
  <listOfModels>
    <model id="model_1" source="model.xml">
      <listOfChanges>
        <addXML target="/sbml:sbml/sbml:model/sbml:listOfParameters" >
          <newXML >
            <sbml:parameter id="V_mT" value="0.7" />
          </newXML >
        </addXML >
      </listOfChanges>
    </model>
  </listOfModels>  
</sedML>

The snippet below illustrates the issue. The SED-ML elements inherit the namespaces of the parent document, but the nodes of NewXML do not. This means that namespaces involved in the content of NewXML needs to be defined for each node in NewXML, which is tedious.

import libsedml
doc = libsedml.readSedMLFromFile('example.sedml')

doc.getListOfModels()[0].getListOfChanges()[0].getNamespaces().getNumNamespaces()
[1]: 2

doc.getListOfModels()[0].getListOfChanges()[0].getNewXML().getNamespaces().getNumNamespaces()
[2]: 0

Similarly, namespaces for NewXML can't be defined at the level of NewXML. For example, the following isn't supported either.

<newXML xmlns:sbml="http://www.sbml.org/sbml/level2/version3">
  <sbml:parameter id="V_mT" value="0.7" />
</newXML >
fbergmann commented 3 years ago

I think this is the desired behavior, as otherwise the snippet would roundtrip as:

<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3" level="1" version="3"
  xmlns:sbml="http://www.sbml.org/sbml/level2/version3">
  <listOfModels>
    <model id="model_1" source="model.xml">
      <listOfChanges>
        <addXML target="/sbml:sbml/sbml:model/sbml:listOfParameters" >
          <newXML >
            <sbml:parameter id="V_mT" value="0.7" xmlns="http://sed-ml.org/sed-ml/level1/version3" xmlns:sbml="http://www.sbml.org/sbml/level2/version3" />
          </newXML >
        </addXML >
      </listOfChanges>
    </model>
  </listOfModels>  
</sedML>

i could add a convenience function that collects namespaces going up the element tree. but that would not be on the newXML element, but on the change element. if that would be helpful.

jonrkarr commented 3 years ago

Summary of the issue

My suggestion refers to the asymmetry between how getNamespaces behaves for SED classes and the children of NewXML, as well as how libSEDML differs from more general XML processing such as with LXML.

Here's how I believe libSEDML works:

For example, here's LXML behaves in the same scenario:

Suggestions for solutions

From my perspective, I think the most important thing is for getNamespaces to work the same for SED classes and for the child nodes of NewXML. Here's a couple of possible solutions:

  1. getNamespaces only returns the namespaces directly defined on the corresponding XML element. If developers want to get all of the namespaces that are applicable to a SED object, they can collect the namespaces of their parents -- it would be helpful to add a function for this so that developers don't need to be aware of the XML structure of SED objects.
  2. getNamespaces returns all namespaces defined on the corresponding XML element and all parent XML elements. In this case, when elements are serialized to XML, only namespaces that are not defined in a parent element need to be written out with the XML for the object. This avoids the roundtripping issue mentioned in the previous comment.

(1) is likely easier to implement. I think (2) is more consistent with other XML tools (at least LXML). My preference would be (2). But, (1) would work well if pared with a convenience method for collecting the namespaces from all parent XML elements. This convenience method is key for maintaining the abstraction between SED documents and their XML representation.

jonrkarr commented 3 years ago

I think SED-ML's reliance on XML (XPATHs and changes) is a smart design, at least for XML-based model languages. For this to work well, I think its important for SED-ML to behave consistently with XML and general XML tools. If this is the case, developers can expect that SED-ML works just like XML and minimal additional explanation is needed. Right now, libSED-ML and the SED-ML specs largely inherit from XML, but also deviate in a few key ways. This can generate a lot of confusion and break down the abstraction between models and simulation experiments. I think a few small changes are needed to more closely align SED-ML with XML and bolster the abstraction between models and simulation experiments.

fbergmann commented 3 years ago

There already is a function getElementNamespaces on the SED-ML classes, that returns only the namespaces defined on the element. getNamespaces is there to get the one from the parent, as usually we'd expect people to define namespaces there.

XMLNode construct is a lower level access (and provided by another library), so I would prefer not to have to change it. It is also used in other places (for annotation and notes), just like it is for the newXML element here.

jonrkarr commented 3 years ago

Thanks for pointing to the getElementNamespaces method. It sounds like this might work a little bit more like XMLNode.getNamespaces.

Still, I find the asymmetric treatment of SED-ML and non-SED-ML XML nodes surprising. From a XML perspective, I would expect them to behave the same. I think this divergence is important to avoid because it creates a need explain to developers how the interpretation of SED-ML departs from XML conventions.

To avoid changing libSEDML's data structures, likely an easy way to address my concern would be to provide a method which returns all namespaces defined on a element or on its parents. Ideally, this would work symmetrically for SED-ML and non-SED-ML XML nodes. This would allow developers to get the namespaces that one would expect to be available for each XML node (per XML conventions) without having to know the XML structure of SED-ML (e.g., what the parents of a child of NewXML are), which would break the abstraction.