Redesign model changes and variable targets to address fundamental flaws with XML XPATHs, changes, and namespaces

SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)

http://sed-ml.org


Redesign model changes and variable targets to address fundamental flaws with XML XPATHs, changes, and namespaces #114

Open jonrkarr opened 3 years ago

jonrkarr commented 3 years ago

The specifications suggest that model files should define namespaces with prefixes:

xmlns:sbml='http://www.sbml.org/sbml/level3/version1/core'

such that XPATHs such as below are well-defined

/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='X']

However, many model files define namespaces without prefixes such as

xmlns='http://www.sbml.org/sbml/level3/version1/core'

There is nothing wrong with doing this: the namespace is defined, just without a prefix (shortcut). In this case, XPATHs such as the one above are actually not valid because the prefix for SBML is null (lxml, for example, uses None). Rather, the following would be a good XPATH:

/sbml/model/listOfSpecies/species[@id='X']

I suggest making one of a couple of changes to remedy this divergence from XML:

1. Require model files to define prefixes. This would fix some issues, but there's no reason that model files have to do this and, therefore, no reason to expect that tools will always do this.
2. Edit the specifications to acknowledge that XPATHs such as /sbml/model/listOfSpecies/species[@id='X'] are correct when no prefix is defined.
3. Define a convention that maps default prefixes (e.g., sbml) to the URIs of the default namespaces (e.g., http://www.sbml.org/sbml/level3/version1/core).

nickerso commented 3 years ago

I agree that namespaces in XPath need clarity in the spec, but I don't think there is actually anything wrong. The XPath expressions in the SED-ML need to have their namespaces resolved in the context of the document in which they are located (i.e., the SED-ML document), not the target document (i.e., the SBML document). So all the namespaces used in the XPath expressions need to be defined in the SED-ML document. An XPath like /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='X'] is fine even if the sbml namespace prefix is not defined in the SBML document, but the prefix must be defined in the SED-ML document.

You can see some discussion of how this is handled in libSEDML, including code for grabbing all the namespaces from SED-ML elements, here: https://github.com/fbergmann/libSEDML/issues/77
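To illustrate the resolution rule described above, here is a minimal Python sketch using only the standard library. It is not the libSEDML code; the helper names (`declared_namespaces`, `resolve_target`), the toy documents, and the SED-ML namespace URI shown are assumptions for illustration. It also simplifies in two ways: it collects every prefix declared anywhere in the SED-ML document rather than only those in scope on the element carrying the target, and it relies on ElementTree's limited XPath subset, which has no absolute paths, so the model root is placed under a throwaway parent.

```python
import io
import xml.etree.ElementTree as ET

# Toy SED-ML fragment: the "sbml" prefix is declared here, in the SED-ML document.
SEDML = """<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3"
       xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
  <variable id="var_X"
            target="/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='X']"/>
</sedML>"""

# Toy SBML model: the same namespace is declared WITHOUT a prefix.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="m">
    <listOfSpecies>
      <species id="X"/>
      <species id="Y"/>
    </listOfSpecies>
  </model>
</sbml>"""


def declared_namespaces(xml_text):
    """Return {prefix: uri} for every prefixed namespace declared in xml_text."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    namespaces = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if prefix:  # skip the default (unprefixed) namespace
            namespaces[prefix] = uri
    return namespaces


def resolve_target(model_root, target, namespaces):
    """Evaluate an absolute target against the model using the given prefix map."""
    if not target.startswith("/"):
        raise ValueError("this sketch only handles absolute targets")
    holder = ET.Element("holder")  # throwaway parent so "./root/..." works
    holder.append(model_root)
    return holder.findall("." + target, namespaces)


sedml_root = ET.fromstring(SEDML)
variable = next(el for el in sedml_root.iter() if el.tag.endswith("}variable"))
prefix_map = declared_namespaces(SEDML)      # taken from the SED-ML document
model_root = ET.fromstring(SBML)             # the target document
matches = resolve_target(model_root, variable.get("target"), prefix_map)
matched_ids = [el.get("id") for el in matches]
```

The point of the sketch is that the prefix map comes from the SED-ML document only; the SBML document never needs to declare an `sbml` prefix, because matching happens on namespace URIs.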

jonrkarr commented 3 years ago

Thanks for clarifying! Now that you mention this, I see the note in the specifications (bottom of L1V3 Section 3.3). I missed this. This makes sense. This is the same as my last suggestion above.

This is easy to miss because only one of the examples (the CellML example) follows this convention. None of the SBML examples in the specifications PDF follow it. The examples I spot-checked at http://sed-ml.org/examples.html don't either. This convinced me that tools that interpret these changes are relying on implicit conventions specific to SED-ML or the model language. In this case, I think most of the examples that use XPATHs should be considered invalid because the namespace prefixes for the XPATHs are not defined.

Thus I suggest the following actions:

[ ] Add a more complete example to Section 3.3
[ ] Add missing model language namespaces to examples in the specifications
[ ] Add missing model language namespaces to examples at http://sed-ml.org/examples.html (https://github.com/SED-ML/sed-ml.github.io/tree/master/examples)
[ ] Encourage BioModels, JWS, and other repositories to fix their SED-ML files
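As a rough aid for the clean-up actions listed above, the following Python sketch flags `target` attributes whose XPATH prefixes are not declared in the SED-ML document. It is only an assumption about how such a check could look: the function name `undeclared_prefixes` and the regular expressions are hypothetical, prefixes are detected heuristically rather than with a real XPath parser, and declarations are collected document-wide rather than per element scope.

```python
import io
import re
import xml.etree.ElementTree as ET

# Quoted string literals inside an XPath, e.g. 'X' in [@id='X'].
STRING_RE = re.compile(r"'[^']*'|\"[^\"]*\"")
# A name followed by a single colon (a prefix), but not an axis such as "child::".
PREFIX_RE = re.compile(r"(?<![\w.-])([A-Za-z_][\w.-]*):(?!:)")


def undeclared_prefixes(sedml_text):
    """Map element id -> sorted list of prefixes used in its target but never declared."""
    source = io.BytesIO(sedml_text.encode("utf-8"))
    declared = {
        prefix
        for _event, (prefix, _uri) in ET.iterparse(source, events=("start-ns",))
        if prefix
    }
    problems = {}
    for element in ET.fromstring(sedml_text).iter():
        target = element.get("target")
        if target is None:
            continue
        # Drop string literals first so a colon inside a quoted value is ignored.
        used = set(PREFIX_RE.findall(STRING_RE.sub("", target)))
        missing = sorted(used - declared)
        if missing:
            problems[element.get("id")] = missing
    return problems


TARGET = "/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='X']"

# Toy SED-ML fragments (illustrative only): one without and one with the declaration.
BROKEN = f'<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3"><variable id="var_X" target="{TARGET}"/></sedML>'
FIXED = (
    '<sedML xmlns="http://sed-ml.org/sed-ml/level1/version3" '
    'xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">'
    f'<variable id="var_X" target="{TARGET}"/></sedML>'
)

broken_report = undeclared_prefixes(BROKEN)
fixed_report = undeclared_prefixes(FIXED)
```

A report like this only shows which files need a namespace declaration added; choosing the right URI for each prefix still depends on the model language and version.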

jonrkarr commented 3 years ago

Likely every SED-ML file in BioModels has the same problem. BioModels generally seems to use COPASI to create SED-ML files, which doesn't handle XPATHs as intended.

The SED-ML files available from JWS Online also have this problem.

Unfortunately, this problem is widespread, likely affecting the great majority of extant SED-ML files.

luciansmith commented 3 years ago

This definitely goes against the original spirit of SED-ML, but in my opinion, at this point we should just say that SED-ML has never actually used real XPath values, codify in the spec for L1v4 whatever rough heuristic tools actually use, and then drop XPaths and XML editing entirely for level 2, in favor of a new, more semantic set of 'model change' rules. I think at this point it's clear that nobody ever used the XML editing rules in SED-ML in the way they were intended. To me, this seems to be evidence that the proposed solution did not fit the problem as well as it seemed at the time.

jonrkarr commented 3 years ago

Thanks @luciansmith for suggesting this! I wanted to suggest this too. I hesitated because this would be a significant conceptual departure.

I think something definitely needs to be done. The current situation has significant fundamental flaws.

I would be in favor of something like Lucian's proposal. This could be significantly easier to implement and put non-XML model languages on equal footing.

Devil's advocate

To play devil's advocate against my own suggestion, the current vision could be made to work. I actually don't think the specifications need a lot of change. But a lot of files would need to be fixed.

The bigger problem is that the current vision is unnecessarily complicated for software tools to implement. This seems to have led to the current situation in which almost no tool implements it as intended and, worse, there's insufficient consistency between tools. This creates downstream problems such as the need to keep track of which parts of SED-ML are supported by each tool. Similarly, this is a significant barrier to additional tools supporting SED-ML.

One way to get around some of the bottleneck is to provide a higher-level library which takes care of much of the work of handling changes and variables. This is what we've done with BioSimulators utils. We've used it to integrate SED-ML with 9 simulation tools. This makes working with SED-ML simpler for developers, but using it is still unnecessarily complicated because it's constrained by fundamental flaws in the design of SED-ML. Plus, this creates the need to have versions of the library for every programming language that developers want to use.

jonrkarr commented 3 years ago

I edited the issue title to align with Lucian's proposal.

jonrkarr commented 3 years ago

As data points, VCell (due to jLibSED-ML) can't parse extra definitions of namespaces, e.g., for SBML. This indicates that iBioSim, which also uses jLibSED-ML, can't either. This suggests that jLibSED-ML also can't handle namespaces in conjunction with targets, and likely not with NewXML. It seems the BioSimulators tools and maybe OpenCOR are the only ones that deal with namespaces.