What is the intended semantics of addXML, changeXML, and removeXML when targets match multiple elements and/or NewXML has multiple children?

SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)

http://sed-ml.org

5 stars 2 forks source link

What is the intended semantics of addXML, changeXML, and removeXML when targets match multiple elements and/or NewXML has multiple children? #75

Closed jonrkarr closed 3 years ago

jonrkarr commented 3 years ago

Can you please clarify the intended semantics of addXML, changeXML, and removeXML? The semantics are not clear to me when the target XPATH matches multiple XML elements or NewXML has multiple children.

addXML

The L1V3 SED-ML specifications appear to suggest that target should match a single XML element, and NewXML should have a single child. If either is not true (target matches multiple elements or NewXML has multiple children), programs which interpret SED-ML should throw errors. This makes sense.

removeXML

The specifications suggest that target can match multiple XML elements, in which case each matching XML element should be deleted. This make sense.

changeXML

The meaning of changeXML is clear to me when the target matches a single XML element and NewXML has a single child.

The specifications and Listing 2.32 suggest that the target can match multiple XML elements and NewXML can have multiple children. The semantics of this are less clear to me.

One reason for my confusion is that libSED-ML doesn't seem to be able to capture multiple children of NewXML; it only captures the first child. I'm not sure if that's an error in the SED-ML specifications or with libSEDML.
Second, if an XPATH matches two children of a node, are both children intended to be replaced by NewXML (2 nodes are replaced by the number of children of NewXML) or should each child be replaced by NewXML (2 nodes are replaced by 2 X the number of children of NewXML).
Similarly, if an XPATH matches two XML elements that have distinct parents, should each element be replaced by the contents of NewXML?

Change attribute

Should the targets of changeAttribute be able to match multiple XML elements (i.e., the value attribute of multiple parameters), or should program raise errors if targets match 0 or multiple XML elements?

jonrkarr commented 3 years ago

I looked to guidance from other tools, but even tellurium doesn't implement addXML, changeXML, and removeXML. Does any active tool implement these?

luciansmith commented 3 years ago

I know of no tool that implements any of these features. My impression of all of them is that they all fall into the 'seemed like a good idea at the time' category, but are functionally unworkable. Not only this, but my impression from HARMONY/COMBINE conferences is that tool developers are discouraged from spending much/any time trying to implement these features, since they've received no uptake in the community. Anyone who did implement support for them at this point would probably be doing so out of a desire to see if the SED-ML spec was workable, rather than being pushed by user needs. I could be wrong about this! That's just my impression.

SBML has had a 'two implementation' policy for some time, requiring that any newly-added element to the specification be implemented by at least two tools before it can be incorporated into the spec. SED-ML had not had that policy in the past, so when these features were designed under a certain set of assumptions, those assumptions have turned out to not be super practical in the end.

jonrkarr commented 3 years ago

Add, change, and remove XML are all workable and fairly straightforward with a little extra guidance about how XPATHs should be used. That said, they all have the problem that they don't support non-XML models.

If there's no intention that tools implement this, why is it still in the specification? Their presence in the specification suggests to investigators that they should be able to use this, which is misleading. It also leads developers to think that supporting SED-ML requires more work than it does in practice. The specifications could be note these are experimental features that lack software support.

matthiaskoenig commented 3 years ago

I support some of the model changes and these are highly useful. Two primary examples are:

setting boundaryConditions=True for species (which allows to clamp a species) and can only be achieved with changes the XML
defining driving functions for the model. E.g. one has experimental data and wants to drive a model with a spline fit of the data. The FunctionDefinition for the experimental spline is then injected in the model and used to drive a parameter or species.

Also roadrunner recently added support for adding/removing species, reactions and assignment rules. There is just no SED-ML support in tellurium for these features. But I agree almost no tool handles the xpath things correctly.

On Tue, Jan 12, 2021 at 6:45 AM Jonathan Karr notifications@github.com wrote:

Add, change, and remove XML are all workable and fairly straightforward with a little extra guidance about how XPATHs should be used. That said, they all have the problem that they don't support non-XML models.

If there's no intention that tools implement this, why is it still in the specification? Their presence in the specification suggests to investigators that they should be able to use this, which is misleading. It also leads developers to think that supporting SED-ML requires more work than it does in practice. The specifications could be note these are experimental features that lack software support.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/SED-ML/sed-ml/issues/75#issuecomment-758419298, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG33OWL2MUU6K46QABNVE3SZPOZHANCNFSM4V57JGKA .

-- Matthias König, PhD. Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt Universität zu Berlin, Institute of Biology, Institute for Theoretical Biology https://livermetabolism.com konigmatt@googlemail.com https://twitter.com/konigmatt https://github.com/matthiaskoenig Tel: +49 30 2093 98435

jonrkarr commented 3 years ago

Thanks for the info Matthias.

Seems like we've established the following:

This is useful -- I agree. Changing individual attributes is not always sufficient.
Few tools support this -- I can sort of change this. I can externally add the functionality to 9 tools, albeit reusing the same code.
That the ones that do likely don't support this consistently -- we can at least try to align on how it should work

In a related issue @fbergmann pointed out that an older version of libSED-ML has a concrete implementation of this. Without digging through the source code, its hard to understand how it works. @fbergmann could you summarize how this is intended to work?

fbergmann commented 3 years ago

So yes, back in the day both jlibsedml as well as my C# library supported all the features (the sed-ml web tools would implement support for it as application, and richard adams had a platform SBSI if i recall, that would do it as standalone application), and used xpathnavigation to resolve the elements. Later on people kept asking for a C++ version, and so i got started on that one. It was indeed a bug, that the code there did not allow for roundtripping of multiple elements. And it has been resolved thanks to jonathans bug report. (new version of the libraries are on pypi).

In general though, we wanted to allow for basic modification, without making things too hard. It was there, to make simple modification possible with the caveat that you would be manipulating XML trees and a malicious file could certainly break models.

In my implementation, I expected a single target to be hit. And i do agree, that the specification is unclear, what should happen for xpath expressions that target multiple ones. In general the agreement back then seemed to be to keep it as simple as possible.

addXML: the inner xml of the newXML node will be appended as child to that target (no matter whether it is one or multiple elements), so you could add multiple parameters to the list of parameters, or just one
changeXML: here the target node in the element tree, is replaced with the inner xml of the newXML element (again it does not matter if it is one or multiple elements there).

I'm happy to change the implementation (of the C# library) for targeting multiple nodes if that is useful and needed.

jonrkarr commented 3 years ago

(a) Toward broader support for these parts of SED-ML, I think it would be helpful for this to be implemented in the principle SED-ML libraries (i.e., the C++ one).

Model resolution: local files, internal references, identifiers (modelXchange), URLs
Changes to XML-encoded models

(b) Regarding the semantics of these changes, for consistency, I tried to match Frank's implementation.

Our default behavior is the following:

target must match a single object. If not, an error is thrown.
addXML and changeXML can add multiple objects.

The restriction that a target matches a single object isn't necessary, and I don't think this restriction is clearly articulated throughout the SED-ML specs. I think this restriction is also misleading. Since target=XPATH, I think many would expect that target can match multiple objects. For these reasons, our software has an option not to throw errors when targets match 0 or multiple objects. In such cases, an attribute can be changed on multiple objects, the same XML can be added or replaced on multiple objects, and multiple objects can be removed.

luciansmith commented 3 years ago

OK, so to sum up: where in the spec do things need to be clarified, if anywhere?

jonrkarr commented 3 years ago

I think the resolution is

NewXML can be a list of elements.
To be true to XPATHs, I think target should be able to match multiple elements. However, this may be inconsistent with jLibSED-ML (and in turn, iBioSim, VCell, and SBSCL). I'm still waiting on VCell to understand this better.

Both should be clarified in the text of the specifications.

luciansmith commented 3 years ago

Looking through the spec again, I found that many places already mentioned the possibility of multiple targets of an XPath, so clearly this was thought of from the beginning. I have now added a lot of explicit explanations of which elements need XPath targets to be single elements, single attributes, or can allow multiple of either.

One thing I noticed was that ComputeChange has weird semantic differences depending on whether you target the element vs. targeting an attribute. I added the following to try to explain:

"Note that if a ComputeChange targets an XML element, it is changing that object’s value in the model state, which would need to be initialized. If it targets a specific attribute, it is changing that attribute directly, which may affect the object’s value according to the specification of that modeling language.

For example: an SBML Parameter has a value attribute whose value is used as its initial value, unless there is also an InitialAssignment in the same model that targets that parameter--if it is, the value is ignored in favor of the InitialAssignment. If a ComputeChange points to the Parameter element, the computed value will be used in the initialized model state, overriding any InitialAssignment the model may have contained. If a ComputeChange points to the value attribute, it is just that attribute that is changed, meaning that when the model is initialized, the new value will be ignored in deference to the InitialAssignment. For this reason, the XML element itself is usually the correct target of a ComputeChange."

jonrkarr commented 3 years ago

(a) Agreed this is not specified clearly or consistently.

(b) While I think that its important for ChangeAttribute to encompass non-XML languages, I think ComputeChange can be restricted to XML-encoded model languages, at least for now.

(c) I think a good goal is to separate model changes from the interpretation of models for simulation. If we can do this, this means we could write a single (or a few) library that does model changes and each simulation tool can use this. Both jLibSED-ML and BioSimulators are structured this way. I believe the old libSED-ML worked this way as well. I think this is the most realistic path to broader adoption of SED-ML, and its inline with the fact that most of the few tools which support SED-ML only support a part of the format.

(d) Toward this end, my preference is to say that ComputeChange doesn't operate on "initialized" state. Rather, it operates on model descriptions (e.g., XML in the case of CellML and SBML), and it must set attributes. I believe this is consistent with the SED-ML/CellML examples. I haven't seen any SBML examples where this couldn't work. Setting of an entire object can be done with ReplaceXML.

(Of course simulation tools are free to implement this as they like; I'm speaking about what tools would conceptually do, rather than about how tools should implement things.)

(e) Having targets be XML objects rather than attributes has the problem that it doesn't provide a clear mechanism to set specific attributes. This might be ok for kinetic SBML models, but is problematic for other kinds of models where objects have multiple properties. I think a better solution is for targets to point to specific attributes.

luciansmith commented 3 years ago

Wow, using changeAttribute on non-XML languages seems terrible to me. Surely the 'attribute' in question is an XML attribute, not some general characteristic of a symbol?

Your goal of ComputeChange only working for XML is, unfortunately, impossible. The XML only defines the initial state of the model, but ComputeChange can be used in SED-ML to set the current value of an element of a model state (such as between repeats in a repeated task, or (now) between subtasks). This is not encoded in the XML at all; it's an ephemeral quality of a simulation. It will absolutely be impossible to write a general library to do this; it is unentanglable from the model definition language. I agree it would have been nice, though.

For e: I am absolutely not saying that you cannot use ComputeChange to target an attribute. Clearly, you must be able to do so to set a wide variety of things. But when you do not, and you target the element instead, you are computing and setting a new mathematical meaning for the element in question.

jonrkarr commented 3 years ago

(a) ComputeChange and SetValue are distinct. The former is associated to models. It doesn't have any direct connection to simulations. Thus, I find it awkward to suggest that compute change has to operate on simulation state when it isn't directly attached to a simulation.

(b) In my opinion, SED-ML is very unclear about what is supposed to be computed in the empheral simulation state and what can be computed solely on model descriptions. SED-ML is extra unclear about what such empheral simulation states are.

(c) I think SED-ML needs to be clearer about what are conventions that apply to specific model languages. Part of what you've outlined is implicitly for SBML(core) and CellML -- using XPath for XML model elements to indicate the canonical values of the corresponding simulation objects. Other languages can't or have chosen not to use this convention.

luciansmith commented 3 years ago

OK, sure, ComputeChange and SetValue are technically different, but SetValue inherits from ComputeChange and adds two attributes (range and model), so they're pretty similar. In writing in the spec about it, I wanted to talk about both.

You are absolutely correct that XPath is being used in both to refer to things that are not actually XML. It's yet another argument against using XPath, but that horse left the barn a long time ago, before dying of old age.

I see what you mean about minFlux and maxFlux in FBC. However! I think we can solve this with our new KiSAO terms! All we have to do is do the same thing we did with 'concentration' and 'amount' with species:

    <variable id="J0min" symbol="KISAO:0000xxx" taskReference="task0" modelReference="model0"
         target="/sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id=&apos;J0&apos;]" />

where 'symbol' points to the KiSAO term for 'minimum flux' or even just 'minimum'.

The convention would become:

For all ComputeChange and SetValue elements:

If the target is an attribute, the calculated value is applied to that attribute in the XML.
If the target is an element, the calculated value is applied to the mathematical meaning of the associated element within the current model state.
If the target is an element, but the associated element has multiple potential mathematical meanings, use the 'symbol' attribute to clarify which one is meant with a KiSAO term.

That should work for CellML and all SBML package I know of--do you think it would also work for NeuroML?

As a separate issue, I can see how it would be sort of nice to only have to deal with XML directly for ComputeChange (and not SetValue) elements. But from my perspective, we already have to deal with SetValue elements, so I don't see much point in blocking that functionality simply because we happen to be at the beginning. I'm happy to be overruled; it's just my general sense. Anyone else have an opinion about this aspect of things?

jonrkarr commented 3 years ago

While ComputeChange and SetValue are similar, the former are not specifically associated to simulations. I don't see anything in the SED-ML specs which clearly communicates that SetValue can't operate on model descriptions rather than emphemeral simulation state either. I realize that tellurium may be applying such changes to emphemeral simulation state, but I don't see anything that says it has to be done this way. jLibSED-ML and downstream also work with changes separate from emphemeral simulation state. Another reason why I interpret changes to apply to model configurations rather than simulation state are the AddXML, RemoveXML, and ChangeXML constructs. These seem to be ill-defined in simulation state. I realize tellurium ignores them, but I think all of the types of defined changes need to be handled consistently.

I see what you mean about minFlux and maxFlux in FBC. However! I think we can solve this with our new KiSAO terms! All we have to do is do the same thing we did with 'concentration' and 'amount' with species:

I added KiSAO terms for this in v2.18 along with the terms you added for Jacobian, etc. We've been handling this differently for FBC; we'll transition to the new target+symbol strategy.

If the target is an element, the calculated value is applied to the mathematical meaning of the associated element within the current model state.

(a) At the least, this should say that the value is applied to the associated model element per the convention for using SED-ML with the associated model language and simulation algorithm. This then needs guides for using SED-ML with each model language. (b) Related to the first paragraph above, I don't think "current model state" is clearly defined in SED-ML. What is clearly defined are model configurations. Thus, my preference would be to not allow this.

If the target is an element, but the associated element has multiple potential mathematical meanings, use the 'symbol' attribute to clarify which one is meant with a KiSAO term.

This conflicts with a previous issue in which symbols are stated to only be used with read only properties. My preference is for this not to be possible.

That should work for CellML and all SBML package I know of--do you think it would also work for NeuroML?

(a) I don't think changes have really been clear with CellML because its never been supported. OpenCOR has minimal support for SED-ML. (b) Changes to CellML, LEMS/NeuroML, and other formats can be handled at the XML level as outlined above, precisely the same way they can be handled for SBML. Changes to LEMS/NeuroML on the implied simulation state has not been defined. If it were defined, I think they would likely want to use the ids of model elements rather than XPaths, similar to what they've done for targets of variables of data generators. I believe changes to VCML files will also be implemented to operate on VCML rather than implied simulation state. They're doing it this way so they can use jLibSED-ML to apply changes for SBML and VCML.

luciansmith commented 3 years ago

Oh, dear, I hadn't even thought about trying to delete a species or add a reaction halfway through a simulation. I guess this is currently not allowed in SED-ML, since we only allow SetValue once a simulation has started.

I'm starting to lose the plot here as far as 'what do we say in the spec'. Here's what I would like to do:

Claim that when ComputeChange or SetValue targets an attribute, it is the exact definition of the attribute that is changed.
Claim that when ComputeChange or SetValue targets an element, it is the mathematical meaning of the element that is changed, if allowed, potentially modified by the 'symbol' if multiple options are present. (I know you would like to disallow this, but IIRC, @matthiaskoenig wanted this functionality specifically for his work, and is the whole reason 'concentration' and 'amount' were added in the first place.)
Be more explicit about the model state during simulations, and how SetValue sets the current ('ephemeral') value of the target, and that the XPath is merely a shorthand way of pointing to something that is no longer XML.

Other side notes:

I don't see anything in the SED-ML specs which clearly communicates that SetValue can't operate on model descriptions rather than emphemeral simulation state either.

No, SetValue can operate on model descriptions. It's not super useful for things like 'initialConcentration', since the model state is not in its initial state any more, but potentially useful for other things like 'boundaryCondition' (I guess? I don't know if anyone has actual support for that.)

I don't think "current model state" is clearly defined in SED-ML. What is clearly defined are model configurations. Thus, my preference would be to not allow this.

This is what SetValue does. It's the main thing that SetValue does! If it's not defined well in the spec, that's fair and we should define it better. But we cannot disallow it.

Changes to LEMS/NeuroML on the implied simulation state has not been defined. If it were defined, I think they would likely want to use the ids of model elements rather than XPaths, similar to what they've done for targets of variables of data generators.

Man, if that were an option, I would have wanted to use it for SBML and CellML and everything else. Is it too late to change?

luciansmith commented 3 years ago

One other thing I realized when going back to edit the spec: the other reason that ComputeChange might require initialization is if there's any Variable that points to a model variable. If the value of (say) S2 is 'S_tot - S1', you'll need to initialize that model to figure out what that value is.

jonrkarr commented 3 years ago

I started this thread to clarify how many XML elements a target can match. I think the resolution is that this should behave consistently with XML XPaths. If a target matches zero elements, thats ok (because XPaths can match 0 elements). If a target matches multiple elements, each element/attribute should be changed/replaced.

Note, that targets should variables are different. Because they can be used in computations, targets for variables need to match a single XML object/attribute.

Perhaps we should leave the clarification of model descriptions vs simulation state to L2. It seems to me that there's significant divergence in the interpretation, which in principle is significant. At the same time, the majority of existing SED-ML files (e.g. BioModels, JWS Online), don't use anything that impacts this since they only use the most basic features. As a result, the few existing software implementations are likely close to consistent for the majority of existing SED-ML files. On top this, it would probably be better to focus our limited development energies to L2 rather than perfecting the alignment for L1.

luciansmith commented 3 years ago

The number of XML elements has indeed been clarified as part of this change; the other bits were things I realized needed to be updated along the way.

I changed the ComputeChange bit to the following:

"The change is calculated from the Math of the Calculation, and applied to the target of the Change. In this context, a target that points to an XML element (either the target of the ComputeChange or a target of a child Variable) is referencing that element’s mathematical meaning. For some model languages (such as SBML), this means that the model state must be initialized, so the element value can be read (in the case of a Variable) or changed (in the case of a ComputeChange).

In contrast, a target that points to an XML attribute simply is referencing that attribute’s value, which may be read or set directly in the XML document without initializing the whole model."

jonrkarr commented 3 years ago

This seems fine. It implicitly recognizes that there are two somewhat conflicting interpretations of how to process changes to XML models. Ideally, this would be resolved, but I think it could be punted to L2.

luciansmith commented 3 years ago

OK! Closing this issue, then. I'm glad we had the discussion; it clarified a great number of things for me.