SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)
http://sed-ml.org

How to use variables in comp submodels in DataGenerators (model composition)? #44

Open matthiaskoenig opened 7 years ago

matthiaskoenig commented 7 years ago

Issue

Currently it is not possible to access variables in models based on model composition. This is a general problem of composed models where part of the model XML definition is in external files which are then only referenced by the top model (same issue for CellML model composition).

Examples

In the model definition in SED-ML I reference the top comp model. I can now only use variables in DataGenerators which have been made accessible in the top model via replacements or replacedBy. But I want to plot a variable from one of the submodels which is in the ExternalModelDefinitions of the Top model.

I tried this with phrasedml, but get the following error

Error:  an output plot or report references variable 'fba__R1' which cannot be found in task 'task1's model 'model1'.

This makes sense because the variable does not exist in the top model; it would only be called 'fba__R1' if the model were flattened.

My question is: How can I use variables from ExternalModelDefinitions in DataGenerators? What is the syntax to reference these? Also what about the XPath expressions, because the xml of the ExternalModelDefinitions is not part of the referenced model file.

Proposals

There have been two main options discussed, both of which involve XPaths, since that's how we refer to variables in SEDML, for better or worse. It would either be:

Adding submodel definitions to refer to submodels https://docs.google.com/drawings/d/1gne9K3phthaUs-k_Q9g1HiFpkOw-EHzgIFj-J3DJfvw/edit

Extending Variable https://docs.google.com/drawings/d/141skp9tVbSB1J-8qhdI3Grz_NPzaf7dV6XkVRUKH0wU/edit (for Lucian's version)

matthiaskoenig commented 7 years ago

Example Combine Archive attached (renamed to zip to upload): toy_wholecell.omex.zip

I want to access the reaction R1 in the submodel fba

    p = """
          model1 = model "toy_wholecell_top"  # top model
          sim1 = simulate uniform(0, 50, 500)
          task1 = run sim1 on model1
          plot "Figure 1: DFBA species vs. time" time vs A, C, D
          plot "Figure 2: DFBA fluxes vs. time" time vs fba__R1  # here a reaction from FBA submodel   
    """

matthiaskoenig commented 7 years ago

Probably we have to extend ModelDefinition by recursive submodels, which can then be used wherever the top model definition could be used

<listOfModels>
  <model id="model1" language="urn:sedml:language:sbml.level-3.version-1" source="toy_wholecell_top.xml">
    <listOfSubModels>
    <submodel id="model1_fba" language="urn:sedml:language:sbml.level-3.version-1" source="toy_wholecell_top.xml"/>
    <submodel id="model1_demo" language="urn:sedml:language:sbml.level-3.version-1" source="toy_wholecell_top.xml">
      <listOfSubModels>
      <submodel id="model1_demo_part" language="urn:sedml:language:sbml.level-3.version-1" source="toy_wholecell_top.xml"/>
      </listOfSubModels>
    </submodel>
    </listOfSubModels>
  </model>
</listOfModels>

luciansmith commented 7 years ago

Yes, this has always bugged me. It's always been true of CellML, too!

It's also an issue that you can't set an instance of a submodel variable without setting all instances of that submodel variable. But that's probably an issue for another time.

I would probably solve this by adding an optional child to Variable, so that if the 'target' was a submodel, you could have another 'target' attribute pointing to the child object in the submodel you wanted to point to (and that child would also have an optional child):

https://docs.google.com/drawings/d/141skp9tVbSB1J-8qhdI3Grz_NPzaf7dV6XkVRUKH0wU/edit

matthiaskoenig commented 7 years ago

Hi Lucian, here is a working link: https://docs.google.com/drawings/d/141skp9tVbSB1J-8qhdI3Grz_NPzaf7dV6XkVRUKH0wU/edit

I don't understand why we would need an additional subvariable. One could just point in the variable to the submodel, i.e.

modelReference = subModelId

the target of the variable would already be the XPath within the submodel.

luciansmith commented 7 years ago

Right; either one would work. I just think the Variable is the more natural place to put it, since then when you need a variable, you define it right there; you don't need to go back and define the submodel it's a part of in a different section of the file. It would just make it slightly easier to implement and use, I think.

(For phrasedml, it could look for the '__' part of the variable name and use that to divvy up the variable name--it could use either scheme to do it, though.)
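As a rough illustration of the '__' idea, here is a minimal Python sketch. The function name and the separator argument are hypothetical, and this is not how phrasedml is actually implemented; it only shows how a flattened name could be divided into a submodel chain and a symbol.

```python
def split_flattened_id(name, separator="__"):
    """Split a flattened id such as 'fba__R1' into (submodel_chain, symbol_id)."""
    parts = name.split(separator)
    # Everything before the last separator is treated as the chain of submodel ids.
    return parts[:-1], parts[-1]


print(split_flattened_id("fba__R1"))
print(split_flattened_id("A"))
```

One caveat with this scheme: underscores are legal inside an SId, so an id that already contains '__' would be split in the wrong place.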

matthiaskoenig commented 7 years ago

I prefer a more global approach, i.e. you can just add a submodel to the model and then all the symbols of the submodel are accessible from anywhere else in the SED-ML document with the same variable formalism. Then I don't need code to handle variables in submodels differently than in regular models.

One could even handle it implicitly by saying: if I define a model, all submodels of the model are part of the model definition and its symbols can be accessed via the following syntax in variables: model$submodel_id$subsubmodel_id$symbol_id. I.e. not changing anything, but just making clear what it means to have a model with submodels (or a model which imports/includes other models) and how its symbols can be accessed. Personally I like this the most, because it does not change anything, but by some clarification one could handle this additional case.

I don't like __, but it should be some character which is not allowed in SIds, so that parsers can handle the submodel ids easily.
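To make the separator argument concrete, here is a small Python sketch of how a parser might handle the model$submodel_id$symbol_id syntax. The function name is hypothetical and the syntax itself is only a proposal in this thread; the regular expression follows the SBML SId definition (a letter or underscore followed by letters, digits, or underscores).

```python
import re

# SBML SId: a letter or underscore followed by letters, digits or underscores.
SID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_reference(ref, separator="$"):
    """Split 'model$submodel$...$symbol' into (model_id, submodel_ids, symbol_id)."""
    parts = ref.split(separator)
    if len(parts) < 2:
        raise ValueError(f"expected at least model{separator}symbol, got {ref!r}")
    for part in parts:
        if not SID.fullmatch(part):
            raise ValueError(f"{part!r} is not a valid SId")
    return parts[0], parts[1:-1], parts[-1]


print(parse_reference("model1$fba$R1"))
```

Because '$' cannot occur inside an SId, the split is unambiguous, which is the property asked for above.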

nickerso commented 7 years ago

Just to say that this discussion should really be on sed-ml-discuss not buried on a github issue :)

Second, there are really two issues here. The first is whether this is even an issue that SED-ML needs to address - if a model format has a way to abstract or hide data then do we need to provide a mechanism in SED-ML for getting around such abstraction? In CellML, for example, it is trivial for a modeller to expose anything they wish to be accessible in a given simulation experiment. The counter argument to this is that when you want to perform a simulation experiment using a read-only model that you can't change, it would be nice to put out plots or set parameters that have been hidden behind a hierarchical abstraction without needing to make your own copy of the model hierarchy.

The second issue is how to identify deeply nested entities in a hierarchical model. Relying on identifiers is problematic as it doesn't address the need of working with read-only model definitions (unless all supported model formats require document unique identifiers everywhere, and they all use the same definition of an SId type). But then the XPath that we currently use has the same issues and most model formats have no requirement for element ordering and the same model can be serialised any number of ways by different tools - although @luciansmith's suggestion for nesting variable XPath expressions would work for an arbitrary depth hierarchy, albeit opening things up to error when targeting an entity in a child model that is not actually used in the parent (possibly a difference between CellML and SBML model composition). @jonc125's original suggestion of using ontology terms still seems the cleanest to me, but relies on models being annotated against consistent ontologies - which is the current focus of the COMBINEd annotation working group...

So not much help from this end, other than I would really not want to see __ used to imply anything!

matthiaskoenig commented 7 years ago

The first is whether this is even an issue that SED-ML needs to address - if a model format has a way to abstract or hide data then do we need to provide a mechanism in SED-ML for getting around such abstraction? In CellML, for example, it is trivial for a modeller to expose anything they wish to be accessible in a given simulation experiment.

A simulation experiment should be able to access all information in a model, regardless of whether the model builder wanted to expose it. Also, in SBML one could easily just make ports and replacements for all variables in the submodels and export them to the top model, but this would only allow using the subset of variables in SED-ML that the modeller intended to export. This is not a solution.

it would be nice to put out plots or set parameters that have been hidden behind a hierarchical abstraction without needing to make your own copy of the model hierarchy.

Yes, this would be great, i.e. just allow a new syntax to access variables/content in nested models without adding the submodels. The problem I see with this is that for a SED-ML implementation one has to write code to resolve the individual submodel files based on the top model information, whereas just listing the submodels with their respective sources would make things very clear and explicit. One can just reuse all the code from model to also parse the submodels and apply the XMLChanges. Normally you write SED-ML programmatically; in the case of hierarchical models these submodels are also just written programmatically, so I don't see an issue with listing them.

The second issue is how to identify deeply nested entities in a hierarchical model. Relying on identifiers is problematic as it doesn't address the need of working with read-only model definitions (unless all supported model formats require document unique identifiers everywhere, and they all use the same definition of an SId type)

I don't understand that. You would just define a submodel and give a unique SId to the referenced submodel. Whether the model is read-only or not does not matter.

XPath expressions would work for an arbitrary depth hierarchy, albeit opening things up to error when targeting an entity in a child model that is not actually used in the parent

Please no nested XPaths; XPaths are bad as they are, and nesting them will be the end of it. If you have a submodel you can just use the XPath within this submodel. No need to change anything. Personally I think the easiest, backward solution with minimal implementation is

Ontologies don't work: nobody is annotating models (this will not get better with hierarchical models where you want to use read-only models and models in repositories which are not annotated or incorrectly annotated). We need a solution which can point to the XML file of the submodel and can use the model as is.

jonc125 commented 7 years ago

The ontology approach is more relevant for tools like our Web Lab which use the same experiment description against multiple models; SED-ML already hard-codes the model so hard-coding variables within the model doesn't restrict you.

I agree that you need an approach for referencing any entity within a (sub)model whether or not the modeller has exposed it or given it any kind of unique ID, and XPath (for XML-based modelling languages at least) is probably the best tool for this job.

One thing to be careful of is that your referencing mechanism can handle the same submodel definition being instantiated multiple times within the top model, and hence actually being 2 (or more) different models - you need to be able to reference just the one you want. Which means you can't reference the file the submodel is defined within, you need to reference the location where it is instantiated in the parent model. (So I don't think @matthiaskoenig's example above quite works.) This is going to require code within implementations that understands the different modelling languages' ways of using submodels. But that would probably be the case whatever approach we take!

I'm not sure whether it's better to define a list of submodels you care about, or just add extra children to the Variable as Lucian suggests. The former is probably easier to implement if you can treat a submodel in (mostly) the same way as you'd treat a 'top' model for extracting/setting variables.

Definitely don't use special symbols in names within the SED-ML though!

matthiaskoenig commented 7 years ago

@jonc125 Very good point about the instantiated submodels. Did not think about this.

The XML on which SED-ML would operate is the XML of the instantiated submodel, e.g., for SBML after deletions and replacements. One needs an attribute which defines which instance of the submodel is meant. For example in case of SBML the instance_id would be the ModelDefinition id in the top SBML model. I have no idea how CellML handles multiple instances of different submodels, but I am sure it also gives something like an instance_id to the instantiated submodel somewhere.

<listOfModels>
  <model id="model1" language="urn:sedml:language:sbml.level-3.version-1" source="toy_wholecell_top.xml">
    <listOfSubModels>
    <submodel id="model1_fba1" language="urn:sedml:language:sbml.level-3.version-1" source="toy_submodel1.xml" instance="instance1"/>
    <submodel id="model1_fba2" language="urn:sedml:language:sbml.level-3.version-1" source="toy_submodel1.xml" instance="instance2"/>
    </listOfSubModels>
  </model>
</listOfModels>
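A short Python sketch of how an implementation might read such a listing. The listOfSubModels and submodel elements and the instance attribute are hypothetical (they exist only in this proposal), and the language attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical SED-ML fragment following the proposal above; <listOfSubModels>,
# <submodel> and the 'instance' attribute are not part of any SED-ML release.
FRAGMENT = """
<listOfModels>
  <model id="model1" source="toy_wholecell_top.xml">
    <listOfSubModels>
      <submodel id="model1_fba1" source="toy_submodel1.xml" instance="instance1"/>
      <submodel id="model1_fba2" source="toy_submodel1.xml" instance="instance2"/>
    </listOfSubModels>
  </model>
</listOfModels>
"""


def index_submodels(xml_text):
    """Map each submodel id to (parent model id, source file, instance id)."""
    root = ET.fromstring(xml_text)
    index = {}
    for model in root.findall("model"):
        for sub in model.findall("listOfSubModels/submodel"):
            index[sub.get("id")] = (model.get("id"), sub.get("source"), sub.get("instance"))
    return index


submodels = index_submodels(FRAGMENT)
for sub_id, entry in submodels.items():
    print(sub_id, entry)
```

Both entries share the same source file and differ only in the instance, which is exactly the case raised above.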

luciansmith commented 7 years ago

As far as I can tell, there are two options here, both of which involve XPaths, since that's how we refer to variables in SEDML, for better or worse. It would either be:

https://docs.google.com/drawings/d/1gne9K3phthaUs-k_Q9g1HiFpkOw-EHzgIFj-J3DJfvw/edit (for Matthias's version) or

https://docs.google.com/drawings/d/141skp9tVbSB1J-8qhdI3Grz_NPzaf7dV6XkVRUKH0wU/edit (for Lucian's version)

nickerso commented 7 years ago

An instance "id" still wouldn't help for the case where you are using models that don't have such id's defined in them and you don't have permission to edit the source model(s) (i.e., read-only, such as referring to a model in a repository somewhere). CellML imports, for example, don't require an id to be present.

I think @luciansmith accurately captures the two feasible solutions proposed so far, using XPath to refer to the specific instance of the import/ModelDefinition/etc in an arbitrarily deep hierarchical model. I'd suggest sending an email to the sed-ml-discussion list outlining the issue and the proposed two solutions asking for feedback, alternative proposals, etc. and then see if there is a clear preference for one solution over the others.

luciansmith commented 7 years ago

David's plan seems good to me. I'm happy to write up the proposal if you like, or Matthias can do it.

A couple clarifications:

nickerso commented 7 years ago

I don't think it should impact what SED-ML should do, but in CellML you import components from a source model, not the entire model. So you could end up referencing a variable in a submodel that is not actually "in" the model being simulated. But this falls into the same category as what SED-ML should do with all the current XPath expressions when they reference something that makes no sense.

matthiaskoenig commented 7 years ago

I still do not understand how the SubVariable would work:

For instance, in SBML how would I get the XML of the instantiated submodel (if the ModelDefinition is in an external model file)? Is there something like getInstantiatedXML, i.e. the XML of the submodel after replacements and deletions were applied? Without a way to get the instantiated XML, an XPath does not help.

luciansmith commented 7 years ago

In SBML: The target Xpath of Variable points at the Submodel.

The target Xpath of the subVariable either points at the Sub-submodel, or at the variable you want.

Finding the original XML for the Model that was instantiated as a Submodel is the job of the interpreter: the process is very different in SBML vs. CellML (and, presumably, vs. any other arbitrary nested XML language).

If it helps, by analogy, if you have the Antimony:

    model foo()
      a = 3
    end

    model bar()
      S: foo()
    end

you have model 'bar' and you want variable 'S.a'. Variable points at S, since S is the variable in bar that holds the entity you want. The Subvariable holds the XPath to 'a' inside 'foo'. It is the interpreter's job to find 'foo' based on 'S': it might be in the same file, or a different file, or whatever. The key bit is that the model definition language itself defines how to find 'foo' (it's quite different in CellML vs. SBML). It would also be the job of the modeler to make sure that 'a' wasn't deleted in Submodel S1, and that it existed in the first place, etc. But it doesn't matter if other things have been deleted or replaced; there will always be an 'a' inside 'foo' that is the entity in question that you want, and 'foo' will exist in some document somewhere, and you can point an XPath at it.

You can't just point directly at 'a' from 'foo' because you might instantiate foo several times:

    model baz()
      S1: foo()
      S2: foo()
      S3: bar()
    end

baz actually contains three different references to 'a': S1.a, S2.a, and S3.S.a. You'd find all three with slightly different SED-ML:

Variable1: target: [xpath to S1] subvariable: target [xpath to a in foo]

Variable2: target: [xpath to S2] subvariable: target [xpath to a in foo]

Variable3: target: [xpath to S3] subvariable: target [xpath to S in bar] subvariable: target [xpath to a in foo]

Does that make more sense?

You could do essentially the same thing by encoding baz.S1, baz.S2, and baz.S3.S as 'submodels' in your model definition list (your proposal), and using them as the 'model' target of your Variable.

I think I like attaching the information to Variable instead, though, because the submodels themselves feel like variables to me, and the code I'm envisioning to read and write them is a little more straightforward in the Variable approach. But the stored information is exactly the same.
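The foo/bar/baz example can be mimicked in a few lines of Python to show why one template symbol yields several instantiated references. The dictionary layout and function name are assumptions made for illustration only; they stand in for whatever the interpreter reads from the model language.

```python
# Template models from the Antimony example: each template lists its own
# symbols and the submodel instances it contains (instance name -> template).
TEMPLATES = {
    "foo": {"symbols": ["a"], "submodels": {}},
    "bar": {"symbols": [], "submodels": {"S": "foo"}},
    "baz": {"symbols": [], "submodels": {"S1": "foo", "S2": "foo", "S3": "bar"}},
}


def instantiated_paths(template, prefix=()):
    """Yield every (instance chain, symbol) reachable from a template model."""
    model = TEMPLATES[template]
    for symbol in model["symbols"]:
        yield prefix, symbol
    for instance, sub_template in model["submodels"].items():
        yield from instantiated_paths(sub_template, prefix + (instance,))


paths = list(instantiated_paths("baz"))
for chain, symbol in paths:
    print(".".join(chain + (symbol,)))
```

Each printed chain corresponds to one Variable/subVariable nesting (or one submodel entry in the alternative proposal); the stored information is the same either way.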

matthiaskoenig commented 7 years ago

Makes more sense now, thanks for the explanation.

One thing I don't understand is how to get the XML of the instantiated submodel in SBML. For an implementation working with XPath I need an XML representation of this instantiated model, i.e. with deletions and replacements applied to it.

submodel.toXML() or submodel.toSBML() will just give me the XML of the uninstantiated model, i.e. something with a listOfDeletions, but not an XML where these things are actually deleted; the same goes for replacements. If there is no way to get the instantiated XML in libSBML, then one cannot apply XPath expressions.

luciansmith commented 7 years ago

Don't worry about the instantiated XML--such a thing is conceptual only, and doesn't really exist. The XPath should be for the template XML.

You don't need to worry about deletions and replacements, since that's a modeler issue: it's illegal to reference an element that's been deleted. Replaced elements are an SBML thing that we could talk about--I think that ideally, if 'a' in the top-level model replaces or is replaced by submodel variable S.a, either 'a' or 'S.a' should now refer to the same thing, but this might be difficult for some interpreters to pull off (especially if they flatten). But that's along the lines of 'best practices for SBML use in SED-ML', and shouldn't unduly affect the SED-ML spec itself.

fbergmann commented 7 years ago

I don't know about all this, if libSBML does not offer any way of retrieving the submodel xml, then what should the xpath be run over? It really would be much easier to bubble up potential model elements to the main model and use them from there. Otherwise it seems like very specific processing instructions have to be provided on how to get to the element.

Which seems counter the original spirit of sedml, to be modelling language agnostic.

luciansmith commented 7 years ago

libSBML does offer a way of retrieving the submodel XML: the template Model or ModelDefinition. Similarly, CellML will have access to the imported component and will know what document it's in. That's where you point your XPath: in SBML, you point it at (say) the <parameter> child of the <modelDefinition>; and in CellML you point it at the child of the <component>. SED-ML doesn't need to talk about any of that; it just says 'find the XML of the imported bit, and point the XPath to the piece of the imported thing you're talking about'.
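As a sketch of what "point the XPath at the template" could look like for SBML, the Python snippet below runs a path over a hand-written minimal comp document. The document is an assumption made for illustration, and Python's ElementTree supports only a subset of XPath; a real interpreter would more likely go through libSBML.

```python
import xml.etree.ElementTree as ET

NS = {
    "sbml": "http://www.sbml.org/sbml/level3/version1/core",
    "comp": "http://www.sbml.org/sbml/level3/version1/comp/version1",
}

# Hand-written minimal document: model 'bar' instantiates template 'foo' as 'S'.
DOC = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1"
      level="3" version="1" comp:required="true">
  <model id="bar">
    <comp:listOfSubmodels>
      <comp:submodel comp:id="S" comp:modelRef="foo"/>
    </comp:listOfSubmodels>
  </model>
  <comp:listOfModelDefinitions>
    <comp:modelDefinition id="foo">
      <listOfParameters>
        <parameter id="a" value="3" constant="true"/>
      </listOfParameters>
    </comp:modelDefinition>
  </comp:listOfModelDefinitions>
</sbml>
"""

root = ET.fromstring(DOC)
# The path targets parameter 'a' in the template, not in any instantiated copy.
path = ("comp:listOfModelDefinitions/comp:modelDefinition[@id='foo']"
        "/sbml:listOfParameters/sbml:parameter[@id='a']")
param = root.find(path, NS)
print(param.get("id"), param.get("value"))
```

Which instance of 'foo' is meant is not encoded in this path; under the proposal that information comes from the parent Variable pointing at the submodel 'S'.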

matthiaskoenig commented 7 years ago

@luciansmith I still don't understand it. I have the feeling I need the instantiated XML. Otherwise how would I handle

In general I have the same implementation concern as Frank. If I don't have the XML, what should I run the XPath over?

luciansmith commented 6 years ago

I see this issue didn't come up at HARMONY, but it seems like it's worth it to try to explain again.

My Antimony-ish example above is still pretty much as clear as I can make it, but let me try a different way.

In SBML-comp, we define rules for making a model out of other models. 'Here is template model foo', it says, 'it has an 'a' in it. Here is template model bar; it has a foo in it called 'S'. Here is your final model 'baz'; it has two foo's named S1 and S2, and a 'bar', named S3.'

The rules are all you need to create an instantiated model.

Similarly, if we gave SED-ML a 'SubVariable' child of 'Variable' (https://docs.google.com/drawings/d/141skp9tVbSB1J-8qhdI3Grz_NPzaf7dV6XkVRUKH0wU/edit), it could reference things the same way. 'Your model is baz, and your Variable is S1', it would say. But, since S1 is an entire submodel, it then adds a child SubVariable to it that points to the 'a' parameter inside model template 'foo'. The XPath just points to 'a', but we know what instantiation we're talking about because the parent Variable pointed to 'S1', which is a particular instantiation of foo.

So, to reference all three instantiated 'a''s in the model 'baz' (above):

S1.a:

<variable model="baz" target="[xpath to S1]">
    <subVariable target="[xpath to a]" />
</variable>

S2.a:

<variable model="baz" target="[xpath to S2]">
    <subVariable target="[xpath to a]" />
</variable>

S3.S.a:

<variable model="baz" target="[xpath to S3]">
    <subVariable target="[xpath to S]">
        <subVariable target="[xpath to a]" />
    </subVariable>
</variable>

All [xpath to a] XPaths are the same! It doesn't matter that there is no instantiated XML; everything is right there in the rules, and the SED-ML follows the same pattern.
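For an interpreter, reading such a nested Variable amounts to collecting the chain of targets from the outside in. A minimal Python sketch follows; the subVariable element is only a proposal, the id 'v3' is made up, and the bracketed targets are placeholders rather than real XPaths.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <subVariable> is only a proposal in this thread.
VARIABLE = """
<variable id="v3" model="baz" target="[xpath to S3]">
  <subVariable target="[xpath to S in bar]">
    <subVariable target="[xpath to a in foo]"/>
  </subVariable>
</variable>
"""


def target_chain(variable):
    """Return the targets from the Variable down to its innermost subVariable."""
    chain = [variable.get("target")]
    child = variable.find("subVariable")
    while child is not None:
        chain.append(child.get("target"))
        child = child.find("subVariable")
    return chain


chain = target_chain(ET.fromstring(VARIABLE))
print(chain)
```

Every element of the chain except the last selects a submodel instance; the last one selects the entity inside the final template.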

luciansmith commented 3 years ago

Removing the L1v4 tag from this issue, as it's clearly not going to get resolved any time soon.

However, at this point, given that @jonrkarr has developed non-Xpath, language-specific 'target' templates for finding model variables, the way I would do this today is to develop a simple string format for SBML for a Variable 'target': just use the id.id format. The nice thing is that it would work for arbitrary nested IDs, both submodel variables and reaction local variables.
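A toy Python sketch of resolving such an id.id target. The nested dictionary stands in for a model and is purely hypothetical (it is not libSBML API); it only illustrates that the same lookup covers both submodel variables and reaction-local parameters.

```python
# Hypothetical in-memory model: submodels and reactions are nested scopes,
# leaves are numeric values. None of this is libSBML API.
MODEL = {
    "k2": 0.5,
    "J0": {"k1": 1.2},            # reaction J0 with local parameter k1
    "sub1": {"J0": {"k1": 3.4}},  # same reaction id inside submodel sub1
}


def resolve(target, scope=MODEL):
    """Follow an 'id.id.id' target through nested scopes and return the leaf."""
    for part in target.split("."):
        if not isinstance(scope, dict) or part not in scope:
            raise KeyError(f"cannot resolve {part!r} in target {target!r}")
        scope = scope[part]
    return scope


print(resolve("J0.k1"))
print(resolve("sub1.J0.k1"))
```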

jonrkarr commented 3 years ago

A similar issue comes up with imported content in other languages.

NeuroML's URI strategy navigates this nicely. Arbitrarily deep content has clear addresses.

My understanding from the CellML team is that imported content is externally unaddressable. The only way to achieve something similar is to map imported content to a top-level element.

XPaths provide no clear path forward for this. SBML could follow an approach more similar to NeuroML. This would have the added advantage of making it easier for simulation tools to support SED-ML.

luciansmith commented 3 years ago

Do you have a link to NeuroML's URI strategy? That sounds promising. (Apologies if you already posted a link elsewhere and I missed it.)

jonrkarr commented 3 years ago

I have an example here: https://github.com/biosimulators/Biosimulators_test_suite/tree/dev/examples. Another reference is using jNeuroML/pyNeuroML to export SED-ML from LEMS (container for multiple NeuroML documents).

I'm not aware of any documentation. It's similar to CellML URIs: a chain of ids of model components with / to separate between layers and [index] to index into arrays.

Note, jNeuroML/pyNeuroML and downstream cannot import SED-ML. This functionality is added in the BioSimulators interface to those tools.

If Padraig or others want to talk, I'll include you.

I see no reason why SBML couldn't embrace something similar. One small consequence, which I think is mostly good, is that it would push away from AddXML, RemoveXML, ChangeXML toward semantically meaningful model changes. The upside is that the semantics would be clearer and it would likely be easier for simulation tools to support this. The downside is that more coordination with model languages and more classes would be necessary to represent this. One solution would be to put those classes in the model languages rather than in SED-ML. From a modularity/design standpoint, I think this would work well. However, from speaking to people, I already know some would push back against this because it would mix operations (changes) on models into model descriptions.

luciansmith commented 3 years ago

For the record: here's a variable from a NeuroML DataGenerator:

<variable id="n" target="hhpop[0]/bioPhys1/membraneProperties/KConductances/KConductance/n/q" taskReference="task"/>
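The target above can be split into its layers with a few lines of Python. The grammar here is inferred only from the description in this thread ('/' between layers, '[index]' for arrays), not from NeuroML documentation, and the function name is hypothetical.

```python
import re

# One path segment: an id, optionally followed by a numeric [index].
SEGMENT = re.compile(r"(?P<id>[^/\[\]]+)(?:\[(?P<index>\d+)\])?")


def parse_path(target):
    """Split 'a[0]/b/c' into a list of (id, index or None) tuples."""
    segments = []
    for raw in target.split("/"):
        match = SEGMENT.fullmatch(raw)
        if match is None:
            raise ValueError(f"malformed segment {raw!r}")
        index = match.group("index")
        segments.append((match.group("id"), int(index) if index is not None else None))
    return segments


path = parse_path("hhpop[0]/bioPhys1/membraneProperties/KConductances/KConductance/n/q")
for segment in path:
    print(segment)
```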

Given that they're calling the variable 'n', I'm guessing that 'q' is some numerical quantity of n? At any rate, I think some scheme for SBML that uses slashes or colons would be great, like

    sub1:J0.k1
    sub1/J0/k1

for 'the local variable k1 in reaction J0 in submodel sub1'. I could imagine incorporating another symbol like '@' for features of elements, too:


    S2:@boundarySpecies
    k2:@constant
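A hedged Python sketch of how such targets might be parsed. Treating ':', '/' and '.' as interchangeable layer separators and a trailing '@name' as an attribute is an assumption made here for illustration; the thread does not settle on a grammar.

```python
import re


def parse_target(target):
    """Return (id chain, attribute or None) for targets like 'sub1:J0.k1'."""
    # Assumption: ':', '/' and '.' all separate layers of the id chain.
    parts = [p for p in re.split(r"[:/.]", target) if p]
    attribute = None
    # Assumption: a final part starting with '@' names a feature of the element.
    if parts and parts[-1].startswith("@"):
        attribute = parts.pop()[1:]
    return parts, attribute


for example in ("sub1:J0.k1", "sub1/J0/k1", "S2:@boundarySpecies", "k2:@constant"):
    print(example, "->", parse_target(example))
```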