More clearly handle multiple predictions per model element

jonrkarr commented 3 years ago

The variables of data generators have targets which are designed to be XML XPaths that identify a specific attribute of a model that should be recorded. This works well for SBML parameters which have a value attribute as part of their XML definition.

However, this is less clear for other predictions and breaks down when algorithms generate multiple predictions per model element. Examples:

Species counts/concentrations. The SED-ML examples suggest using the target pattern /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='{species-id}']. However, neither count nor concentration is an attribute of the XML definition of a species. I think this would be more clearly communicated as /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species/@count.
Flux variability analysis generates two predictions per reaction: the minimum and maximum flux. I don't think this should be encoded into another dimension as time is for time course simulations. Instead, I think it should be possible to address each prediction using a target like this /sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction/@minFlux.

This suggestion is a blend of the concepts of symbols (which are not intended to be defined in definitions of models) and targets (which are intended to concretely reference specific elements of models).
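
To make the proposed addressing concrete, here is a minimal sketch of how a tool might split such a target into the model element and the prediction requested for it. The names `ParsedTarget` and `parse_target` are hypothetical, and the trailing `/@name` convention is only the proposal from this comment, not something SED-ML defines.

```python
from typing import NamedTuple, Optional


class ParsedTarget(NamedTuple):
    element_xpath: str         # XPath of the model element itself
    prediction: Optional[str]  # trailing "implicit attribute", or None


def parse_target(target: str) -> ParsedTarget:
    """Split a target into the element XPath and an optional trailing /@name.

    Predicates such as [@id='R1'] are left untouched because they are
    written "[@", not "/@".
    """
    element_xpath, sep, attribute = target.rpartition("/@")
    if not sep:
        # No trailing attribute: the target addresses the element as a whole.
        return ParsedTarget(target, None)
    return ParsedTarget(element_xpath, attribute)


fva_min = parse_target(
    "/sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']/@minFlux"
)
species = parse_target(
    "/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']"
)
print(fva_min.prediction)  # minFlux
print(species.prediction)  # None
```

With this split, `minFlux` and `maxFlux` on the same reaction become two separately addressable predictions rather than an extra dimension of one result.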

For consistency and portability, I think the key ingredient is that the community needs to be able to use the same implicit targets across tools. Algorithms which make similar predictions should also use the same targets. With BioSimulators, we've created a place for the developers to advertise the targets that each of their algorithms recognizes and for investigators to browse this information.

matthiaskoenig commented 3 years ago

The target is the object in the model. The symbol would be used to define the type of information on the target which should be set or read. E.g. concentration, amount, particle number, ... or if no symbol is set the default information for an object (in case of SBML either amount or concentration for a species depending on how it was defined).
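
As a rough sketch of the default rule described above: an explicit symbol wins, and otherwise the quantity falls back to how the species was defined. This assumes that "how it was defined" refers to the SBML `hasOnlySubstanceUnits` attribute. The function name `resolve_quantity`, the plain-string symbols, and the trimmed model fragment (which omits attributes a valid SBML file would need) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Trimmed, illustrative fragment; not a complete, valid SBML model.
MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="m">
    <listOfSpecies>
      <species id="A" compartment="c" hasOnlySubstanceUnits="true"/>
      <species id="B" compartment="c" hasOnlySubstanceUnits="false"/>
    </listOfSpecies>
  </model>
</sbml>"""


def resolve_quantity(root: ET.Element, species_id: str,
                     symbol: Optional[str] = None) -> str:
    """Return the quantity to record for a species target."""
    if symbol is not None:
        # An explicit symbol always overrides the default.
        return symbol
    species = root.find(
        f"sbml:model/sbml:listOfSpecies/sbml:species[@id='{species_id}']",
        SBML_NS,
    )
    if species is None:
        raise KeyError(f"no species with id {species_id!r}")
    # Assumed default: amount if hasOnlySubstanceUnits is true, else concentration.
    if species.get("hasOnlySubstanceUnits") == "true":
        return "amount"
    return "concentration"


root = ET.fromstring(MODEL)
print(resolve_quantity(root, "A"))                     # amount
print(resolve_quantity(root, "B"))                     # concentration
print(resolve_quantity(root, "B", "particle number"))  # particle number
```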

I agree this has to be clearer in the specification, and examples with amount and concentration should be added.

jonrkarr commented 3 years ago

The new symbols, structured as a section of KiSAO, address my concern. With just one symbol term (for time) and no examples of combining targets and symbols, I felt this was ill-defined.

That said, the new symbol terms will add a lot of implementational complexity to SED-ML. I think we need to have a frank discussion about the degree to which tools will support this. Otherwise, this could amplify the divergence in the software support for SED-ML and make it yet harder to exchange SED-ML documents between tools.

matthiaskoenig commented 3 years ago

I completely agree. It would be great if tools could implement a standard JSON response that clearly indicates when something is not supported. Combined with test cases for minimal functionality (e.g. for symbols), this would make it possible to test what a simulator supports. I put this on the agenda for our meeting.

jonrkarr commented 3 years ago

A standard response isn't enough. This information needs to be documented clearly and centrally. Users shouldn't have to discover this information through trial and error.

matthiaskoenig commented 3 years ago

If the tools provided a standardized response, it would be possible to create a central repository of this information via a granular test suite for SED-ML features. That is, the test suite could be run against the tools, and the central information could be generated from their responses.
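
A minimal sketch of that idea, assuming each tool answers every test case with a JSON record that says which feature was exercised and whether it is supported. The response shape, the feature names, the tool names, and `build_support_table` are all hypothetical; no such standard response exists yet.

```python
import json
from typing import Dict

# Hypothetical raw JSON responses, one list of test results per tool.
RAW_RESPONSES: Dict[str, str] = {
    "simulator-a": json.dumps([
        {"feature": "symbol:concentration", "supported": True},
        {"feature": "symbol:min-flux", "supported": False},
    ]),
    "simulator-b": json.dumps([
        {"feature": "symbol:concentration", "supported": True},
        {"feature": "symbol:min-flux", "supported": True},
    ]),
}


def build_support_table(raw: Dict[str, str]) -> Dict[str, Dict[str, bool]]:
    """Aggregate per-tool test responses into a feature -> tool -> bool table."""
    table: Dict[str, Dict[str, bool]] = {}
    for tool, payload in raw.items():
        for result in json.loads(payload):
            table.setdefault(result["feature"], {})[tool] = bool(result["supported"])
    return table


support = build_support_table(RAW_RESPONSES)
for feature, tools in sorted(support.items()):
    print(feature, tools)
```

The resulting table is the kind of central record that could be published so that users do not have to probe each tool themselves.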

jonrkarr commented 3 years ago

This is roughly what we've done with BioSimulators. However, the metadata is described in a file that sits alongside a simulation tool (typically in the root directory of its source code repository). Integrating standard responses into tools would require a lot more work than this design. This way even tools that aren't actively maintained can be curated with relatively little effort.

jonrkarr commented 3 years ago

We very much welcome anyone to join BioSimulators meetings and help shape the direction. Lucian started to join us a few weeks ago.

jonrkarr commented 3 years ago

This will be handled by variables that combine targets (XPaths) and symbols (KiSAO IDs, e.g., for flux, min flux, max flux, Jacobian, etc.).
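
For illustration only, this is how the flux variability case from the original post might look under that resolution: two variables share one target and differ only in their symbol. The `Variable` class, the `collect` helper, the placeholder symbol strings (which are not real KiSAO IDs), and the numeric values are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Placeholders standing in for the actual KiSAO term IDs.
MIN_FLUX = "KISAO:<min-flux-term>"
MAX_FLUX = "KISAO:<max-flux-term>"

REACTION = "/sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']"


@dataclass(frozen=True)
class Variable:
    id: str
    target: str                   # XPath to the model element
    symbol: Optional[str] = None  # which prediction on that element


variables: List[Variable] = [
    Variable("R1_min", REACTION, MIN_FLUX),
    Variable("R1_max", REACTION, MAX_FLUX),
]

# Illustrative simulator output keyed by (target, symbol); values are made up.
predictions: Dict[Tuple[str, Optional[str]], float] = {
    (REACTION, MIN_FLUX): -1.0,
    (REACTION, MAX_FLUX): 2.5,
}


def collect(variables: List[Variable],
            predictions: Dict[Tuple[str, Optional[str]], float]) -> Dict[str, float]:
    """Look up each variable's value by its (target, symbol) pair."""
    return {v.id: predictions[(v.target, v.symbol)] for v in variables}


print(collect(variables, predictions))  # {'R1_min': -1.0, 'R1_max': 2.5}
```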

SED-ML / sed-ml

More clearly handle multiple predictions per model element #72