SED-ML / KiSAO

Ontology of algorithms for analyzing biological models, their parameters, and their outputs
Artistic License 2.0

New terms for SED-ML use #90

Closed luciansmith closed 3 years ago

luciansmith commented 3 years ago

Unfortunately, I don't have enough information to fill out the 'new term' questionnaire for these terms, but would be happy to work with people to help fill out what needs to be filled out.

https://docs.google.com/spreadsheets/d/1f3rBUUhOsB6k7-i_ins0nlB7PTtd_huvhgFrCOh501U/edit#gid=0

I know that every term on the list can be supported by Tellurium, and I believe that most, if not all, of them are also supported by COPASI.

Not sure if we need a KiSAO term for 'time'--it's a little weird if we start expanding the list of possibilities but don't include it; on the other hand, it's an odd fit for KiSAO, too.

jonrkarr commented 3 years ago

It's pretty quick to edit with Protege. I could give you a quick orientation over Zoom.

I think these terms should be added under a new trunk of the ontology. In addition to the terms you added, we'll need to add a few "organizational" terms to group the terms. This could follow the color groupings in your table.

Regarding tracking support for terms by simulation tools, we could put this in BioSimulators. KiSAO only has a small amount of information about simulation tools. I think that information is better suited for a database than an ontology. This is largely implemented already; it needs to be modified for the SED-ML L1V4 target+symbol scheme.

luciansmith commented 3 years ago

Sure, I can try that. Should I just do what I can and create a pull request?

The only reason I mentioned simulation tools is that the auto-generated form had a spot asking about them, perhaps to ensure that terms weren't being added that had zero tool support.

jonrkarr commented 3 years ago

That plan sounds good.

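Protege is fairly straightforward. These are the key attributes to use:

  • rdfs:label (language:en): primary name of each term
  • skos:altLabel (language:en): synonyms
  • skos:definition (language:en): description
  • isOrganizational (xsd:boolean): true if the term is an abstract concept (i.e., shouldn't be used in a SED-ML document; only its children should be used in SED-ML documents)
  • rdfs:seeAlso (xsd:anyURI): URL for more information (e.g., https://identifiers.org/doi/XYZ)
    • rdfs:comment (language:en) on this can provide a human-readable citation for the URL
  • isImplementedIn (xsd:anyURI): https://identifiers.org/biosimulators/tellurium
  • dcterms:creator (language:en): LPS
  • dcterms:created (xsd:date): 2021-06-03

skos:altLabel, isOrganizational, rdfs:seeAlso, and isImplementedIn are optional.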

KiSAO does have a slot for implementation information. We could use this. I wouldn't encourage people to rely on that information because it's very incomplete. KiSAO isn't really structured to handle information about which terms each simulation tool implements. To begin to handle that better, KiSAO would need to have a term for each simulation tool.

matthiaskoenig commented 3 years ago

If you give an introduction to updating KiSAO with Protege, I would like to join the Zoom call. Then I could help in making updates.

On Thu, Jun 3, 2021 at 11:31 PM Jonathan Karr wrote:

> That plan sounds good.
>
> Protege is fairly straightforward. These are the key attributes to use:
>
>   • rdfs:label (language:en): primary name of each term
>   • skos:altLabel (language:en): synonyms
>   • skos:definition (language:en): description
>   • isOrganizational (xsd:boolean): true if the term is an abstract concept (i.e., shouldn't be used in a SED-ML document; only its children should be used in SED-ML documents)
>   • rdfs:seeAlso (xsd:anyURI): URL for more information (e.g., https://identifiers.org/doi/XYZ)
>     • rdfs:comment (language:en) on this can provide a human-readable citation for the URL
>   • isImplementedIn (xsd:anyURI): https://identifiers.org/biosimulators/tellurium
>   • dcterms:creator (language:en): LPS
>   • dcterms:created (xsd:date): 2021-06-03
>
> skos:altLabel, isOrganizational, rdfs:seeAlso, and isImplementedIn are optional.
>
> KiSAO does have a slot for implementation information. We could use this. I wouldn't encourage people to rely on that information because it's very incomplete. KiSAO isn't really structured to handle information about what simulation tools implement. To begin to handle that better, KiSAO would need to have a term for each simulation tool.


jonrkarr commented 3 years ago

It looks like Lucian figured it out. I think the notes above are really all one needs to know about editing the ontology. I copied the notes above to the CONTRIBUTING.md.
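As a rough illustration of the notes above, the attribute list can be treated as a checklist for a draft term. This is only a sketch: the split into required and optional attributes is inferred from the sentence listing which attributes are optional, and the helper functions and placeholder label and definition are hypothetical, not part of KiSAO or Protege.

```python
# Attribute names come from the notes in this thread. Treating every attribute
# not listed as optional as "required" is an assumption made for illustration.
REQUIRED = ("rdfs:label", "skos:definition", "dcterms:creator", "dcterms:created")
OPTIONAL = ("skos:altLabel", "isOrganizational", "rdfs:seeAlso", "isImplementedIn")


def missing_attributes(term):
    """Return the required attributes that are absent or empty in a term dict."""
    return [name for name in REQUIRED if not term.get(name)]


def unknown_attributes(term):
    """Return attribute names that are neither required nor optional."""
    known = set(REQUIRED) | set(OPTIONAL)
    return sorted(name for name in term if name not in known)


# Hypothetical draft term; the label and definition are placeholders.
draft = {
    "rdfs:label": "example term",
    "skos:definition": "Placeholder description of the term.",
    "isImplementedIn": "https://identifiers.org/biosimulators/tellurium",
    "dcterms:creator": "LPS",
    "dcterms:created": "2021-06-03",
}

print(missing_attributes(draft))
print(unknown_attributes(draft))
```

A check like this could be run over a draft before opening a pull request, although the ontology itself is still edited in Protege.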

jonrkarr commented 3 years ago

I added missing data types and started to increment the version number for this.

Since KiSAO is largely independent of SBML and SED-ML, I suggest we generalize some of the names of the organizational terms so the names are less tied to their use with SBML, SED-ML, and specific algorithms. For example, "modelling and simulation implied variable" is somewhat specific to SBML's use of "time". NeuroML/LEMS also uses the time symbol, but time is more explicit in that language. "rate of change" is also not necessarily a derived quantity. For FBA, this is the primary result.

I pushed my suggestions in another branch: https://github.com/SED-ML/KiSAO/tree/lps-karr-new-terms

luciansmith commented 3 years ago

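The vast majority of these changes look great to me. The minor exceptions:

  • I put in 'full' in the definition of the three 'full' versions of the matrices, just to be explicit.
  • We still need 'amount' and 'particle number' explicitly, so I made the intensive/extensive quantities categories.
  • I'm not sure I like 'rate of change' in the 'model and simulation variable' category, since it's basically the mathematical concept 'derivative' which doesn't really seem like a variable to me (unlike 'time'). I moved it to the 'mathematical function' category, mulled over it a bit, then moved it back, because I'm not sure I like it over there, either. As you mentioned elsewhere, we can always change the organization of this, so there's no rush on this, but the sort of things I would see living with 'time' would be things like 'length' or 'ambient temperature', which are quantities in and of themselves, not a rule about how to obtain a quantity.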

Also, you seem to have removed my 'isOrganizational:false' tags, which seems a bit odd to me--surely there aren't defaults for these sorts of things?

At any rate, changes added!

jonrkarr commented 3 years ago

> I put in 'full' in the definition of the three 'full' versions of the matrices, just to be explicit.

To me, an eigenvector matrix means full unless reduced is specifically stated. For that reason, I thought it made sense to make "full eigenvector matrix" a synonym of "eigenvector matrix". Same for the similar terms.

> We still need 'amount' and 'particle number' explicitly, so I made the intensive/extensive quantities categories.

My thinking was to use variable characteristics similar to algorithm characteristics. They represent modifiers which can be coupled to variables to indicate the specific dimensionality needed for a particular variable. This could avoid the need to create KiSAO terms for each combination of type of variable and possible dimensionality/units. I thought this could work with SBML species: target indicates the species; symbol indicates the dimensionality. However, SED-ML expects individual KiSAO terms rather than expressions involving possibly multiple terms. This means we have to have KiSAO terms for each concrete combination needed.

I suggest we nest particle number, amount, and concentration under "model and simulation variable" and have relationships to "extrinsic property" and "intrinsic property". Because there are multiple ways to categorize terms, we can use other types of relationships to express this rather than parent-child.
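To make the trade-off concrete, here is a small sketch comparing reusable modifiers with one term per combination. The labels and the value/rate split are hypothetical and chosen only for illustration; they are not KiSAO identifiers, and the code does not reflect how KiSAO or SED-ML is implemented.

```python
from itertools import product

# Hypothetical labels for illustration only; not identifiers used in KiSAO.
quantities = ["particle number", "amount", "concentration"]
kinds = ["value", "rate"]

# Compositional scheme: a symbol would be a (quantity, kind) pair assembled
# from reusable modifier terms, so the term count grows additively.
modifier_terms = quantities + kinds

# One-term-per-symbol scheme (what SED-ML expects): each needed combination
# gets its own term, so the term count grows multiplicatively.
concrete_terms = [
    q if k == "value" else f"{q} rate" for q, k in product(quantities, kinds)
]

print(len(modifier_terms), len(concrete_terms))
print(concrete_terms)
```

With only three quantities and two kinds the difference is small, but each additional axis multiplies the number of concrete terms while adding only a few modifiers.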

> I'm not sure I like 'rate of change' in the 'model and simulation variable' category, since it's basically the mathematical concept 'derivative' which doesn't really seem like a variable to me (unlike 'time'). I moved it to the 'mathematical function' category, mulled over it a bit, then moved it back, because I'm not sure I like it over there, either. As you mentioned elsewhere, we can always change the organization of this, so there's no rush on this, but the sort of things I would see living with 'time' would be things like 'length' or 'ambient temperature', which are quantities in and of themselves, not a rule about how to obtain a quantity.

I see your point. How about we rename the category to "model and simulation property" so that it doesn't imply whether it's a primary or derived quantity (so it can be used in different modeling frameworks in which either is possible)?

> Also, you seem to have removed my 'isOrganizational:false' tags, which seems a bit odd to me--surely there aren't defaults for these sorts of things?

I removed this because the convention is that terms are only organizational if isOrganizational is set to true. (i.e. no other term has isOrganizational = false.) Just trying to stick with the conventions that we've inherited.
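A consumer of the ontology could apply this convention by treating a missing isOrganizational attribute as false. The sketch below uses plain dictionaries; the helper name and the two sample terms (including which one is marked organizational) are assumptions for illustration, not data read from KiSAO.

```python
def is_organizational(term):
    """Return True only if isOrganizational is explicitly set to True.

    A missing attribute is treated as False, following the convention that
    terms are organizational only when the flag is set to true.
    """
    return term.get("isOrganizational", False) is True


# Hypothetical sample terms; the flags shown here are assumed.
terms = [
    {"rdfs:label": "model and simulation property", "isOrganizational": True},
    {"rdfs:label": "time"},  # no flag, so not organizational
]

# Terms that could appear in a SED-ML document under this convention.
usable = [t["rdfs:label"] for t in terms if not is_organizational(t)]
print(usable)
```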

luciansmith commented 3 years ago

re: 'full': your argument makes sense to me and I left the change; I just added '(full)' in the 'skos:definition' field, and left the rest as you had it. I know Tellurium is always explicit in its API about which version you're asking for, so I tend to think of e.g. the reduced stoichiometry matrix as still being a stoichiometry matrix, just not the 'full' one.

re: intrinsic/extrinsic: I'm happy with whatever organizational mode you want; the one you propose sounds good to me. It seems like SED-ML doesn't expressly need the concept of 'extensive' or 'intensive', so if you're not using them as anything organizational, we might not need them at all, but it's fine if they're there.

re: rateOfChange: The category rename sounds perfect.

re: organizational: sounds good! Continuing previous conventions seems fine.

Are you good making these changes, or should I?

jonrkarr commented 3 years ago

I already made these changes. Let me know if you want to make more changes, or whether I should release this version.

Regarding integration with Tellurium, when I release the version, the Python package with KiSAO embedded into it will also be pushed to PyPI (pip install "kisao>=2.16").

> re: 'full': your argument makes sense to me and I left the change; I just added '(full)' in the 'skos:definition' field, and left the rest as you had it. I know Tellurium is always explicit in its API about which version you're asking for, so I tend to think of e.g. the reduced stoichiometry matrix as still being a stoichiometry matrix, just not the 'full' one.

Sounds good. I saw this.

> re: intrinsic/extrinsic: I'm happy with whatever organizational mode you want; the one you propose sounds good to me. It seems like SED-ML doesn't expressly need the concept of 'extensive' or 'intensive', so if you're not using them as anything organizational, we might not need them at all, but it's fine if they're there.

Yes, in my opinion the "characteristic" terms shouldn't be embedded into SED-ML documents. Nevertheless, I think they're quite useful. This is what I've used to automate algorithm substitution. The characteristics work around the problem that there's no single hierarchy of terms that can capture all relationships among terms.

luciansmith commented 3 years ago

Works for me--releasing this version sounds great.

jonrkarr commented 3 years ago

I merged in the terms I started in #85. In particular, this includes amount rate, concentration rate, and particle number rate, which some people requested. This also includes terms for FBA and logical simulations.

jonrkarr commented 3 years ago

Closing. This is already released.