haowang-bioinfo closed this issue 3 years ago.
For additional context, below is the current `metaData` field in Human-GEM:
- metaData:
    short_name: "Human-GEM"
    full_name: "Generic genome-scale metabolic model of Homo sapiens"
    version: "1.4.0"
    date: "2020-06-12"
    authors: "Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley, Pinar Kocabas"
    email: "jonrob@chalmers.se"
    organization: "Chalmers University of Technology"
    taxonomy: "9606"
    github: "https://github.com/SysBioChalmers/Human-GEM"
    description: "Genome-scale metabolic models are valuable tools to study metabolism and provide a scaffold for the integrative analysis of omics data. This is the latest version of Human-GEM, which is a genome-scale metabolic model of a generic human cell. The objective of Human-GEM is to serve as a community model for enabling integrative and mechanistic studies of human metabolism."
The new fields and modifications sound good to me. Additionally, for clarity, it would be ideal if the field names in the yaml file matched the RAVEN spec names. Below are the cases that don't match, based on what is already in RAVEN plus the new names @Hao-Chalmers proposed:
Field | Name in RAVEN | Name in HumanGEM.yml |
---|---|---|
Model id | id | short_name |
Model name | description | full_name |
Authors | annotation.authorList | authors |
URL where the model lives | annotation.sourceUrl | github |
Additional comments | annotation.note | description |
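A minimal Python sketch of the table above, assuming the yml `metaData` block has already been loaded into a flat dict (for example with a YAML parser). `YML_TO_RAVEN` and `yml_metadata_to_raven` are hypothetical names, not part of RAVEN or Human-GEM; keys with no row in the table (such as `version` or `email`) are copied through under their own name.

```python
# Hypothetical sketch: map HumanGEM.yml metaData keys onto RAVEN field paths,
# following the table above. Not RAVEN's actual importer.
YML_TO_RAVEN = {
    "short_name": "id",
    "full_name": "description",
    "authors": "annotation.authorList",
    "github": "annotation.sourceUrl",
    "description": "annotation.note",
}


def yml_metadata_to_raven(meta: dict) -> dict:
    """Return a nested dict whose dotted paths mirror the RAVEN field names."""
    model: dict = {}
    for key, value in meta.items():
        # Keys missing from the table (e.g. version, email) keep their name.
        path = YML_TO_RAVEN.get(key, key).split(".")
        node = model
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return model


meta = {
    "short_name": "Human-GEM",
    "full_name": "Generic genome-scale metabolic model of Homo sapiens",
    "version": "1.4.0",
    "github": "https://github.com/SysBioChalmers/Human-GEM",
}
print(yml_metadata_to_raven(meta))
```

Note that `full_name` lands in RAVEN's `description` field, which is exactly the naming clash discussed in this thread.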
IMO the RAVEN names for id and URL would be preferable: the former is the main choice in the COBRA community (MATLAB and Python), and the latter is more generic, as not all RAVEN models are stored on GitHub. Could those two fields in HumanGEM.yml be changed to `id` and `source_url`? @JonathanRob @mihai-sysbio
On the other hand, the `.yml` standard seems more adequate for model name, authors and comments (it's actually very confusing that the RAVEN field `description` holds the model name while the field `note` contains a description). Would it make sense to change those three fields in RAVEN to `fullName`, `annotation.authors` and `annotation.description`?
Are there corresponding (or comparable) COBRA fields for `fullName`, `annotation.authors` and `annotation.description`?
The latest yml fields are listed on COBRApy's `devel` branch. IMHO, they don't map directly onto the RAVEN fields.
The COBRA Toolbox has some rules for `modelVersion`, `modelName` and `modelID`.
The `short-name` is meant to be as human-friendly as possible. For example, this field is what is shown in the navigation bar on Metabolic Atlas. I found this opencobra thread illustrative of the implications of the BiGG model id spec. I would also like to point out that `short-name` and `version` are distinct fields. To me, it is of little importance what the keyword for the value of `short-name` is. However, I am an advocate for its role: human-friendliness. Therefore, I would lean towards keeping this field closer to the standard-GEM naming rather than the BiGG `id` spec. Needless to say, for versioned models this `short-name` is expected to be the same as the repository name.
I support changing `github` to something else. A potential drawback of `source_url` is that a newcomer could find it unclear whether it is meant to be the link to the repository or directly to the file itself on a model hosting platform. But maybe that's just me, and I can't come up with a better suggestion than `source_url`.
@BenjaSanchez the Expected changes of this issue have been updated as you recommended.
@edkerk according to the latest model spec in COBRA, the following four fields could be mapped between RAVEN and COBRA:

Field | Name in RAVEN | Name in COBRA |
---|---|---|
Model id | id | modelID |
Model name | name | modelName |
Model version | version | modelVersion |
Additional comments | annotation.note | description |
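A minimal Python sketch of this four-field mapping, assuming the RAVEN metadata is held in a nested dict. `RAVEN_TO_COBRA`, `flatten` and `raven_to_cobra` are hypothetical helpers, not COBRA Toolbox or RAVEN functions. Anything outside the four table rows (here `annotation.sourceUrl`) is reported as unmapped rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: RAVEN field paths -> COBRA Toolbox names, per the table above.
RAVEN_TO_COBRA = {
    "id": "modelID",
    "name": "modelName",
    "version": "modelVersion",
    "annotation.note": "description",
}


def flatten(d: dict, prefix: str = "") -> Dict[str, object]:
    """Flatten nested dicts into dotted paths, e.g. {'annotation': {'note': x}} -> 'annotation.note'."""
    out: Dict[str, object] = {}
    for key, value in d.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = value
    return out


def raven_to_cobra(model: dict) -> Tuple[Dict[str, object], List[str]]:
    """Return the COBRA-named fields plus the RAVEN paths with no COBRA counterpart."""
    cobra: Dict[str, object] = {}
    unmapped: List[str] = []
    for path, value in flatten(model).items():
        if path in RAVEN_TO_COBRA:
            cobra[RAVEN_TO_COBRA[path]] = value
        else:
            unmapped.append(path)
    return cobra, unmapped


raven = {
    "id": "Human-GEM",
    "name": "Generic genome-scale metabolic model of Homo sapiens",
    "version": "1.4.0",
    "annotation": {
        "note": "Genome-scale metabolic model of a generic human cell.",
        "sourceUrl": "https://github.com/SysBioChalmers/Human-GEM",
    },
}
cobra, unmapped = raven_to_cobra(raven)
print(cobra)
print(unmapped)
```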
@Hao-Chalmers would the Expected changes also include something about the `shortName` field?
@mihai-sysbio I don't think an additional `shortName` field is needed, since it is equivalent to the existing `id` field. Or are you suggesting renaming the field from `id` to `shortName`?
I see. To me, an ID does not have to be human-friendly, unlike `shortName`. I think it would be clearer if some examples were provided, maybe even both "good" and "bad" ones. For example, a "bad" `id` would be `h_sap13417__1_3_0`, standing for a Homo sapiens model with 13417 reactions, corresponding to version `1.3.0`.
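To make the problem with the "bad" id concrete, here is a hypothetical Python decoder for the pattern `<species><reaction count>__<major>_<minor>_<patch>`; the pattern is inferred from this single example, not from any spec. Everything it extracts belongs in separate fields, and such an id would change with every added reaction or version bump while still not being readable.

```python
import re
from typing import Optional

# Hypothetical pattern inferred from the example id h_sap13417__1_3_0.
BAD_ID = re.compile(
    r"^(?P<species>[a-z]_[a-z]+)(?P<n_rxns>\d+)__(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)$"
)


def decode_bad_id(model_id: str) -> Optional[dict]:
    """Unpack the facts packed into such an id, or return None if it does not match."""
    m = BAD_ID.match(model_id)
    if m is None:
        return None
    return {
        "species": m["species"],
        "n_reactions": int(m["n_rxns"]),
        "version": f'{m["major"]}.{m["minor"]}.{m["patch"]}',
    }


print(decode_bad_id("h_sap13417__1_3_0"))
print(decode_bad_id("Human-GEM"))
```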
@mihai-sysbio good point about providing examples; both kinds can be added to the spec in the Wiki once consensus is reached.
So should HumanGEM's `writeHumanYaml` be integrated into RAVEN's `writeYaml`, thereby capturing this metadata?
@edkerk full support
It is not sufficient to just define fields in the RAVEN model structure, and support export to YML file format. SBML is still the de facto standard for model distribution, so these fields should also be properly stored there.
Related to this, there are some unresolved issues:

1. `version`: where is this stored in the SBML file? As far as I can find, this is not covered by the SBML specification. I see two options:
   i. In the model id, e.g. `yeastGEM_v8_4_2`. The benefit is that this is also loaded when using cobrapy or the COBRA Toolbox. However, would we then split the model id from the SBML into two parts, (1) `model.id` and (2) `model.version`? In that case the model would have a different model id in RAVEN than in cobrapy, COBRA, etc. To avoid problems, I would prefer not to run `regexprep` on any identifier.
   ii. In the annotation, under `<rdf>`? Does standard-GEM have a role to play in this?
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" level="3" version="1" fbc:required="false" groups:required="false">
  <model metaid="iYali" id="iYali" name="iYali" fbc:strict="true">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
        <rdf:Description rdf:about="#iYali">
          <dcterms:creator>
            <rdf:Bag>
              <rdf:li rdf:parseType="Resource">
                <vCard:N rdf:parseType="Resource">
                  <vCard:Family>Kerkhoven</vCard:Family>
                  <vCard:Given>Eduard</vCard:Given>
                </vCard:N>
                <vCard:EMAIL>eduardk@chalmers.se</vCard:EMAIL>
                <vCard:ORG rdf:parseType="Resource">
                  <vCard:Orgname>Chalmers University of Technology</vCard:Orgname>
                </vCard:ORG>
              </rdf:li>
            </rdf:Bag>
          </dcterms:creator>
          <dcterms:created rdf:parseType="Resource">
            <dcterms:W3CDTF>2021-04-05T10:27:05Z</dcterms:W3CDTF>
          </dcterms:created>
          <dcterms:modified rdf:parseType="Resource">
            <dcterms:W3CDTF>2021-04-05T10:27:05Z</dcterms:W3CDTF>
          </dcterms:modified>
          <bqbiol:is>
            <rdf:Bag>
              <rdf:li rdf:resource="https://identifiers.org/taxonomy/4952"/>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
2. If `id` is used instead of `shortName` [and I would argue we should, as `id` is similar to `modelID`, `model.id` and `<model id="">` as used in COBRA, cobrapy and SBML], then why use `fullName` and not just `name`? The latter is also more in line with other software and the SBML specification.
3. In `humanGEM.yml`, `date` is also specified; should this be part of the RAVEN model structure? And what does this date reflect: when a new version was released? RAVEN-generated SBML already includes the date that the file was created, but that's probably not what is meant here. Instead, should the date be set when the new version number is set, and be absent if no version number is present?
4. Where should `sourceUrl` be stored in the SBML? Also in the annotation, as in the second suggestion for `version`?
5. `description` is not problematic to store in the SBML; it is actually stored under `<notes>`. With that in mind, why change `note` to `description`? cobrapy has `model.notes`, and it is closer to the SBML specification.

@edkerk good arguments indeed.
@mihai-sysbio what do you think: could standard-GEM help in adopting some fields?
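As a sketch of what a reader can already recover from the SBML annotation in @edkerk's excerpt above, the following uses Python's standard `xml.etree.ElementTree` to pull out the model id and name, creators, creation date and taxonomy. The embedded document is a trimmed copy of that excerpt (properly closed, with unused namespaces and `dcterms:modified` dropped), and `read_model_metadata` is a hypothetical helper. The excerpt has no element holding a model version or source URL, which is the gap discussed above.

```python
import xml.etree.ElementTree as ET

NS = {
    "sbml": "http://www.sbml.org/sbml/level3/version1/core",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "vCard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}

# Trimmed copy of the iYali excerpt above.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model metaid="iYali" id="iYali" name="iYali">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dcterms="http://purl.org/dc/terms/"
               xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
               xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
        <rdf:Description rdf:about="#iYali">
          <dcterms:creator><rdf:Bag><rdf:li rdf:parseType="Resource">
            <vCard:N rdf:parseType="Resource">
              <vCard:Family>Kerkhoven</vCard:Family>
              <vCard:Given>Eduard</vCard:Given>
            </vCard:N>
            <vCard:EMAIL>eduardk@chalmers.se</vCard:EMAIL>
            <vCard:ORG rdf:parseType="Resource">
              <vCard:Orgname>Chalmers University of Technology</vCard:Orgname>
            </vCard:ORG>
          </rdf:li></rdf:Bag></dcterms:creator>
          <dcterms:created rdf:parseType="Resource">
            <dcterms:W3CDTF>2021-04-05T10:27:05Z</dcterms:W3CDTF>
          </dcterms:created>
          <bqbiol:is><rdf:Bag>
            <rdf:li rdf:resource="https://identifiers.org/taxonomy/4952"/>
          </rdf:Bag></bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </model>
</sbml>"""


def read_model_metadata(sbml_text: str) -> dict:
    """Collect model-level metadata stored as RDF in the SBML <annotation>."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    desc = model.find("sbml:annotation/rdf:RDF/rdf:Description", NS)
    creators = [
        {
            "givenName": li.findtext("vCard:N/vCard:Given", default="", namespaces=NS),
            "familyName": li.findtext("vCard:N/vCard:Family", default="", namespaces=NS),
            "email": li.findtext("vCard:EMAIL", default="", namespaces=NS),
            "organization": li.findtext("vCard:ORG/vCard:Orgname", default="", namespaces=NS),
        }
        for li in desc.findall("dcterms:creator/rdf:Bag/rdf:li", NS)
    ]
    taxonomy = [
        li.get(f"{{{NS['rdf']}}}resource")  # rdf:resource attribute
        for li in desc.findall("bqbiol:is/rdf:Bag/rdf:li", NS)
    ]
    return {
        "id": model.get("id"),
        "name": model.get("name"),
        "creators": creators,
        "created": desc.findtext("dcterms:created/dcterms:W3CDTF", namespaces=NS),
        "taxonomy": taxonomy,
    }


print(read_model_metadata(SBML))
```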
On second thought, perhaps it is better to move the discussion about incorporation in SBML into a separate issue, as the current issue is just about the MATLAB structure and the yaml file. The points that remain relevant are:

- `model.name` field instead of `model.fullName`.
- `model.annotation.note` field instead of `model.annotation.description`.

@Hao-Chalmers it would make a lot of sense to standardize (and validate) that the `yml` file has these fields. However, as @edkerk pointed out, maintaining compatibility with existing formats is tricky (1.ii), especially if the newly added fields are to be parsed by other tools as well.
To me, the easiest path forward is what @edkerk suggested above: "the current issue is just about the MATLAB structure and the yaml file".
I would like to emphasize the different use cases for `model.short_name` and `model.full_name`. Here is how Metabolic Atlas uses these fields:
"short_name": "Yeast-GEM",
"full_name": "Consensus genome-scale metabolic model of Saccharomyces cerevisiae",
"description": "Consensus genome-scale metabolic model of Saccharomyces cerevisiae. It is the continuation of the legacy project yeastnet",
"version": "8.4.2",
Luckily, this GEM has a nice `model.id`, but it's just a coincidence that it is readable. The `model.id` could well have been `yeastGEM_v8_4_2`. Since it is an identifier, it will not be parsed into anything readable or worth presenting on a website.
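A minimal Python sketch contrasting the two approaches, assuming a hypothetical `<base>_v<major>_<minor>_<patch>` convention for versioned ids like `yeastGEM_v8_4_2`. `split_versioned_id` shows the kind of regex-based splitting @edkerk would rather avoid (any id that happens to end in `_v<digits>` gets split, whether or not the suffix was meant as a version), while `display_label` builds a human-friendly label from `short_name` and `version` and only falls back to `id`. Both helpers are illustrative, not Metabolic Atlas code.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention: "<base>_v<major>_<minor>_<patch>", as in yeastGEM_v8_4_2.
VERSIONED_ID = re.compile(r"^(?P<base>.+?)_v(?P<ver>\d+(?:_\d+)*)$")


def split_versioned_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Split an id into (base id, dotted version); version is None if no suffix matches."""
    m = VERSIONED_ID.match(model_id)
    if m is None:
        return model_id, None
    return m["base"], m["ver"].replace("_", ".")


def display_label(meta: dict) -> str:
    """Prefer the human-friendly short_name for display; fall back to the id."""
    label = meta.get("short_name") or meta["id"]
    return f'{label} {meta["version"]}' if meta.get("version") else label


print(split_versioned_id("yeastGEM_v8_4_2"))
print(split_versioned_id("iYali"))
print(display_label({"short_name": "Yeast-GEM", "id": "yeastGEM_v8_4_2", "version": "8.4.2"}))
```

Keeping `version` as its own field makes the first helper unnecessary: the label is assembled from fields that are already separate, and the id is never rewritten.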
@edkerk @mihai-sysbio I adjusted the Expected changes of this issue according to your input.
`writeYaml` (5418e8814d0406259a7c2d526af1debcc6600a37) and the model fields definition (Wiki) have been changed according to the discussion here, with the following exceptions and additions:

- `givenName` and `familyName` remain as (non-mandatory) fields, while `authors` is an additional (non-mandatory) field. This is to ensure backwards compatibility, as `givenName` and `familyName` are actually coded in the SBML and `authors` is not, while their meanings are not identical (`givenName` and `familyName` would match `organization` and `email`, while for `authors` this is ambiguous).
- `model.annotation` fields (`defaultLB`, `defaultUB`) are included as metaData in the yaml file.
- `writeYaml` no longer sorts the identifiers (it used to, while `writeHumanYaml` doesn't; it is probably best to keep the identifier order by default).

Renaming `model.description` to `model.name` additionally required a small refactoring of 23 files (fe7d417d64a4d1734e0863901a0a0e39439bdb15). As this breaks backwards compatibility with models that would already have been loaded in MATLAB, I suggest these changes result in release 2.5.0 instead of 2.4.4.
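A hypothetical Python sketch of the behaviour described above, not RAVEN's MATLAB `writeYaml`: it emits a `- metaData:` block in the style of Human-GEM's yml, writes the top-level fields and then the `model.annotation` fields (so `authors` can sit alongside `givenName`/`familyName` when present, and `defaultLB`/`defaultUB` are included), keeps the model's own field order instead of sorting, and skips empty fields. The dict layout of `model` is an assumption.

```python
import json


def write_metadata_yaml(model: dict) -> str:
    """Emit a '- metaData:' block, keeping the model's own field order (no sorting).

    Top-level fields come first, then annotation fields flattened into the same
    block; an annotation key with the same name as a top-level key overrides it.
    """
    lines = ["- metaData:"]
    fields = {k: v for k, v in model.items() if k != "annotation"}
    fields.update(model.get("annotation", {}))
    for key, value in fields.items():
        if value is None or value == "":
            continue  # skip empty, non-mandatory fields
        # json.dumps yields a double-quoted, escaped string, which YAML accepts.
        text = json.dumps(value) if isinstance(value, str) else str(value)
        lines.append(f"    {key}: {text}")
    return "\n".join(lines)


model = {
    "id": "Human-GEM",
    "name": "Generic genome-scale metabolic model of Homo sapiens",
    "version": "1.4.0",
    "annotation": {
        "authors": "Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley, Pinar Kocabas",
        "email": "jonrob@chalmers.se",
        "organization": "Chalmers University of Technology",
        "defaultLB": -1000,  # illustrative bounds, not taken from Human-GEM
        "defaultUB": 1000,
    },
}
print(write_metadata_yaml(model))
```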
Description of the issue:

- Add a `metadata` section to the yaml file specification in RAVEN.
- The `metadata` section was introduced to the tailored `yaml` file in Human-GEM to serve the requirements of Metabolic Atlas, as detailed in issue #71. After continuous development, this section works well in providing relevant information for GEM-type repositories (e.g. Human-GEM), the GEM archive Metabolic Atlas, and the research community.

Expected changes:
- `version` field
- ~~`description` to `fullName`~~
- `annotation` field `sourceUrl`
- `givenName` and `familyName` into `authors`
- ~~`note` to `description`~~
- `writeYaml` function to enable the exporting of `metadata` information from fields `id`, ~~`fullName`~~ `name`, `version` and `annotation`