SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

How is the Recon3D model related to the publication model? #289

Open matthiaskoenig opened 6 years ago

matthiaskoenig commented 6 years ago

The BiGG Recon3D model has

Metabolites | 5835
Reactions | 10600
Genes | 2248

whereas the paper reported:

3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures.

What are the real numbers for RECON3D? How is the BiGG RECON3D different from the published RECON3D?

willigott commented 6 years ago

In addition to the points mentioned, there also seems to be an issue with the defined groups. The published model has 111 groups, while the BiGG version has only 103. The IDs have also changed: in the published model the IDs are groupx, while in the BiGG version they are called gx. The bigger issue, though, is that the numbering has changed, and group members seem to have been created that did not exist before, or they have disappeared. To give one example:

group1 in the published model looks like this:

['R_AGTim',
 'R_AGTix',
 'R_ARGSS',
 'R_ASNNm',
 'R_ASNS1',
 'R_ASPNATm',
 'R_ASPTAm',
 'R_DASPO1p',
 'R_NACASPAH',
 'R_RE1473C',
 'R_RE2031M',
 'R_RE2642C',
 'R_ALAR',
 'R_ASPTA',
 'R_r0127',
 'R_ARGSL']

If I then check g1 in the BiGG model it looks completely different, so I checked in which group the reaction 'R_ASPNATm' is in the BiGG model: I find it located in g47 which looks as follows:

['R_AGTim',
 'R_AGTix',
 'R_ARGSS',
 'R_ASNNm',
 'R_ASNS1',
 'R_ASPNATm',
 'R_ASPTAm',
 'R_DASPO1p',
 'R_NACASPAH',
 'R_ALAR',
 'R_ASPTA',
 'R_ASNN',
 'R_ARGSL']

So the reactions {'R_RE1473C', 'R_RE2031M', 'R_RE2642C', 'R_r0127'} are missing from the group, while there is an additional reaction {'R_ASNN'}. None of those four missing reactions is included in the BiGG model. Interestingly, the additional reaction in the group is not included in the published model. That points to a deeper issue: there are 5295 reaction IDs in the published model which are not in the BiGG version, and 2352 reaction IDs in the BiGG model not present in the published one, so, as @matthiaskoenig mentioned in a different issue, there seems to be a parsing issue.

I just saw that R_ASNN's old identifier is r0127, so this mapping went well; the question would then still be what happened to the remaining 3 reactions.
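For anyone who wants to repeat this comparison, here is a minimal sketch using plain Python sets. The two member lists are copied from the listings in this comment; the `renamed` mapping and the `compare_groups` helper are illustrative names, and the mapping contains only the one rename (r0127 to ASNN) that is confirmed in this thread.

```python
# Group members copied from the two listings above.
published_group1 = [
    "R_AGTim", "R_AGTix", "R_ARGSS", "R_ASNNm", "R_ASNS1", "R_ASPNATm",
    "R_ASPTAm", "R_DASPO1p", "R_NACASPAH", "R_RE1473C", "R_RE2031M",
    "R_RE2642C", "R_ALAR", "R_ASPTA", "R_r0127", "R_ARGSL",
]
bigg_g47 = [
    "R_AGTim", "R_AGTix", "R_ARGSS", "R_ASNNm", "R_ASNS1", "R_ASPNATm",
    "R_ASPTAm", "R_DASPO1p", "R_NACASPAH", "R_ALAR", "R_ASPTA", "R_ASNN",
    "R_ARGSL",
]

# Old-to-new identifier mapping. Only this single rename is confirmed in the
# thread; a complete mapping would have to come from BiGG itself.
renamed = {"R_r0127": "R_ASNN"}


def compare_groups(published, bigg, renamed_ids):
    """Return (missing, added) reaction IDs after translating old IDs."""
    translated = {renamed_ids.get(rxn, rxn) for rxn in published}
    missing = translated - set(bigg)  # in the published group only
    added = set(bigg) - translated    # in the BiGG group only
    return missing, added


# Raw comparison, ignoring renames.
raw_missing, raw_added = compare_groups(published_group1, bigg_g47, {})
# Comparison after applying the known rename.
missing, added = compare_groups(published_group1, bigg_g47, renamed)

print(sorted(raw_missing), sorted(raw_added))
print(sorted(missing), sorted(added))
```

With the rename applied, the extra reaction disappears from the difference and only the three RE reactions remain unexplained.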

zakandrewking commented 6 years ago

Thanks for helping us look at this. We did not have a lot of lead time with the model to work these issues out, so your feedback is invaluable.

I have to spend more time looking at the model, but I see right off the bat that r0127 was matched to an existing bigg reaction ASNN, so that explains one change: http://bigg.ucsd.edu/models/Recon3D/reactions/ASNN

zakandrewking commented 6 years ago

We are basing our version on the file here, which i received from the authors of the paper:

https://github.com/SBRG/bigg_models_data/blob/master/models/Recon3D.mat

In that file, the group you are talking about looks like this:

In [12]: [x for x in m.reactions if x.subsystem == 'Alanine and aspartate metabolism']
Out[12]:
[<Reaction AGTim at 0x123ef7b38>,
 <Reaction AGTix at 0x123f022b0>,
 <Reaction ARGSS at 0x123f50cc0>,
 <Reaction ASNNm at 0x123f5bba8>,
 <Reaction ASNS1 at 0x123f5bcc0>,
 <Reaction ASPNATm at 0x123f68898>,
 <Reaction ASPTAm at 0x123f68c18>,
 <Reaction DASPO1p at 0x124050390>,
 <Reaction NACASPAH at 0x1243f83c8>,
 <Reaction ALAR at 0x125147e10>,
 <Reaction ASPTA at 0x125153278>,
 <Reaction r0127 at 0x126cada90>,
 <Reaction ARGSL at 0x126ce51d0>]

At least when I read it with COBRApy.
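A rough sketch of how a listing like the one above could be reproduced. The helper name `reactions_in_subsystem` is made up for illustration, and the stand-in model only mimics the `reactions`, `id`, and `subsystem` attributes used in the snippet above; how `m` was actually loaded is not shown in this thread, so the commented COBRApy call is an assumption.

```python
from types import SimpleNamespace


def reactions_in_subsystem(model, subsystem):
    """Return the IDs of all reactions whose subsystem matches exactly."""
    return [rxn.id for rxn in model.reactions if rxn.subsystem == subsystem]


# Stand-in model so the sketch runs without COBRApy or the mat file.
# The third entry is a made-up placeholder, not a real Recon3D reaction.
demo_model = SimpleNamespace(reactions=[
    SimpleNamespace(id="AGTim", subsystem="Alanine and aspartate metabolism"),
    SimpleNamespace(id="r0127", subsystem="Alanine and aspartate metabolism"),
    SimpleNamespace(id="placeholder_rxn", subsystem="Some other subsystem"),
])

demo_ids = reactions_in_subsystem(demo_model, "Alanine and aspartate metabolism")
print(demo_ids)

# Assumed usage with COBRApy installed and a local copy of the file:
#   import cobra
#   m = cobra.io.load_matlab_model("Recon3D.mat")
#   reactions_in_subsystem(m, "Alanine and aspartate metabolism")
```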

zakandrewking commented 6 years ago

sorry @matthiaskoenig that we're getting off track from the original question :) We'll get to it

willigott commented 6 years ago

@zakandrewking: Ok, I used the sbml from here, so maybe there is a discrepancy between it and the mat file; the sbml contains 13543 reactions which is the number stated in the original post.

zakandrewking commented 6 years ago

Right. They also make a distinction between the model (which we provide) and the larger knowledge base.

zakandrewking commented 6 years ago

@matthiaskoenig I think this answers the question. Our version is the "model". You can find both the model and the larger knowledge base in the downloads section at http://vmh.life/

matthiaskoenig commented 6 years ago

So what is the definitive source of Recon3D? Only a few weeks after publication, 5 different versions of the model/knowledge base already seem to be circulating (supplement SBML file, vmh SBML file, BiGG SBML file, SBRG SBML file). This is the problem when there is no single community repository that manages the latest version! In addition, the mat file is probably different from the SBML.

Can we just agree that, for now, the definitive version is the latest version hosted on vmh, until a proper repository for RECON is hopefully established at some point? So can we get RECON3D-v3.01 in BiGG?

So do I understand correctly that there are 2 "mat" files hosted on vmh, corresponding to the knowledgebase and the model, and one SBML which is the "model"? http://vmh.uni.lu/#downloadview

It would be very important for me to have an SBML with the information of the knowledgebase and the model on BiGG with all the annotations (especially ENSG and UniProt, which are used in key parts of the analysis of the paper). Unfortunately, I don't have Matlab licenses (nor money for them), so I can't look at the mat files and these are completely useless for me. So I am stuck with an SBML containing only part of the knowledgebase and missing most information of RECON3D.

@zakandrewking yes, I can find both the model and the knowledgebase on vmh. But both are in a commercial binary format that is not usable.

ChristianLieven commented 6 years ago

Unfortunately, I don't have Matlab licenses (nor money for them), so I can't look at the mat files and these are completely useless for me.

@matthiaskoenig I haven't tried it myself yet but COBRApy apparently supports the import of mat files. Perhaps that helps you! https://cobrapy.readthedocs.io/en/stable/io.html#MATLAB

matthiaskoenig commented 6 years ago

Things are starting to make more sense already. I could open the mat files with Octave, and most of the information seems to be there (with the exception of ENSG and UniProt IDs, which were only published in the RECON3D supplements). I just have to write some code to bring this into a normal text format like JSON/YAML.
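One possible shape for such a conversion, as a sketch only. It assumes SciPy is available for the loading step and that the file is not in the HDF5-based v7.3 format, which `scipy.io.loadmat` does not read. The helper names are illustrative, the name of the variable stored inside the mat file has to be supplied by the caller, and field names such as `rxns` follow the usual COBRA Toolbox convention rather than anything confirmed here.

```python
import json
from types import SimpleNamespace


def to_jsonable(value):
    """Recursively convert array-like values into plain Python types."""
    if hasattr(value, "tolist"):  # NumPy arrays and scalars
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    return str(value)  # fallback for anything else, e.g. sparse matrices


def export_fields(struct, field_names):
    """Read the named attributes from a loaded struct and dump them as JSON."""
    data = {name: to_jsonable(getattr(struct, name)) for name in field_names}
    return json.dumps(data, indent=2, sort_keys=True)


def load_mat_struct(path, variable_name):
    """Load one struct from a mat file (needs SciPy, imported lazily)."""
    from scipy.io import loadmat
    contents = loadmat(path, squeeze_me=True, struct_as_record=False)
    return contents[variable_name]


# Stand-in struct so the sketch runs without SciPy or the mat file.
# The bound values are made up for illustration.
demo_struct = SimpleNamespace(rxns=("AGTim", "r0127"), lb=[-1000.0, 0.0])
demo_json = export_fields(demo_struct, ["rxns", "lb"])
print(demo_json)
```

Restricting the export to identifier and bound fields keeps the output readable; the stoichiometric matrix would need its own sparse representation.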

I would suggest adding the 2 SBML files corresponding to "Recon3D-v3.01.mat" and "Recon3DModel-v3.01.mat" to BiGG with

Recon3DModel-v3.01.mat:

  • reactions: 10600
  • species: 5835
  • genes: 2248

Recon3D-v3.01.mat:

  • reactions: 13543
  • species: 8399
  • genes: 3697

Species, reactions and genes require annotations to the original "ids" used in the mat files, i.e. it must be obvious which BiGG identifiers map to the identifiers used in the mat files (these seem to be the old identifiers in BiGG, but they must be exported in the SBML).

zakandrewking commented 6 years ago

VMH will continue to be the definitive source. I can add a disclaimer to the Recon3D page on BiGG to that effect

Adding old identifiers to BiGG SBML also makes sense if they are not already there.

On Wed, Feb 28, 2018 at 1:19 AM Matthias König notifications@github.com wrote:

Things start to make more sense already. I could open the mat files with octave and most of the information seems to be there (with exception of ENSG and UniProts which were only published in the RECON3D supplements). Just have to write some code to bring this in a normal text format like JSON/YAML.

I would suggest adding the 2 SBML files corresponding to "Recon3D-v3.01.mat" and "Recon3DModel-v3.01.mat" to BiGG with

Recon3DModel-v3.01.mat:

  • reactions: 10600
  • species: 5835
  • genes: 2248

Recon3D-v3.01.mat:

  • reactions: 13543
  • species: 8399
  • genes: 3697

Species, reactions and genes require annotations to the original "ids" used in the mat files, i.e. it must be obvious which BiGG identifiers map to the identifiers used in the mat files (this seem to be the old identifiers in BiGG, but these must be exported in the SBML).

draeger commented 6 years ago

@matthiaskoenig Mat files can be parsed using dedicated libraries and are therefore accessible without MATLAB. ModelPolisher parses these mat files and converts them to SBML, optionally also annotating them using the BiGG Models database.

matthiaskoenig commented 6 years ago

@draeger thanks for the help. Was not aware that there is such good support for mat files outside of matlab. But I still don't understand why all the gene information is not part of the mat/SBML files (i.e. the information provided in Supplement S4, Supplemental Data File of Recon3D). There are links to Ensembl, MIM, UniProt, GO, WikiGene Id which are all lacking from the mat files and the SBML. Basically the species and reactions are annotated well, whereas all the gene information is lacking completely.

zakandrewking commented 6 years ago

The SBML files provided by BiGG reflect the information in our database, so when we import models with extra annotations, it takes us some time to upgrade the database, APIs, and web pages to include this info. In the meantime, I would recommend that users look to the other available Recon3D files at VMH.

matthiaskoenig commented 6 years ago

thanks @zakandrewking I completely understand, would be great to have this information in BiGG in the long run. No urgency here. I will work with the mat files and supplements for now.

The only real issue I see right now is that the old identifiers are missing from the SBML (i.e. the gene identifiers used in the mat files). The old gene identifiers are basically the gene ids. If they are not in the model, it is impossible to map the SBML genes to the supplements.

zakandrewking commented 6 years ago

great. we'll prioritize getting these old IDs in the SBML
