SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

Non-unique gene names #375

Closed Hrovatin closed 3 years ago

Hrovatin commented 3 years ago

When looking at the Recon3D model, I noticed that the names of genes, reactions, and metabolites are not unique. I see why this is the case for reactions and metabolites: they may be located in different compartments, etc. However, I do not see why this is so for genes. Genes with the same name share NCBI gene IDs (and annotations), so they seem to be the same genomic entity, yet they are mapped to different reactions. Can you please explain what the different gene instances represent in this case?

However, there are also non-gene instances where I do not know why they are different, e.g. in the image (not preserved here). For some of the metabolites shown there it is clear that they are located in different compartments and are thus different, but I could not find a difference between the first two (except in the MetaNetX ID (not shown), but there they again seem to be the same). Is there a reference stating in which cases entities (genes, reactions, metabolites) are considered "different" entities (e.g. having different IDs)?

matthiaskoenig commented 3 years ago

Normally a single gene encodes a protein or a subunit of a protein. A protein can catalyze multiple reactions, e.g. some enzymes accept a broad spectrum of metabolites as substrates. Hexokinases, for example, typically work with glucose but also with other sugars such as fructose and mannose, although often with much higher Km values and reduced maximal velocity (or kcat). In stoichiometric models such as Recon3D, a gene can consequently occur in multiple reactions because the protein it encodes can catalyze multiple reactions.
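The one-gene-to-many-reactions relationship described above can be sketched by inverting a table of gene-reaction rules. This is a minimal illustration in plain Python, not cobrapy code; the reaction and gene IDs (`HEX_glc`, `HK_A`, ...) are made-up placeholders and are not taken from Recon3D.

```python
import re
from collections import defaultdict

# Toy gene-reaction rules (illustrative IDs, NOT taken from Recon3D).
gene_reaction_rules = {
    "HEX_glc": "HK_A or HK_B",  # glucose phosphorylation
    "HEX_fru": "HK_A",          # fructose phosphorylation
    "HEX_man": "HK_A",          # mannose phosphorylation
}


def genes_in_rule(rule):
    """Return the gene IDs mentioned in a boolean rule string."""
    tokens = re.findall(r"[^\s()]+", rule)
    return {tok for tok in tokens if tok.lower() not in ("and", "or")}


def reactions_by_gene(rules):
    """Invert reaction -> rule into gene -> set of reaction IDs."""
    index = defaultdict(set)
    for reaction_id, rule in rules.items():
        for gene_id in genes_in_rule(rule):
            index[gene_id].add(reaction_id)
    return dict(index)


index = reactions_by_gene(gene_reaction_rules)
print(sorted(index["HK_A"]))  # ['HEX_fru', 'HEX_glc', 'HEX_man']
print(sorted(index["HK_B"]))  # ['HEX_glc']
```

In this toy table, `HK_A` plays the role of a broad-specificity enzyme: one gene instance, three reactions.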

The metabolite looks like a complex glycan. In glycans, alternative linkages between the sugar moieties are often possible. Depending on the branching structure you can have different molecules, but if you break them down to the sugar moieties (as in the name) they have identical names. I could imagine this is the case for the example. Alternatively, this could be a mapping error or an error in Recon3D. The information comes from different groups and models, and the metabolites may not have been mapped correctly onto each other.

Best Matthias

Hrovatin commented 3 years ago

Thank you for the answer.

So different gene instances that specify the same gene may differ in their associated kinetics? I thought that the kinetics were given by the reaction instances, so the same gene instance could be associated with multiple reaction instances, each reaction having different kinetics. See the example below:


cobra_model=get_cobra_model(path)
# gene info of gene instance - gene is associated with multiple reactions
cobra_model.genes[1].__dict__
Out[9]: 
{'_id': '26_AT1',
 'name': 'AOC1',
 'notes': {},
 '_annotation': {'sbo': 'SBO:0000243',
  'ccds': ['CCDS43679.1', 'CCDS64797.1'],
  'ncbigene': '26',
  'ncbigi': ['73486661',
   '1034654825',
   '1034654831',
   '440918691',
   '1034654829',
   '1034654827'],
  'omim': '104610',
  'refseq_name': 'AOC1',
  'refseq_synonym': ['KAO', 'DAO', 'DAO1', 'ABP', 'ABP1']},
 '_model': <Model Recon3D at 0x7fddaa772e50>,
 '_reaction': {<Reaction 13DAMPPOX at 0x7fdda4acee50>,
  <Reaction 42A12BOOX at 0x7fdda4ad8b20>,
  <Reaction HISTASE at 0x7fdd889debe0>,
  <Reaction MAOX at 0x7fdd88c17400>,
  <Reaction MHISOR at 0x7fdda5335fd0>,
  <Reaction PEAMNO at 0x7fdd88f54fd0>,
  <Reaction PTRCOX1 at 0x7fdda54debe0>,
  <Reaction RE0688E at 0x7fdd897acbe0>,
  <Reaction RE0689E at 0x7fdda5d10b80>,
  <Reaction RE0690E at 0x7fdda5d10550>,
  <Reaction RE0827E at 0x7fdd897c2970>,
  <Reaction RE0828E at 0x7fdd897c2bb0>,
  <Reaction RE3367E at 0x7fdda5e3bf40>,
  <Reaction TRYPTAOX at 0x7fdda5648250>,
  <Reaction r0281 at 0x7fdd8933bcd0>},
 '_functional': True}

# reaction instance - it specifies the kinetics
cobra_model.reactions[0].__dict__
Out[13]: 
{'_id': '24_25DHVITD3tm',
 'name': '24,25-Dihydroxyvitamin D3 transport from mitochondria',
 'notes': {},
 '_annotation': {'sbo': 'SBO:0000185',
  'bigg.reaction': '24_25DHVITD3tm',
  'metanetx.reaction': 'MNXR94734'},
 '_gene_reaction_rule': '',
 'subsystem': '',
 '_genes': set(),
 '_metabolites': {<Metabolite 2425dhvitd3_m at 0x7fddaca3f220>: -1.0,
  <Metabolite 2425dhvitd3_c at 0x7fddaca3fd60>: 1.0},
 '_compartments': None,
 '_model': <Model Recon3D at 0x7fddaa772e50>,
 '_lower_bound': 0.0,
 '_upper_bound': 1000.0}

Hrovatin commented 3 years ago

For example, there are genes that seem to differ only in their lists of reactions (and their IDs). I do not see why the list of reactions associated with a gene should be split between two gene instances, since the kinetics are specified in the reactions anyway.


for g in cobra_model.genes:
    if g.name == 'ACOX1':
        print(g.__dict__)
{'_id': '51_AT2', 'name': 'ACOX1', 'notes': {}, '_annotation': {'sbo': 'SBO:0000243', 'ccds': ['CCDS11734.1', 'CCDS11735.1'], 'hprd': '02030', 'ncbigene': '51', 'omim': '609751', 'refseq_name': 'ACOX1', 'refseq_synonym': ['PALMCOX', 'ACOX', 'SCOX']}, '_model': <Model Recon3D at 0x7fddaa772e50>, '_reaction': {<Reaction FAOXC2252053x at 0x7fdda4fb6040>, <Reaction FAOXC2452256x at 0x7fdda4fa18e0>, <Reaction FAOXC260240x_1 at 0x7fdda4fc68e0>, <Reaction FAOXC246226x at 0x7fdda4fa1130>, <Reaction ACOAO7p at 0x7fdda4b4c970>, <Reaction FAOXC2242046x at 0x7fdd87653a30>, <Reaction 3OHSUBCOAx at 0x7fdd881d1b80>, <Reaction FAOXC226205x at 0x7fdd8766abe0>, <Reaction FAOXC2251836x at 0x7fdd8763dd00>, <Reaction FAOXC241181x at 0x7fdd8770fd00>, <Reaction FAOXC2051843x at 0x7fdda4fb6d90>, <Reaction FAOXC200180x at 0x7fdda4fa1d90>, <Reaction FAOXC2442246x at 0x7fdda4f8cdc0>, <Reaction FAOXC2452253x at 0x7fdda4fa1e20>, <Reaction 3OHSEBCOAx at 0x7fdda6977e20>, <Reaction FAOXC240200x_1 at 0x7fdd87693e50>, <Reaction FAOXC180x at 0x7fdda4f8ce80>, <Reaction FAOXC183806x at 0x7fdd87628eb0>, <Reaction FAOXC18480x at 0x7fdd87628f40>, <Reaction 3HADICOAx at 0x7fdda6977f70>, <Reaction FAOXC16080x at 0x7fdda4f76fa0>}, '_functional': True}
{'_id': '51_AT1', 'name': 'ACOX1', 'notes': {}, '_annotation': {'sbo': 'SBO:0000243', 'ccds': ['CCDS11734.1', 'CCDS11735.1'], 'hprd': '02030', 'ncbigene': '51', 'omim': '609751', 'refseq_name': 'ACOX1', 'refseq_synonym': ['PALMCOX', 'ACOX', 'SCOX']}, '_model': <Model Recon3D at 0x7fddaa772e50>, '_reaction': {<Reaction FAOXC2252053x at 0x7fdda4fb6040>, <Reaction HMR_3062 at 0x7fdda70423d0>, <Reaction RE3086X at 0x7fdd897ec400>, <Reaction RE1516X at 0x7fdda5da85b0>, <Reaction HMR_3326 at 0x7fdd8844e700>, <Reaction FAOXC10DCC8DCx at 0x7fdd879547c0>, <Reaction FAOXC81C61x at 0x7fdd87a66850>, <Reaction r1448 at 0x7fdda58848b0>, <Reaction HMR_3057 at 0x7fdda70548b0>, <Reaction FAOXC260240x_1 at 0x7fdda4fc68e0>, <Reaction r1446 at 0x7fdd894a6910>, <Reaction FAOXC141C121x at 0x7fdd8982e940>, <Reaction ACOAO7p at 0x7fdda4b4c970>, <Reaction ACOAO4p at 0x7fdda5884970>, <Reaction RE3624M at 0x7fdd87954970>, <Reaction HMR_3094 at 0x7fdd883b4a00>, <Reaction FAOXC8C6x at 0x7fdd87a7ca90>, <Reaction r1450 at 0x7fdda5884b20>, <Reaction FAOXC14DCC12DCx at 0x7fdd8982eb20>, <Reaction r1451 at 0x7fdd894a6b80>, <Reaction r1444 at 0x7fdd894a6bb0>, <Reaction FAOXC226205x at 0x7fdd8766abe0>, <Reaction FAOXC101C102m at 0x7fdd87954be0>, <Reaction FAOXC22C20x at 0x7fdd87a3ebe0>, <Reaction RE1519X at 0x7fdda5cf8c10>, <Reaction HMR_3070 at 0x7fdda7066c40>, <Reaction FAOXC140120x at 0x7fdd8982ec70>, <Reaction FAOXC16DCC14DCx at 0x7fdda5ee4d00>, <Reaction HMR_3102 at 0x7fdda707ad00>, <Reaction FAOXC142C122x at 0x7fdda6060d30>, <Reaction HMR_3350 at 0x7fdd88482d30>, <Reaction FAOXC24C22x at 0x7fdd87a3ed60>, <Reaction FAOXC201C181x at 0x7fdd87a14d60>, <Reaction FAOXC2051843x at 0x7fdda4fb6d90>, <Reaction r1449 at 0x7fdda5884dc0>, <Reaction FAOXC2442246x at 0x7fdda4f8cdc0>, <Reaction FAOXC101C102x at 0x7fdd87954dc0>, <Reaction FAOXC102C103x at 0x7fdd87954df0>, <Reaction r1447 at 0x7fdd894a6dc0>, <Reaction FAOXC164C165x at 0x7fdd8982edc0>, <Reaction RE1517M at 0x7fdda5da8e50>, <Reaction HMR_3330 at 
0x7fdd88482e50>, <Reaction PROFVSCOAhc at 0x7fdd88a30e50>, <Reaction FAOXC180x at 0x7fdda4f8ce80>, <Reaction RE2985M at 0x7fdd877b8e80>, <Reaction FAOXC183806x at 0x7fdd87628eb0>, <Reaction RE2994X at 0x7fdd877cceb0>, <Reaction RE2997X at 0x7fdd87914ee0>, <Reaction r0310 at 0x7fdda5704f10>, <Reaction FAOXC120100x at 0x7fdda5ee4f10>, <Reaction HMR_3066 at 0x7fdd883c6f10>, <Reaction FAOXC221C201x at 0x7fdda6010f40>, <Reaction FAOXC18480x at 0x7fdd87628f40>, <Reaction ACOAO8p at 0x7fdd883b4f10>, <Reaction HMR_3342 at 0x7fdda70d8f70>, <Reaction ACOAO5p at 0x7fdda5884f70>, <Reaction HMR_3098 at 0x7fdd883daf10>, <Reaction HMR_1637 at 0x7fdda6e14fa0>, <Reaction FAOXC12DCC10DCx at 0x7fdda6010fa0>, <Reaction FAOXC16080x at 0x7fdda4f76fa0>, <Reaction HMR_3334 at 0x7fdd88482f40>, <Reaction FAOXC246226x at 0x7fdda4fa1130>, <Reaction FAOXC204C205x at 0x7fdda5efd490>, <Reaction FAOXC8DCC6DCx at 0x7fdda60395e0>, <Reaction FAOXC181C161x at 0x7fdd879e7610>, <Reaction RE1516M at 0x7fdd897d8f10>, <Reaction HMR_3321 at 0x7fdda70b37f0>, <Reaction FAOXC2452256x at 0x7fdda4fa18e0>, <Reaction RE1517X at 0x7fdd87725940>, <Reaction FAOXC2242046x at 0x7fdd87653a30>, <Reaction FAOXC6C4x at 0x7fdda604da60>, <Reaction RE3247M at 0x7fdda5e0fb20>, <Reaction FAOXC143C123x at 0x7fdda6087b20>, <Reaction RE1518X at 0x7fdda5d25b80>, <Reaction 3OHSUBCOAx at 0x7fdd881d1b80>, <Reaction FAOXC6DCC4DCx at 0x7fdda604dc70>, <Reaction RE3626M at 0x7fdda5e51cd0>, <Reaction RE3247X at 0x7fdda5e0fd00>, <Reaction FAOXC163C164x at 0x7fdda6205d00>, <Reaction FAOXC2251836x at 0x7fdd8763dd00>, <Reaction FAOXC241181x at 0x7fdd8770fd00>, <Reaction FAOXC184C164x at 0x7fdd879fff70>, <Reaction RE2985X at 0x7fdda5dbbfd0>, <Reaction FAOXC241C221x at 0x7fdd87a2bd60>, <Reaction FAOXC161C141x at 0x7fdda6087d60>, <Reaction 3HPVSTETCOAhcx at 0x7fdd887ffd00>, <Reaction FAOXC200180x at 0x7fdda4fa1d90>, <Reaction FAOXC160140x at 0x7fdd879e7d90>, <Reaction RE3157X at 0x7fdd89805df0>, <Reaction FAOXC2452253x at 0x7fdda4fa1e20>, 
<Reaction 3OHSEBCOAx at 0x7fdda6977e20>, <Reaction HMR_3346 at 0x7fdda70b3e20>, <Reaction FAOXC240200x_1 at 0x7fdd87693e50>, <Reaction RE1518M at 0x7fdd87725e50>, <Reaction FAOXC10080x at 0x7fdda6023e80>, <Reaction RE2998M at 0x7fdda5dbbeb0>, <Reaction DEOXFVShc at 0x7fdda73d1eb0>, <Reaction 3HADICOAx at 0x7fdda6977f70>, <Reaction HMR_3338 at 0x7fdda70c5fa0>, <Reaction FAOXC225C226x at 0x7fdda6023fd0>}, '_functional': True}
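
The comparison above can be made more systematic by grouping gene instances on their `ncbigene` annotation and splitting their reaction IDs into shared and unique sets. The following is a minimal sketch: it relies only on the `id`, `annotation` and `reactions` attributes, and it runs here on hypothetical stand-in objects (`toy_gene` is not part of cobrapy) filled with a handful of reaction IDs copied from the output above, not the full lists.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_ncbigene(genes):
    """Return {ncbigene ID: [gene, ...]} for IDs shared by several gene instances."""
    groups = defaultdict(list)
    for gene in genes:
        ncbi_id = gene.annotation.get("ncbigene")
        if isinstance(ncbi_id, str):  # skip genes without a single NCBI ID
            groups[ncbi_id].append(gene)
    return {key: members for key, members in groups.items() if len(members) > 1}


def compare_reactions(gene_a, gene_b):
    """Split the reaction IDs of two gene instances into shared and unique sets."""
    ids_a = {reaction.id for reaction in gene_a.reactions}
    ids_b = {reaction.id for reaction in gene_b.reactions}
    return {"shared": ids_a & ids_b, "only_a": ids_a - ids_b, "only_b": ids_b - ids_a}


def toy_gene(gene_id, name, ncbi_id, reaction_ids):
    """Hypothetical stand-in exposing only the attributes used above."""
    reactions = [SimpleNamespace(id=rid) for rid in reaction_ids]
    return SimpleNamespace(id=gene_id, name=name,
                           annotation={"ncbigene": ncbi_id}, reactions=reactions)


# Small excerpt of the printed output; the real reaction lists are much longer.
genes = [
    toy_gene("51_AT1", "ACOX1", "51", ["ACOAO7p", "FAOXC180x", "ACOAO4p", "r1448"]),
    toy_gene("51_AT2", "ACOX1", "51", ["ACOAO7p", "FAOXC180x"]),
    toy_gene("26_AT1", "AOC1", "26", ["HISTASE", "MAOX"]),
]

duplicates = group_by_ncbigene(genes)
first, second = duplicates["51"]
diff = compare_reactions(first, second)
print(sorted(diff["shared"]))  # ['ACOAO7p', 'FAOXC180x']
print(sorted(diff["only_a"]))  # ['ACOAO4p', 'r1448']
print(sorted(diff["only_b"]))  # []
```

With a loaded model, passing `cobra_model.genes` to `group_by_ncbigene` should work the same way, assuming the annotations are stored as in the output above.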

matthiaskoenig commented 3 years ago

I only gave the kinetics example to clarify things (such information does not occur in Recon3D). For constraint-based models you are only interested in the Boolean relationship between genes and reactions. This information is encoded in so-called gene-protein-reaction rules (GPRs). It is important for figuring out which reactions are affected if you knock out certain genes. Because a gene can be involved in multiple reactions, a single gene knockout can affect multiple target reactions in a network. In addition, you have many isoforms, i.e. very similar proteins (often with different kinetics) that are encoded by different genes. E.g. there is a large set of GLUT glucose transport proteins such as GLUT1, GLUT2, GLUT4, ... All of these transport glucose in the cell, i.e. they have the same reaction equation, but they are encoded by different genes. In addition, you often need multiple subunits for a protein to work, so transcripts and proteins from all the corresponding genes are required. These dependencies are encoded in the GPRs. This is an abstraction for computational modeling, specifically constraint-based modeling.
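
As a rough illustration of how such Boolean rules decide which reactions a knockout affects, here is a minimal sketch in plain Python. It does not use the cobrapy API; the helper names (`rule_is_active`, `affected_reactions`) and the reaction IDs are hypothetical, and only the GLUT gene names come from the text above. Isoforms are joined with `or`, subunits with `and`.

```python
import re


def rule_is_active(rule, knocked_out):
    """Evaluate a boolean GPR string given a set of knocked-out gene IDs."""
    if not rule.strip():
        return True  # no gene association: a knockout cannot disable the reaction
    expr = []
    for tok in re.findall(r"\(|\)|[^\s()]+", rule):
        if tok in ("(", ")"):
            expr.append(tok)
        elif tok.lower() in ("and", "or"):
            expr.append(tok.lower())
        else:
            # Every gene ID becomes a literal, so eval only sees True/False/and/or.
            expr.append("False" if tok in knocked_out else "True")
    return bool(eval(" ".join(expr), {"__builtins__": {}}, {}))


def affected_reactions(rules, knocked_out):
    """Return the sorted IDs of reactions whose rule evaluates to False."""
    return sorted(rid for rid, rule in rules.items()
                  if not rule_is_active(rule, knocked_out))


# Toy rules (hypothetical reaction IDs).
rules = {
    "GLC_TRANSPORT": "GLUT1 or GLUT2 or GLUT4",  # isoforms: any one suffices
    "COMPLEX_RXN": "SUB_A and SUB_B",            # subunits: all are required
    "OTHER_RXN": "GLUT1",                        # same gene, second reaction
    "SPONTANEOUS": "",                           # no gene association
}

print(affected_reactions(rules, {"GLUT1"}))  # ['OTHER_RXN']
print(affected_reactions(rules, {"SUB_A"}))  # ['COMPLEX_RXN']
print(affected_reactions(rules, {"GLUT1", "GLUT2", "GLUT4"}))  # ['GLC_TRANSPORT', 'OTHER_RXN']
```

Knocking out `GLUT1` alone leaves `GLC_TRANSPORT` active because another isoform can take over, while the reaction that depends only on `GLUT1` is lost.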

Hrovatin commented 3 years ago

So the gene instances may in fact differ in information that is missing from the model? If so, we can close this issue.

matthiaskoenig commented 3 years ago

If by "model" you mean the COBRA or cobrapy model, then yes. At least at the level of annotations and gene sequences the genes have to be different. But in principle two genes could be associated with the same reaction and would look and behave the same, for instance in a gene knockout simulation. Please feel free to close this issue.