eladnoor / component-contribution

Standard reaction Gibbs energy estimation for biochemical reactions

Compound identifiers without InChI #26

Open eladnoor opened 5 years ago

eladnoor commented 5 years ago

The new main key for matching compounds is the InChIKey. This works very well across databases, but only when the compound has a defined structure. Unfortunately, we also have thermodynamic information for some compounds without an InChI (e.g. ubiquinone and ferredoxin). For those we currently use the KEGG ID as the unique identifier, which makes cross-referencing between databases trickier.

Midnighter commented 5 years ago

I would suggest MetaNetX identifiers when they exist. Otherwise, if there is ambiguity about the structure, I suggest using the ChEBI ontology, for example https://www.ebi.ac.uk/chebi/chebiOntology.do?chebiId=16389.

eladnoor commented 5 years ago

That sounds good to me. I think the first thing to do is convert all the KEGG IDs in the training set to MetaNetX identifiers, and use that database to get the structures when they exist. I'm not sure if we will need ChEBI or not, but I agree it could be another possible source.
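
To make the conversion step concrete, here is a rough sketch of what it could look like. It assumes a KEGG-to-MetaNetX lookup table has already been loaded into a plain dict; the function name `convert_kegg_ids` and the identifiers in the example are hypothetical placeholders, not part of the component-contribution code or real database entries.

```python
from typing import Dict, Iterable, List, Tuple


def convert_kegg_ids(
    kegg_ids: Iterable[str],
    kegg_to_mnx: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map KEGG compound IDs to MetaNetX IDs where a mapping exists.

    Returns the successful conversions and, separately, the KEGG IDs
    that have no MetaNetX counterpart and must keep their KEGG ID.
    """
    converted: Dict[str, str] = {}
    unmapped: List[str] = []
    for kegg_id in kegg_ids:
        mnx_id = kegg_to_mnx.get(kegg_id)
        if mnx_id is None:
            unmapped.append(kegg_id)
        else:
            converted[kegg_id] = mnx_id
    return converted, unmapped


# Placeholder identifiers for illustration only (not real database entries).
example_mapping = {"C_EXAMPLE_1": "MNXM_EXAMPLE_1"}
converted, unmapped = convert_kegg_ids(
    ["C_EXAMPLE_1", "C_EXAMPLE_2"], example_mapping
)
```

Keeping the unmapped IDs in a separate list makes it easy to see which training-set compounds would still need another source such as ChEBI.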

eladnoor commented 5 years ago

This is a suggested priority order for compound identifiers, from most to least preferred: InChI > MetaNetX > ChEBI > ChEMBL > PubChem > KEGG > BiGG
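
As a minimal sketch of how such a priority list might be applied, the snippet below picks the highest-priority identifier available for a compound. The names `PRIORITY` and `pick_identifier` are hypothetical, the cross-references are assumed to arrive as a dict keyed by database name, and the identifier values in the example are illustrative only.

```python
from typing import Dict, Optional, Tuple

# Order taken from the list above, most preferred first.
PRIORITY = ["InChI", "MetaNetX", "ChEBI", "ChEMBL", "PubChem", "KEGG", "BiGG"]


def pick_identifier(xrefs: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (database, identifier) for the highest-priority entry present.

    Empty or missing entries are skipped; None is returned when the
    compound has no identifier in any of the listed databases.
    """
    for database in PRIORITY:
        identifier = xrefs.get(database)
        if identifier:
            return database, identifier
    return None


# A compound without an InChI falls through to the next available source.
# The values below are illustrative placeholders.
choice = pick_identifier({"KEGG": "C_EXAMPLE", "ChEBI": "CHEBI:16389"})
```

With this ordering, a structure-less compound that has both a ChEBI and a KEGG entry would be keyed by its ChEBI identifier.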