Status: Open. eladnoor opened this issue 5 years ago.
I would suggest using MetaNetX identifiers when they exist. Otherwise, if there is ambiguity about the structure, I suggest using the ChEBI ontology, for example https://www.ebi.ac.uk/chebi/chebiOntology.do?chebiId=16389.
That sounds good to me. I think the first thing to do is convert all the KEGG IDs in the training set to MetaNetX identifiers, and use that database to get the structures when they exist. I'm not sure if we will need ChEBI or not, but I agree it could be another possible source.
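To make the conversion step concrete, here is a rough sketch of how it might look. It assumes a tab-separated cross-reference table whose first column is a prefixed external ID and whose second column is the MetaNetX ID. The `kegg.compound:` prefix, the function names, and the sample rows (including the MetaNetX IDs) are all placeholders for illustration, so the real file layout should be checked before relying on this.

```python
import csv
import io

# Made-up sample rows in the ASSUMED layout: "<prefix>:<external id>\t<MetaNetX id>".
# The MetaNetX IDs below are placeholders, not real database entries.
SAMPLE_XREF = (
    "#source\tID\n"
    "kegg.compound:C00001\tMNXM_EXAMPLE_1\n"
    "chebi:15377\tMNXM_EXAMPLE_1\n"
    "kegg.compound:C00002\tMNXM_EXAMPLE_2\n"
)


def load_kegg_to_mnx(tsv_text, prefix="kegg.compound:"):
    """Build a {KEGG ID: MetaNetX ID} dict from cross-reference text."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines, comment/header lines, and malformed rows.
        if not row or row[0].startswith("#") or len(row) < 2:
            continue
        source, mnx_id = row[0], row[1]
        if source.startswith(prefix):
            # Keep the first mapping seen for each KEGG ID.
            mapping.setdefault(source[len(prefix):], mnx_id)
    return mapping


def convert_ids(kegg_ids, mapping):
    """Map each KEGG ID to its MetaNetX ID, or None when there is no match."""
    return {kegg_id: mapping.get(kegg_id) for kegg_id in kegg_ids}


kegg_to_mnx = load_kegg_to_mnx(SAMPLE_XREF)
converted = convert_ids(["C00001", "C00002", "C99999"], kegg_to_mnx)
```

IDs that come back as `None` are the ones that would need another source for their structure.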
This is a suggested priority list of chemical databases: InChI > MetaNetX > ChEBI > ChEMBL > PubChem > KEGG > BiGG
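As a minimal sketch of how this priority list could be applied, assuming each compound's cross-references are held in a plain dict keyed by lowercase database name (the dict layout and the name `pick_identifier` are hypothetical, not existing code):

```python
# Priority order taken from the list above, highest priority first.
PRIORITY = ["inchi", "metanetx", "chebi", "chembl", "pubchem", "kegg", "bigg"]


def pick_identifier(xrefs):
    """Return (database, identifier) for the highest-priority entry present.

    `xrefs` is assumed to map lowercase database names to identifier strings.
    Returns None when none of the listed databases has a non-empty entry.
    """
    for database in PRIORITY:
        identifier = xrefs.get(database)
        if identifier:
            return database, identifier
    return None


# Example: a compound known only by its ChEBI and KEGG identifiers.
best = pick_identifier({"kegg": "C00002", "chebi": "CHEBI:15422"})
```

Here `best` is the ChEBI entry, because ChEBI ranks above KEGG in the list.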
The new main key for matching compounds is the InChIKey. This works very well across databases, but only when the compound has a structure. Unfortunately, we also have thermodynamic information for some compounds without an InChI (e.g. ubiquinone, ferredoxin). For those we now use the KEGG ID as the unique identifier, but that makes cross-referencing between databases trickier.
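For illustration, the key selection could be sketched roughly like this. The function name and the `(namespace, value)` tuple are assumptions made for the example, not the actual implementation. Tagging the key with its namespace keeps a KEGG ID from ever being compared against an InChIKey.

```python
def compound_key(inchi_key=None, kegg_id=None):
    """Return a (namespace, value) key for matching a compound.

    Prefers the InChIKey and falls back to the KEGG ID for compounds
    that have no structure, and therefore no InChI.
    """
    if inchi_key:
        return ("inchikey", inchi_key)
    if kegg_id:
        return ("kegg", kegg_id)
    raise ValueError("compound has neither an InChIKey nor a KEGG ID")


# A compound with a structure (water) is keyed by its InChIKey,
# even when a KEGG ID is also known.
water_key = compound_key(inchi_key="XLYOFNOQVPJJNP-UHFFFAOYSA-N", kegg_id="C00001")

# A compound without a structure falls back to its KEGG ID.
# "C99999" is a made-up placeholder, not a real KEGG entry.
no_structure_key = compound_key(kegg_id="C99999")
```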