RobokopU24 / NewSourceProposals

New Knowledge Providers (KPs) for the Data Management Oversight Group (DMOG) to review

FooDB #22

Open PhillipsOwen opened 3 years ago

PhillipsOwen commented 3 years ago

The latest FooDB database has IUPAC, INCHIKEY, INCHI and SMILES equivalent identifiers for chemical compounds. There are no equivalent identifiers for nutrients.

Sadly, previous versions of the data had better equivalent identifiers.

The INCHIKEY is currently used by the parser to normalize the nodes and has a 0% success rate. We should look into using another equivalent identifier in the FooDB data that normalizes more successfully.

An example failure is: INCHIKEY:1S/C15H24/c1-11-7-8-13-12(2)6-5-9-15(3,4)14(13)10-11/h6,10,13-14H,5,7-9H2,1-4H3

cbizon commented 3 years ago

It will be easier to track down with more information: for a few failing chemicals, what are the inchikeys (as above), and what are the chemical names? It would be especially good if some of these were simple things like water or fructose or glutamate, which I expect should appear in the data.

cbizon commented 3 years ago

Oh, I think the problem is that the identifier above is not an inchikey. It's an inchi. An inchikey is something more like: AXVVJQZAKNNTPN-UHFFFAOYSA-N
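
A quick format check along these lines could flag the mismatch before normalization. This is only a sketch: `classify_identifier` is a hypothetical helper, not part of the existing parser, and it assumes the standard InChIKey layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, one uppercase letter) and that an InChI body starts with `1S/` or `1/`, with or without the `InChI=` prefix.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# An InChI starts with a version marker such as "1S/" (optionally "InChI=" first).
INCHI_RE = re.compile(r"^(InChI=)?1S?/")


def classify_identifier(curie: str) -> str:
    """Return 'inchikey', 'inchi', or 'unknown' based on the value's shape.

    Hypothetical helper: the CURIE prefix is ignored on purpose, because
    the prefix is exactly what may be mislabeled.
    """
    # Drop everything up to the first colon, if there is one.
    _, sep, rest = curie.partition(":")
    value = rest if sep else curie
    if INCHIKEY_RE.match(value):
        return "inchikey"
    if INCHI_RE.match(value):
        return "inchi"
    return "unknown"


failing = "INCHIKEY:1S/C15H24/c1-11-7-8-13-12(2)6-5-9-15(3,4)14(13)10-11/h6,10,13-14H,5,7-9H2,1-4H3"
print(classify_identifier(failing))
print(classify_identifier("INCHIKEY:AXVVJQZAKNNTPN-UHFFFAOYSA-N"))
```

Running something like this over the FooDB column that the parser reads as INCHIKEY would show whether every value is really an InChI under the wrong prefix.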

cbizon commented 1 year ago

I think FooDB needs to be looked at. It's using ChemicalSubstance and incorrect identifiers. We might need to work on Babel first to get the IDs for foods to work better.