Open cbizon opened 4 years ago
Needs review
As reported in https://github.com/NCATSTranslator/Feedback/issues/373, GTOPDB:4215 is mapped to PUBCHEM.COMPOUND:178101032, but NodeNorm does not have this mapping -- is there a reason we shouldn't just include all GTOPDB-PUBCHEM.COMPOUND mappings from GTOPDB?
In my mind it's a question of single source of truth. If we're getting our mappings from UniChem, then coming back through with a bunch of other mappings for those same things is one of the things that leads to confusion.
However, it looks like some of these mappings are for things that are outside of UniChem (like this one?), and those are probably worth bringing in.
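A rough sketch of that rule, keeping UniChem as the source of truth and only accepting GTOPDB-supplied mappings for GTOPDB IDs that UniChem does not cover. The function name, the pair-list input format, and the `EXAMPLE_*` identifiers are all made up for illustration, and the sketch assumes (as guessed above) that GTOPDB:4215 is absent from UniChem:

```python
def supplementary_mappings(unichem_pairs, gtopdb_pairs):
    """Return GTOPDB-supplied pairs whose GTOPDB ID has no UniChem mapping.

    Both arguments are lists of (gtopdb_curie, pubchem_curie) tuples.
    This input format is an assumption, not how either source ships data.
    """
    # GTOPDB IDs that UniChem already maps; UniChem wins for these.
    covered = {gtopdb_id for gtopdb_id, _ in unichem_pairs}
    return [(g, p) for g, p in gtopdb_pairs if g not in covered]


# Placeholder identifiers (EXAMPLE_*) are not real records.
unichem_pairs = [("GTOPDB:EXAMPLE_A", "PUBCHEM.COMPOUND:EXAMPLE_1")]
gtopdb_pairs = [
    # Already covered by UniChem, so this one is skipped.
    ("GTOPDB:EXAMPLE_A", "PUBCHEM.COMPOUND:EXAMPLE_2"),
    # The mapping from the feedback issue, assumed missing from UniChem.
    ("GTOPDB:4215", "PUBCHEM.COMPOUND:178101032"),
]

extra = supplementary_mappings(unichem_pairs, gtopdb_pairs)
print(extra)
```

Filtering on the GTOPDB ID rather than on the full pair is deliberate here: it avoids adding a second, possibly conflicting PubChem mapping for something UniChem already handles.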
There are a few failure modes:
1) The chemical is something without a structure. We should probably bring in all the identifiers, a la the MeSH update and ChEBI.
2) There is an InChI, but it isn't in UniChem for GTOPDB, even if it is for e.g. PubChem (10532). This one is really bad, because it means we can't rely 100% on UniChem. Hopefully there aren't many of these. If we can pull a list of InChIs along with the chemicals, we should still be OK, since glom should handle it.
3) Peptides (4440, 6759). Not sure if we're rejecting these on purpose, but we shouldn't.
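For the second case, the idea is that if GTOPDB gives us an InChI for a chemical and PubChem has the same InChI, the two identifier sets share a member and can be merged. The sketch below shows that kind of merge with a small union-find. It is a simplified stand-in, not the project's actual glom implementation, and the function name and `EXAMPLE_*` identifiers are hypothetical:

```python
def merge_equivalent(groups):
    """Merge identifier sets that share at least one member."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        items = list(group)
        for item in items:
            # Linking every member to the first one joins the whole group.
            union(items[0], item)

    cliques = {}
    for identifier in parent:
        cliques.setdefault(find(identifier), set()).add(identifier)
    return list(cliques.values())


# Placeholder identifiers (EXAMPLE_*) are not real records.
groups = [
    {"GTOPDB:EXAMPLE_B", "INCHIKEY:EXAMPLE_KEY"},            # from a GTOPDB InChI list
    {"PUBCHEM.COMPOUND:EXAMPLE_3", "INCHIKEY:EXAMPLE_KEY"},  # from UniChem
    {"GTOPDB:EXAMPLE_C"},                                    # no structure, stays alone
]

merged = merge_equivalent(groups)
print(merged)
```

The two sets that share the InChIKey collapse into one clique, while the structure-less entry stays on its own, which is why case 1 needs its identifiers brought in some other way.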