Open cbizon opened 1 year ago
For what it's worth, it also seems like Babel is combining cisplatin and transplatin, which are geometric (cis/trans) isomers of one another, not enantiomers.
In terms of how the ChEMBL IDs are coming in: CHEMBL11359 is being pulled in via a link from DrugCentral. CHEMBL2068237 is not being pulled in at all.
For most ChEMBL mappings we rely on UniChem, but there aren't InChIs for these entries because of the metal.
We have 2 different "Cisplatin" entries in Babel/NN.
A main entry: https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A5460033&conflate=true
And then this one, which contains only a single CHEMBL ID: https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=CHEMBL.COMPOUND%3ACHEMBL2068237&conflate=true
You may at first think that CHEMBL is just not integrating, but the PUBCHEM one above actually contains a CHEMBL ID.
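For anyone who wants to reproduce the comparison, here is a minimal sketch that builds the two queries above and lists the CHEMBL identifiers in each clique. The endpoint and parameters are taken from the URLs in this issue. The helper names (`nn_url`, `chembl_ids`, `fetch`) are made up for illustration, and the response shape (a dict keyed by CURIE with an `equivalent_identifiers` list) is my assumption about the NodeNorm output, so check it against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the URLs quoted in this issue.
BASE = "https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes"


def nn_url(curie, conflate=True):
    """Build a get_normalized_nodes query URL for a single CURIE."""
    params = {"curie": curie, "conflate": "true" if conflate else "false"}
    return BASE + "?" + urlencode(params)


def chembl_ids(response, curie):
    """Return the CHEMBL.COMPOUND identifiers in the clique for `curie`.

    Assumes (not verified here) that the response is a dict keyed by CURIE,
    each value holding an "equivalent_identifiers" list of
    {"identifier": ...} dicts, or None when the CURIE is unknown.
    """
    entry = response.get(curie) or {}
    return [
        e["identifier"]
        for e in entry.get("equivalent_identifiers", [])
        if e.get("identifier", "").startswith("CHEMBL.COMPOUND:")
    ]


def fetch(curie, conflate=True):
    """Query the service over the network and return the parsed JSON."""
    with urlopen(nn_url(curie, conflate), timeout=30) as resp:
        return json.load(resp)


# Example usage (needs network access, so it is left commented out):
# for curie in ("PUBCHEM.COMPOUND:5460033", "CHEMBL.COMPOUND:CHEMBL2068237"):
#     print(curie, chembl_ids(fetch(curie), curie))
```

If the description above holds, the first query should list a CHEMBL ID inside the PUBCHEM clique and the second should list only CHEMBL2068237.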
The actual problem is that CHEMBL contains two identifiers for CISPLATIN:
https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL2068237/ https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL11359/
From the CHEMBL pages it's not clear to me what the difference is. They link to PubChem SIDs that make it look like a chiral difference, but all of those SIDs link to the same CID (which is not the PubChem CID above, due to charge state), and none of these PubChem entries call themselves cisplatin.
Fundamentally, it seems as though ChEMBL is not very happy with metal-containing compounds. From their FAQ:
Without any molfile or other structure, I'm not sure how we're supposed to link CHEMBL2068237 to anything. I'll dig around to see if I can figure out how CHEMBL11359 is getting merged.