TranslatorSRI / NameResolution

A service for finding CURIEs from lexical strings.

Docetaxel Chembl term getting mapped to Docetaxel Trihydrate Pubchem #99

Open DnlRKorn opened 11 months ago

DnlRKorn commented 11 months ago

The CHEMBL curie CHEMBL.COMPOUND:CHEMBL3545252 gets mapped to PUBCHEM.COMPOUND:148123 (which is Docetaxel Trihydrate).

If you put Docetaxel into name_resolver, you get back PUBCHEM.COMPOUND:148124, which is the term for Docetaxel, and seems like a better place to resolve to. Similarly, if you run NodeNormalizer on DrugCentral:939, it also resolves to PUBCHEM.COMPOUND:148124.

I think an easy solution for this is to have PUBCHEM.COMPOUND:148123 (Docetaxel Trihydrate) resolve to PUBCHEM.COMPOUND:148124 (Docetaxel).

gaurav commented 11 months ago

This is also the state on NodeNorm Dev. The good news is that if you turn on drug conflation, these two cliques are merged. The bad news is that we currently use a simple rule to choose which PUBCHEM.COMPOUND to use to represent a drug conflated clique, which is to choose the one with the smallest CURIE suffix. Since 148123 < 148124, Docetaxel Trihydrate is used to represent this entire clique.
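As a rough illustration of the smallest-suffix rule described above (a minimal sketch, not the actual NodeNorm code; the function names `curie_suffix` and `pick_representative` are made up for this example):

```python
def curie_suffix(curie: str) -> int:
    """Return the numeric part after the last colon of a CURIE."""
    return int(curie.rsplit(":", 1)[1])


def pick_representative(clique: list[str]) -> str:
    """Pick the CURIE with the smallest numeric suffix."""
    return min(clique, key=curie_suffix)


# The two PubChem identifiers from this issue, merged by drug conflation.
clique = [
    "PUBCHEM.COMPOUND:148124",  # Docetaxel
    "PUBCHEM.COMPOUND:148123",  # Docetaxel Trihydrate
]
print(pick_representative(clique))
```

Because 148123 < 148124, this selects the Docetaxel Trihydrate identifier.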

So, I think there are two big questions here:

DnlRKorn commented 11 months ago

Could we use the fact that PUBCHEM.COMPOUND:148123 lists PUBCHEM.COMPOUND:148124 as its parent compound? (That is, Docetaxel Trihydrate states that its parent is Docetaxel.)
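Something like the following is what I have in mind. This is only a sketch under assumptions: `parent_of` is a hypothetical mapping from a compound to its parent compound that would have to be built from PubChem data, and `pick_parent_preferring` is a made-up name, not an existing function in either service. It prefers members of the clique whose parent is not also in the clique, and falls back to the smallest suffix among those.

```python
def curie_suffix(curie: str) -> int:
    """Return the numeric part after the last colon of a CURIE."""
    return int(curie.rsplit(":", 1)[1])


def pick_parent_preferring(clique: list[str], parent_of: dict[str, str]) -> str:
    """Prefer clique members that are not children of another member."""
    members = set(clique)
    # Keep a member if it has no known parent, or its parent is outside the clique.
    roots = [c for c in clique if parent_of.get(c) not in members]
    # If every member has a parent in the clique (e.g. bad data), use them all.
    candidates = roots or list(clique)
    return min(candidates, key=curie_suffix)


# Hypothetical parent mapping: Docetaxel Trihydrate -> Docetaxel.
parent_of = {"PUBCHEM.COMPOUND:148123": "PUBCHEM.COMPOUND:148124"}
clique = ["PUBCHEM.COMPOUND:148123", "PUBCHEM.COMPOUND:148124"]
print(pick_parent_preferring(clique, parent_of))
```

With that mapping, the clique would be represented by PUBCHEM.COMPOUND:148124 (Docetaxel) rather than the trihydrate. With an empty mapping it behaves the same as the current smallest-suffix rule.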