TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

Issues with NameResolver identifying Simoctocog Alfa #354

Open marcello-deluca opened 1 month ago

marcello-deluca commented 1 month ago

The following drugs appear to be identified as "simoctocog alfa" when running name resolver (this list contains some duplicates):

DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA EFANESOCTOCOG ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA LONOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA EFANESOCTOCOG ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA VORHYALURONIDASE ALFA DAMOCTOCOG ALFA PEGOL DARATUMUMAB; VORHYALURONIDASE ALFA EFANESOCTOCOG ALFA SUSOCTOCOG ALFA EFRALOCTOCOG ALFA CINAXADAMTASE ALFA

It would be worth looking into what is causing this. There are additional nuances in biologics that are not captured by Name Resolver; as I find patterns, I will create separate issues for these.

gaurav commented 1 month ago

The conflation that produces this entry is:

["UNII:VQ723R7O8R", "UNII:6892UQT2GK", "UNII:113E3Z3CJJ", "UNII:969NZA3X9T", "UNII:U50VWW6XH6", "CHEMBL.COMPOUND:CHEMBL2108455", "DRUGBANK:DB09329", "DRUGBANK:DB16662", "DRUGBANK:DB13192", "DRUGBANK:DB14700", "RXCUI:4257", "RXCUI:70057", "RXCUI:70058", "RXCUI:217502", "RXCUI:221060", "RXCUI:227719", "RXCUI:253151", "RXCUI:669348", "RXCUI:792675", "RXCUI:826070", "RXCUI:1050327", "RXCUI:1050330", "RXCUI:1158490", "RXCUI:1167725", "RXCUI:1167751", "RXCUI:1171319", "RXCUI:1172847", "RXCUI:1179552", "RXCUI:1186631", "RXCUI:1593092", "RXCUI:1593095", "RXCUI:1593154", "RXCUI:1593156", "RXCUI:1607560", "RXCUI:1607563", "RXCUI:1661331", "RXCUI:1718960", "RXCUI:1718961", "RXCUI:1718962", "RXCUI:1718963", "RXCUI:1718964", "RXCUI:1718965", "RXCUI:1718966", "RXCUI:1718967", "RXCUI:1718968", "RXCUI:1718969", "RXCUI:1718970", "RXCUI:1718971", "RXCUI:1718972", "RXCUI:1718973", "RXCUI:1718974", "RXCUI:1718992", "RXCUI:1718993", "RXCUI:1718994", "RXCUI:1718995", "RXCUI:1718996", "RXCUI:1719221", "RXCUI:1719222", "RXCUI:1719223", "RXCUI:1719224", "RXCUI:1719225", "RXCUI:1719226", "RXCUI:1719227", "RXCUI:1719229", "RXCUI:1719230", "RXCUI:1719231", "RXCUI:1719241", "RXCUI:1719242", "RXCUI:1719243", "RXCUI:1719245", "RXCUI:1719246", "RXCUI:1719328", "RXCUI:1719330", "RXCUI:1719331", "RXCUI:1720165", "RXCUI:1720166", "RXCUI:1729085", "RXCUI:1729086", "RXCUI:1729087", "RXCUI:1729088", "RXCUI:1729089", "RXCUI:1729090", "RXCUI:1729091", "RXCUI:1737558", "RXCUI:1737559", "RXCUI:1737560", "RXCUI:1741392", "RXCUI:1741394", "RXCUI:1741395", "RXCUI:1741398", "RXCUI:1741407", "RXCUI:1741408", "RXCUI:1741409", "RXCUI:1743370", "RXCUI:1743371", "RXCUI:1743372", "RXCUI:1743373", "RXCUI:1743374", "RXCUI:1796378", "RXCUI:1796379", "RXCUI:1796380", "RXCUI:1796381", "RXCUI:1796382", "RXCUI:1796383", "RXCUI:1796384", "RXCUI:2055654", "RXCUI:2055656", "RXCUI:2055657", "RXCUI:2055658", "RXCUI:2055659", "RXCUI:2055660", "RXCUI:2055661", "RXCUI:2055662", "RXCUI:2055663", "RXCUI:2055664", "RXCUI:2275723", 
"RXCUI:2275724", "RXCUI:2275725", "RXCUI:2275726", "RXCUI:2275727", "RXCUI:2275728", "RXCUI:2275729", "RXCUI:2631085", "RXCUI:2631086", "RXCUI:2631087", "RXCUI:2631088", "RXCUI:2631089", "RXCUI:2631090", "RXCUI:2631091", "RXCUI:2631092", "RXCUI:2631093", "RXCUI:2645501", "RXCUI:2646067", "RXCUI:2646563", "RXCUI:2646609", "RXCUI:2647744", "RXCUI:2647954", "RXCUI:2648151", "RXCUI:2648152", "RXCUI:2648266", "RXCUI:2648545", "RXCUI:2648973", "RXCUI:2649473", "RXCUI:2654324", "RXCUI:2654369", "RXCUI:2654382", "RXCUI:2654413", "RXCUI:2654724", "RXCUI:2655251", "RXCUI:2655433", "RXCUI:2655661", "RXCUI:2655714", "RXCUI:2656425", "RXCUI:2656429", "RXCUI:2656921", "RXCUI:2657209", "RXCUI:2657346", "RXCUI:2657604", "RXCUI:2657831", "RXCUI:2658113", "RXCUI:2658860", "RXCUI:2660182", "RXCUI:2660367", "RXCUI:2660395", "RXCUI:2661423", "RXCUI:2661566", "RXCUI:2661861", "RXCUI:2661889", "RXCUI:2662170", "RXCUI:2662364", "RXCUI:2662378", "RXCUI:2662550", "RXCUI:2663072", "RXCUI:2663567", "UMLS:C5782713", "UMLS:C5234188"]

The non-RXCUI identifiers are:

It looks like we're combining all of these identifiers because they all include RXCUI:4257 "factor VIII". I discussed this with @marcello-deluca and it looks like in the short term he'll need a NameRes instance that does not include DrugChemical conflation. He'll ask me to build this next week and then I should be able to build it in 2-3 days.
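For reference, the non-RXCUI identifiers can be separated from the list above by CURIE prefix. This is only a rough sketch: `group_by_prefix` is a hypothetical helper, not part of Babel or NameRes, and the RXCUI entries are cut down to three to keep the example short.

```python
from collections import defaultdict


def group_by_prefix(curies):
    """Group CURIEs by the text before the first colon (hypothetical helper)."""
    groups = defaultdict(list)
    for curie in curies:
        prefix, _, _local_id = curie.partition(":")
        groups[prefix].append(curie)
    return dict(groups)


# Identifiers copied from the conflation above; only three of the RXCUIs are kept.
clique = [
    "UNII:VQ723R7O8R", "UNII:6892UQT2GK", "UNII:113E3Z3CJJ",
    "UNII:969NZA3X9T", "UNII:U50VWW6XH6",
    "CHEMBL.COMPOUND:CHEMBL2108455",
    "DRUGBANK:DB09329", "DRUGBANK:DB16662", "DRUGBANK:DB13192", "DRUGBANK:DB14700",
    "RXCUI:4257", "RXCUI:70057", "RXCUI:70058",
    "UMLS:C5782713", "UMLS:C5234188",
]

# Drop everything with the RXCUI prefix, then bucket the rest by prefix.
non_rxcui = [curie for curie in clique if not curie.startswith("RXCUI:")]
by_prefix = group_by_prefix(non_rxcui)

for prefix, members in by_prefix.items():
    print(prefix, len(members), members)
```

Running the same filter over the full list gives the same non-RXCUI identifiers, since the entries omitted here all carry the RXCUI prefix.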

cbizon commented 1 month ago

This is very interesting.

When I first looked at this clique I thought "oh no something's gone wrong". Looking at it again, I'm not so sure. All of the (long) list of merged cliques are cases in which the 'active ingredient' is Factor VIII. So in one sense the conflation is doing what we expect - it's merging cliques for drugs (formulations) that have the same active ingredient.
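To make the merging behaviour concrete, here is a toy sketch of how sharing one ingredient identifier pulls otherwise separate drugs into a single group. It is not Babel's actual conflation code: the function name and the `EXAMPLE:` identifiers are made up for illustration, and only `RXCUI:4257` comes from this thread.

```python
def merge_by_shared_ingredient(drug_to_ingredients):
    """Group drugs that share an ingredient identifier, transitively (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every drug to each of its ingredients.
    for drug, ingredients in drug_to_ingredients.items():
        find(drug)
        for ingredient in ingredients:
            union(drug, ingredient)

    # Collect the drugs (not the ingredients) under each root.
    groups = {}
    for drug in drug_to_ingredients:
        groups.setdefault(find(drug), set()).add(drug)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical drugs: two share RXCUI:4257 (factor VIII), one does not.
example = {
    "EXAMPLE:drug-a": ["RXCUI:4257"],
    "EXAMPLE:drug-b": ["RXCUI:4257"],
    "EXAMPLE:drug-c": ["EXAMPLE:other-ingredient"],
}
merged = merge_by_shared_ingredient(example)
print(merged)
```

In this toy model, any product that is linked to the shared ingredient ends up in the same group, however different the products are from one another.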

Now, in this case (and probably others like it) it's a little hairy because these are biologics, so the F8 actually varies a bit from drug to drug. Compared with small molecules, the definition of "same active ingredient" for biologics is quite a bit looser as far as RxNorm is concerned: each product is a recombinant form, or is PEGylated, or has some other kind of light modification. So what do we think the right thing to do is here? Is keeping this set of drugs separate a useful distinction? Or is lumping them together more helpful?

Tagging @elliottsharp to get a practitioner's input.

marcello-deluca commented 1 month ago

cinaxadamtase alfa --> recombinant ADAMTS13. Vorhyaluronidase alfa --> human recombinant hyaluronidase

I'm finding quite a few that are actually factor VIII, but also several that are other recombinant proteins; I think that recombinant proteins in general are being mishandled. Naming conventions are super weird and nonstandard, though, so this will be a challenge to correct.

cbizon commented 4 weeks ago

Thanks for pointing those out!

And just to be clear, we're not using the naming at all; this is all based on structured relationships from RxNorm.

cbizon commented 4 weeks ago

Hmm, I checked name resolver like this:

https://name-resolution-sri-dev.apps.renci.org/lookup?string=Lonoctocog%20alfa&autocomplete=true&highlighting=false&offset=0&limit=1

I think that first clique is the right one that contains all of the Factor 8 things. But it doesn't seem to contain cinaxadamtase or vorhyaluronidase. Maybe the results are different on a different version of name-res? Which are you using? Or maybe I'm not understanding how you've gotten to that list of synonyms?
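A minimal sketch of how the lookup URL above could be rebuilt and checked from Python. The helper names are hypothetical, and the response is treated as opaque JSON because its schema isn't shown in this thread; `names_missing_from` works on any plain list of synonym strings you extract from it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL copied from the link above; everything else here is illustrative.
LOOKUP_URL = "https://name-resolution-sri-dev.apps.renci.org/lookup"


def build_lookup_url(name, limit=1, offset=0):
    """Rebuild the query string used in the link above for an arbitrary name."""
    params = {
        "string": name,
        "autocomplete": "true",
        "highlighting": "false",
        "offset": offset,
        "limit": limit,
    }
    # quote_via=quote encodes spaces as %20 (urlencode's default would use '+').
    return f"{LOOKUP_URL}?{urlencode(params, quote_via=quote)}"


def fetch_lookup(name, limit=1):
    """Fetch and decode the JSON response (needs network access; not called here)."""
    with urlopen(build_lookup_url(name, limit=limit), timeout=30) as response:
        return json.load(response)


def names_missing_from(synonyms, needles):
    """Return the needles that do not occur (case-insensitively) in any synonym."""
    lowered = [synonym.lower() for synonym in synonyms]
    return [n for n in needles if not any(n.lower() in s for s in lowered)]


print(build_lookup_url("Lonoctocog alfa"))
```

Passing a clique's synonym list and `["cinaxadamtase", "vorhyaluronidase"]` to `names_missing_from` would report which of those names are absent, which makes it easier to compare results between NameRes versions.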

elliottsharp commented 3 weeks ago

Thanks for the flag @cbizon

To synthesize: these synonyms are chemically and structurally different, and are therefore different active ingredients, so they should be treated separately because we would expect to see different biomedical effects.

Yes, most are different flavours of recombinant factor VIII (sample screenshot below), but in this instance, this is most accurately viewed as a "drug class" (e.g., antihypertensives).

We had a rule of thumb for small molecules, which becomes harder to adhere to for biologics: if the chemical formula is different, it is a different drug (excluding salts of the drug, as these are a result of the formulation). Obviously, this is harder with biologics, as they are typically long-chain proteins, but I believe this rule should still stand until we can prove otherwise.

image

Side note:

marcello-deluca commented 3 weeks ago