RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Tocilizumab seems mis-named #2409

Open jaredroach opened 1 week ago

jaredroach commented 1 week ago

I expect tocilizumab to bind IL6R. I have a hard time finding tocilizumab in the autocomplete (for reasons that should become apparent), so I search for EVERYTHING that has an edge to IL6R: https://arax.ncats.io/?r=312740

As I would expect from a high ranking for tocilizumab, I get Result 2, which is:

Name: 2-[(4-Ethynyl-2-Fluorophenyl)amino]-3,4-Difluoro-N-(2-Hydroxyethoxy)benzamide
Id: PUBCHEM.COMPOUND:10150081
Categories: biolink:ChemicalEntity,biolink:SmallMolecule

Something is very strange about the name, and about the classification of a monoclonal Ab as a small molecule, but....

attribute_type_id: biolink:description
value_type_id: metatype:String
value: A recombinant, humanized IgG1 monoclonal antibody directed against the interleukin-6 receptor (IL-6R) with immunosuppressant activity. Tocilizumab targets and binds to both the soluble form of IL-6R (sIL-6R) and the membrane-bound form (mIL-6R), thereby blocking the binding of IL-6 to its receptor. This prevents IL-6-mediated signaling. IL-6, a pro-inflammatory cytokine that plays an important role in the regulation of the immune response, is overproduced in autoimmune disorders, certain types of cancers and possibly various other inflammatory conditions.; A recombinant, humanized IgG1 monoclonal antibody directed against the interleukin-6 receptor (IL-6R) with immunosuppressant activity. Tocilizumab targets and binds to both the soluble form of IL-6R (sIL-6R) and the membrane-bound form (mIL-6R), thereby blocking the binding of IL-6 to its receptor. This prevents IL-6-mediated signaling. Il-6, a pro-inflammatory cytokine that plays an important role in the regulation of the immune response, is overproduced in autoimmune disorders and certain types of cancers. Check for "https://www.cancer.gov/about-cancer/treatment/clinical-trials/intervention/C84217" active clinical trials using this agent. ("http://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=C84217" NCI Thesaurus); UMLS Semantic Type: STY:T116; UMLS Semantic Type: STY:T121;

...this does indeed appear to be tocilizumab based on the value.

Why are this name and the PubMed link/description connected to tocilizumab?

amykglen commented 1 week ago

yeah, these super long chemical names have been bothering me too.. tracing the issue:

so the SRI NodeNormalizer says that "2-[(4-Ethynyl-2-Fluorophenyl)amino]-3,4-Difluoro-N-(2-Hydroxyethoxy)benzamide" is the name for PUBCHEM.COMPOUND:10150081, and we're supposed to use the names the SRI NodeNormalizer reports.

the description refers to tocilizumab instead of the crazy long name because our algorithm for selecting a node description (during the KG2c build) chooses the most human-readable description among those of all equivalent curies - so it doesn't necessarily come from the 'preferred' curie (a bit weird, but we found that it produces a more useful description..)

but in any case, it seems like the long name situation has already been resolved in the SRI NodeNormalizer, as the preferred curie for the concept tocilizumab is now UNII:I031V2H011 (instead of PUBCHEM.COMPOUND:10150081), which has the name "Tocilizumab".

meaning, if we were to do a new synonymizer/KG2c build at this point, the concept Tocilizumab should have the preferred id UNII:I031V2H011 and name Tocilizumab.
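
As a rough illustration of how a merged node can end up with a name from one curie and a description from another, here is a minimal sketch. It is not the actual KG2c code: the `readability_score` heuristic, the `merge_cluster` helper, the `EXAMPLE:0001` curie and its terse description are all hypothetical, and attaching the description to `NCIT:C84217` is an assumption based on the NCI Thesaurus code in the description text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EquivalentNode:
    curie: str
    name: str
    description: Optional[str] = None


def readability_score(text: str) -> float:
    """Hypothetical heuristic: share of characters that are letters or spaces."""
    if not text:
        return 0.0
    plain = sum(ch.isalpha() or ch.isspace() for ch in text)
    return plain / len(text)


def merge_cluster(nodes, preferred_curie):
    """Take id and name from the preferred curie, but the description
    from whichever equivalent node scores highest on readability."""
    preferred = next(n for n in nodes if n.curie == preferred_curie)
    described = [n for n in nodes if n.description]
    best = max(described, key=lambda n: readability_score(n.description), default=None)
    return {
        "id": preferred.curie,
        "name": preferred.name,
        "description": best.description if best else None,
        "description_source": best.curie if best else None,
    }


# Illustrative cluster; only the PubChem id and name are taken from this thread.
cluster = [
    EquivalentNode(
        "PUBCHEM.COMPOUND:10150081",
        "2-[(4-Ethynyl-2-Fluorophenyl)amino]-3,4-Difluoro-N-(2-Hydroxyethoxy)benzamide",
    ),
    EquivalentNode(
        "NCIT:C84217",  # assumed member of the cluster
        "Tocilizumab",
        "A recombinant, humanized IgG1 monoclonal antibody directed against "
        "the interleukin-6 receptor (IL-6R) with immunosuppressant activity.",
    ),
    EquivalentNode("EXAMPLE:0001", "tocilizumab", "mAb; IL6R (sIL-6R/mIL-6R) antag."),  # made up
]

merged = merge_cluster(cluster, preferred_curie="PUBCHEM.COMPOUND:10150081")
print(merged["name"])
print(merged["description_source"])
```

With a cluster like this, the merged node keeps the long PubChem name while its description comes from a different equivalent curie, which matches the mismatch reported above.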

amykglen commented 1 week ago

I went ahead and transferred this issue from the RTX-KG2 repo to the RTX repo, as it seems to be a problem with synonymization/KG2c, rather than KG2pre.

but based on my explanation above, I think it's already fixed in the SRI NodeNormalizer, so I'm going to label this as "confirm in next kg2c build"

jaredroach commented 1 week ago

The problem is not that the name is super long. The problem is that the super long name has nothing to do with tocilizumab: the compound it names is not even an antibody.

amykglen commented 1 week ago

ah, I see. I think this is an issue to raise with the SRI NodeNormalizer then, as a couple curies in their tocilizumab cluster have that name, even in the latest version of NodeNorm: https://nodenorm.ci.transltr.io/get_normalized_nodes?curie=PUBCHEM.COMPOUND:10150081

"equivalent_identifiers": [
...
{
"identifier": "PUBCHEM.COMPOUND:10150081",
"label": "2-[(4-Ethynyl-2-Fluorophenyl)amino]-3,4-Difluoro-N-(2-Hydroxyethoxy)benzamide"
},
{
...
{
"identifier": "DRUGBANK:DB08208",
"label": "2-[(4-ETHYNYL-2-FLUOROPHENYL)AMINO]-3,4-DIFLUORO-N-(2-HYDROXYETHOXY)BENZAMIDE"
},
{

amykglen commented 1 week ago

I just wrote up an issue on this in NodeNorm's repo: https://github.com/TranslatorSRI/NodeNormalization/issues/297
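
For anyone who wants to repeat the check above on other clusters, here is a small sketch that lists the equivalent identifiers whose label differs from the cluster's preferred label. It assumes the NodeNorm response maps each queried curie to an object with an `id` entry (holding `identifier` and `label`) and an `equivalent_identifiers` list, as in the excerpt quoted earlier. The helper names `fetch_cluster` and `mismatched_labels` are made up for this example, and the sample data is assembled by hand from values mentioned in this thread rather than fetched live.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted earlier in this thread.
NODENORM_URL = "https://nodenorm.ci.transltr.io/get_normalized_nodes"


def fetch_cluster(curie: str) -> dict:
    """Fetch the NodeNorm entry for one curie (network call; not run below)."""
    with urlopen(f"{NODENORM_URL}?{urlencode({'curie': curie})}") as resp:
        return json.load(resp)[curie]


def mismatched_labels(cluster: dict) -> list:
    """Return identifiers whose label differs (ignoring case) from the preferred label."""
    preferred = cluster["id"]["label"].casefold()
    return [
        entry["identifier"]
        for entry in cluster["equivalent_identifiers"]
        if entry.get("label") and entry["label"].casefold() != preferred
    ]


# Hand-built sample using the identifiers and labels quoted in this thread.
sample_cluster = {
    "id": {"identifier": "UNII:I031V2H011", "label": "Tocilizumab"},
    "equivalent_identifiers": [
        {"identifier": "UNII:I031V2H011", "label": "Tocilizumab"},
        {
            "identifier": "PUBCHEM.COMPOUND:10150081",
            "label": "2-[(4-Ethynyl-2-Fluorophenyl)amino]-3,4-Difluoro-N-(2-Hydroxyethoxy)benzamide",
        },
        {
            "identifier": "DRUGBANK:DB08208",
            "label": "2-[(4-ETHYNYL-2-FLUOROPHENYL)AMINO]-3,4-DIFLUORO-N-(2-HYDROXYETHOXY)BENZAMIDE",
        },
    ],
}

suspects = mismatched_labels(sample_cluster)
print(suspects)
```

A differing label is only a hint, since legitimate synonyms and brand names also differ from the preferred label, so each flagged identifier still needs a manual look.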