NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

NN failures: JAK genes #821

Open cbizon opened 4 months ago

cbizon commented 4 months ago

Query: What genes are downed by Ruxolitinib?

PK: ab417e9e-cac8-45c8-a114-4f4d4514c038

We get results for JAK1, JAK2, and JAK3, but also separate results for at least "Janus Kinase 1" and "Janus Kinase 2".

It looks like the UMLS identifiers (e.g. UMLS:C0169661) are not merging with the other gene/protein ids.

gaurav commented 3 months ago

I've confirmed that UMLS:C0169661 "Janus kinase 2" is being included by the protein module, not the leftover UMLS module. It really should be combined with UniProtKB:A8K910 "A8K910_HUMAN Tyrosine-protein kinase (trembl)" and UniProtKB:O60674 "JAK2_HUMAN Tyrosine-protein kinase JAK2 (sprot)" (which is not currently the case: https://nodenorm.ci.transltr.io/1.5/get_normalized_nodes?curie=NCBIGene%3A3717&curie=UMLS%3AC0169661&conflate=true&drug_chemical_conflate=false&description=false). I think this is another case of proteins and chemicals not being properly combined (https://github.com/TranslatorSRI/Babel/issues/310), so diagnosing and fixing that issue should fix this as well. Assigning to Guppy.
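For anyone who wants to re-check this after a fix lands, here is a rough sketch of the check behind the link above. The endpoint and query parameters are copied from that URL. The helper names (`build_query_url`, `same_clique`) are made up for illustration, and the assumed response shape (a JSON object keyed by input CURIE, each entry carrying an `id.identifier` field, or null when the CURIE is unknown) should be verified against the actual NodeNorm output.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL linked in this comment.
NODENORM_URL = "https://nodenorm.ci.transltr.io/1.5/get_normalized_nodes"


def build_query_url(curies, conflate=True, drug_chemical_conflate=False):
    """Build a get_normalized_nodes URL for the given CURIEs (hypothetical helper)."""
    params = [("curie", curie) for curie in curies]
    params += [
        ("conflate", str(conflate).lower()),
        ("drug_chemical_conflate", str(drug_chemical_conflate).lower()),
        ("description", "false"),
    ]
    return f"{NODENORM_URL}?{urlencode(params)}"


def same_clique(response, curie_a, curie_b):
    """Return True if both CURIEs normalize to the same preferred identifier.

    Assumes `response` is the parsed JSON, keyed by input CURIE, where each
    value is either None or a dict with an ["id"]["identifier"] entry.
    """
    node_a = response.get(curie_a)
    node_b = response.get(curie_b)
    if not node_a or not node_b:
        return False
    return node_a["id"]["identifier"] == node_b["id"]["identifier"]


url = build_query_url(["NCBIGene:3717", "UMLS:C0169661"])
```

Fetching `url` (for example with `urllib.request.urlopen` and `json.load`) and passing the parsed result to `same_clique(result, "NCBIGene:3717", "UMLS:C0169661")` should return True once the identifiers are merged.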

gaurav commented 3 months ago

FWIW we do combine UMLS:C1527617 "JAK2 protein, human" with the other protein identifiers, but not UMLS:C0169661 "Janus kinase 2", which appears both as its own protein clique and in a ChemicalEntity clique with MESH:D053614 "Janus Kinase 2". So even once we close https://github.com/TranslatorSRI/Babel/issues/310, there will still be some work left to figure out why MESH:D053614 isn't being merged with the other JAK2 proteins.

gaurav commented 2 months ago

Pushing the chemical/protein issue to Hammerhead.