NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Broader Category of Input Node for MVP2 #787

Closed sstemann closed 3 months ago

sstemann commented 4 months ago

It looks like Unsecret Agent is returning genes in MVP2 that are up/downregulated by broader categories of the input small molecule node.

In Test > MVP2 > Sirolimus: https://ui.test.transltr.io/main/results?l=Sirolimus&i=CHEBI:9168&t=3&r=0&q=04e6e29b-0a68-440e-af6e-29acde1a498c Test PK 04e6e29b-0a68-440e-af6e-29acde1a498c

Expand the first result "PIK3CD", "MTOR PROTEIN, HUMAN", "TP53" etc and note that the results are for:

[two screenshots; images not preserved]

Note - I'm not seeing these specific results in Prod, nor, via hunting and pecking, have I found this replacement for any of the Unsecret results of the same query in Prod. I also can't find the Unsecret results from Prod in the Unsecret results from Test. I don't know if this change is intentional.

https://ui.transltr.io/main/results?l=Sirolimus&i=PUBCHEM.COMPOUND:5284616&t=3&r=0&q=5879f774-9016-4a53-abef-e5eb9aff04e0 Prod PK: 5879f774-9016-4a53-abef-e5eb9aff04e0

cbizon commented 3 months ago

@webyrd @kaiwenho @sstemann any updates on this?

kaiwenho commented 3 months ago

@cbizon Yes, this issue is resolved in Unsecret CI and Test. The approach involves blocking subclass edges from "infores:medrt-umls" during subclass inference.

Unsecret did not plan to use the broader concepts as input nodes. I traced the problem to the UMLS subclass edges that are provided by RTX-KG2. The wrong edge for this issue is "mTOR Inhibitor Immunosuppressant" -"biolink:subclass_of"-> "sirolimus", which produced the inferred chain:

   "UMLS:C0016360" "fluorouracil"
   "biolink:subclass_of"
   "UMLS:C0021081" "Immunosuppressive Agents"
   "biolink:subclass_of"
   "UMLS:C2267048" "mTOR Inhibitor Immunosuppressant"
   "biolink:subclass_of"
   "UMLS:C0072980" "sirolimus"
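
To make the chain above concrete, here is a minimal sketch of how a transitive subclass walk that starts at sirolimus can end up at fluorouracil. The triples are copied from the chain; the function name `inferred_subclasses` and the breadth-first traversal are illustrative assumptions, not Unsecret's actual implementation.

```python
from collections import defaultdict, deque

# (subject, predicate, object) triples copied from the chain above.
EDGES = [
    # fluorouracil -> Immunosuppressive Agents
    ("UMLS:C0016360", "biolink:subclass_of", "UMLS:C0021081"),
    # Immunosuppressive Agents -> mTOR Inhibitor Immunosuppressant
    ("UMLS:C0021081", "biolink:subclass_of", "UMLS:C2267048"),
    # mTOR Inhibitor Immunosuppressant -> sirolimus
    ("UMLS:C2267048", "biolink:subclass_of", "UMLS:C0072980"),
]


def inferred_subclasses(root, edges):
    """Hypothetical helper: collect every node that reaches `root`
    through one or more subclass_of edges."""
    children = defaultdict(set)  # parent CURIE -> direct subclasses
    for subj, pred, obj in edges:
        if pred == "biolink:subclass_of":
            children[obj].add(subj)

    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Querying sirolimus pulls in all three other concepts in the chain,
# including fluorouracil.
print(sorted(inferred_subclasses("UMLS:C0072980", EDGES)))
```

With these three edges, every other node in the chain is treated as a subclass of sirolimus, which is how results for broader or unrelated concepts can surface under a sirolimus query.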

The subclass relation between mTOR Inhibitor Immunosuppressant and sirolimus should run in the other direction, and I did find that corresponding edge in RTX-KG2 as well, so both directions exist. If the two concepts are not equivalent, then one of the directions must be wrong. @webyrd and I agree that we should avoid using the UMLS edges for subclass inference for now, as the queried-node-unmatched issue also comes from bad UMLS edges. This should prevent incorrect edges from contaminating the subclass inference chain.
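
A rough sketch of the two ideas in this comment: dropping subclass edges by knowledge source before inference, and flagging concept pairs that have subclass edges in both directions. The edge dictionary layout, both helper names, and `infores:hypothetical-source` are assumptions for illustration. Which of the two sirolimus edges actually carries `infores:medrt-umls` is not stated in this thread, so the attribution below is a guess.

```python
# Both directions of the sirolimus edge described above.
# The "source" values are ASSUMED for illustration only.
KG_EDGES = [
    {"subject": "UMLS:C2267048", "predicate": "biolink:subclass_of",
     "object": "UMLS:C0072980", "source": "infores:medrt-umls"},
    {"subject": "UMLS:C0072980", "predicate": "biolink:subclass_of",
     "object": "UMLS:C2267048", "source": "infores:hypothetical-source"},
]


def drop_blocked_sources(edges, blocked):
    """Hypothetical filter: keep only edges whose source is not blocked."""
    return [e for e in edges if e["source"] not in blocked]


def two_way_subclass_pairs(edges):
    """Hypothetical check: return concept pairs that have subclass_of
    edges in both directions (each pair reported once, sorted)."""
    directed = {
        (e["subject"], e["object"])
        for e in edges
        if e["predicate"] == "biolink:subclass_of"
    }
    return {
        tuple(sorted((s, o)))
        for s, o in directed
        if (o, s) in directed
    }


filtered = drop_blocked_sources(KG_EDGES, {"infores:medrt-umls"})
print(two_way_subclass_pairs(KG_EDGES))  # the sirolimus pair is flagged
print(two_way_subclass_pairs(filtered))  # nothing flagged after filtering
```

Filtering by source is a blunt tool: it also removes any correct edges from that source. The two-way check could serve as a narrower diagnostic for locating contradictory pairs in the upstream graph.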

cbizon commented 3 months ago

Great, it sounds like this is fixed in octopus, at least at an output level. @saramsey is this something that should be fixed in KG2?

sstemann commented 3 months ago

resolved on Test