RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Standup issue: gene/disease conflation #1830

Open dkoslicki opened 2 years ago

dkoslicki commented 2 years ago

I don't know if this is a node synonymizer issue, but check out the recent standup results (https://arax.ncats.io/?r=40666). The query itself uses NCBIGene:5354 (i.e., PLP1), which the node synonymizer correctly identifies as a gene/protein, with the following categories: biolink:BiologicalEntity (2), biolink:ChemicalEntity (1), biolink:Gene (28), biolink:NamedThing (2), biolink:Protein (24). However, in the actual results, this node is given the category biolink:Disease.

Tagging @amykglen and @saramsey in case this is a KG2 issue (in which case I can move it over there).

amykglen commented 2 years ago

So I think everything is actually functioning as intended here: in the results, the category is set to biolink:Disease because that is the category assigned to that qnode in your query graph. I'm not a fan of that behavior (see #1360), but that's what the group decided was best a ways back.
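To make the behavior described above concrete, here is a minimal Python sketch. It assumes a simplified, TRAPI-style dictionary layout; the query graph, the qnode key `n0`, and the helper `assign_result_categories` are hypothetical and are not taken from the ARAX codebase or from the actual query behind r=40666. Only the CURIE, the qnode's biolink:Disease category, and the synonymizer categories come from this thread.

```python
from typing import Dict, List

# Hypothetical, simplified TRAPI-style query graph (not the real query from r=40666).
query_graph: Dict = {
    "nodes": {
        "n0": {"ids": ["NCBIGene:5354"], "categories": ["biolink:Disease"]},
    }
}

# Categories the node synonymizer reports for NCBIGene:5354 (from the issue text).
synonymizer_categories: List[str] = [
    "biolink:BiologicalEntity",
    "biolink:ChemicalEntity",
    "biolink:Gene",
    "biolink:NamedThing",
    "biolink:Protein",
]


def assign_result_categories(qnode_key: str, qgraph: Dict) -> List[str]:
    """Hypothetical helper: the result node simply inherits its qnode's categories."""
    return list(qgraph["nodes"][qnode_key]["categories"])


result_node = {
    "name": "PLP1",
    "categories": assign_result_categories("n0", query_graph),
}

print(result_node["categories"])                      # ['biolink:Disease']
print("biolink:Disease" in synonymizer_categories)    # False
```

The sketch shows why the reported category can disagree with the synonymizer: the category is copied from the query graph, not looked up for the CURIE.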

saramsey commented 2 years ago

TY @amykglen for looking into this, and for linking #1360 which has a good discussion (my bad for not weighing in on the matter last August when it came up).

This may be settled "case law," architecturally speaking, but I'm hesitant about setting the Node category in the result to match the mapped QNode's category in the original query graph. Isn't that potentially throwing away useful information? FWIW, my initial reaction is to lean toward "provide all semantic type information that we have," even if that must be done in another node property or something.
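One way to read the "provide all semantic type information" suggestion is sketched below: the node keeps the qnode's category in `categories` (the behavior settled in #1360), while the full set of categories known for the CURIE is preserved in a node attribute. This is a hypothetical Python sketch, not ARAX's implementation; the function name `build_result_node` and the use of `biolink:category` as the `attribute_type_id` are assumptions.

```python
from typing import Dict, List


def build_result_node(name: str, qnode_categories: List[str], known_categories: List[str]) -> Dict:
    """Hypothetical sketch: mirror the qnode's categories, but keep every known category too."""
    return {
        "name": name,
        # Current (agreed-upon) behavior: categories mirror the query graph's qnode.
        "categories": list(qnode_categories),
        "attributes": [
            {
                # Assumed attribute type identifier; ARAX may use something different.
                "attribute_type_id": "biolink:category",
                # Deduplicated, sorted list of all categories known for this node.
                "value": sorted(set(known_categories)),
            }
        ],
    }


node = build_result_node(
    "PLP1",
    ["biolink:Disease"],
    [
        "biolink:Gene",
        "biolink:Protein",
        "biolink:NamedThing",
        "biolink:BiologicalEntity",
        "biolink:ChemicalEntity",
    ],
)
print(node["categories"])
print(node["attributes"][0]["value"])
```

Putting the extra categories in an attribute leaves `categories` consistent with the query graph, so consumers that rely on the current behavior should be unaffected, while the synonymizer's gene/protein information is no longer discarded.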