Many Test QA pairs appear to have the wrong output ids.

NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Many Test QA pairs appear to have the wrong output ids. #869

Open sharatisrani opened 3 months ago

sharatisrani commented 3 months ago

Here is a set of 6 just for the MVP2 ("what drug will downregulate") ACE query. In fact, all 11 pairs are mismatched between the ARS results and the QA pairs list.
I suspect this issue is widespread, so a machine-based conversion of the QA pairs via NodeNorm (as suggested by @maximusunc) is the solution. Further, @maximusunc deserves a special reward :-) if he fixes this, as he might already be inclined to do.

sharatisrani commented 3 months ago

From Jenn Hadlock:

Jennifer Hadlock (Multiomics Provider) 11:00 AM Let me know if semantic review is needed to confirm final names. This is part of the larger trade-off we've discussed about the pros and cons of choosing to work at the API level instead of conducting input through the UI (which would both use and test node normalizer). Here's a short summary: going through the UI is philosophically better, but more expensive and fragile to implement, and more difficult to debug.

Let me know if you need review of CURIEs for drugs and disease. I'd just need a sheet that includes the formal text string for each CURIE.

jh111 commented 3 months ago

QA pair inputs: When the user types in a string, they'll have to choose one of the options provided by node normalizer. In this specific case we can replace PUBCHEM.COMPOUNDS with CHEBI for the input options, unless there's an input string that does not map to CHEBI.

QA pair outputs: These should be replaced with whichever result gets shown to the user in the UI after node normalizer.

@maximusunc If you can provide a file with the original QA data and the normalized results (including the string), I can double-check the entire set. I expect few, if any, issues, because this is effectively a review of node normalizer results.
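A minimal sketch of how such a review file could be put together, for illustration only. It assumes the normalizer results are already in hand as a mapping from each original CURIE to an entry with a preferred `id` (`identifier` plus `label`), or `None` when nothing maps; that shape, the function names, and the `EXAMPLE.*` ids are all hypothetical and not taken from the Tests repo.

```python
import csv
import io

FIELDS = ["original_id", "normalized_id", "label", "status"]


def build_review_rows(original_ids, normalized):
    """Pair each original QA id with its normalized id and label.

    `normalized` is an assumed mapping: CURIE -> {"id": {"identifier", "label"}}
    or None when the normalizer has no mapping for that CURIE.
    """
    rows = []
    for curie in original_ids:
        entry = normalized.get(curie)
        if entry is None:
            # Nothing to replace the id with; flag it for manual review.
            rows.append({"original_id": curie, "normalized_id": "",
                         "label": "", "status": "unmapped"})
            continue
        preferred = entry["id"]
        status = "unchanged" if preferred["identifier"] == curie else "replaced"
        rows.append({"original_id": curie,
                     "normalized_id": preferred["identifier"],
                     "label": preferred.get("label", ""),
                     "status": status})
    return rows


def rows_to_csv(rows):
    """Serialize the review rows as CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical ids, used only to show the three possible outcomes.
example_normalized = {
    "EXAMPLE.A:1": {"id": {"identifier": "EXAMPLE.B:1", "label": "example drug"}},
    "EXAMPLE.B:2": {"id": {"identifier": "EXAMPLE.B:2", "label": "other drug"}},
    "EXAMPLE.A:3": None,
}
example_rows = build_review_rows(
    ["EXAMPLE.A:1", "EXAMPLE.B:2", "EXAMPLE.A:3"], example_normalized
)
print(rows_to_csv(example_rows))
```

The `status` column is there so a reviewer can filter straight to the `replaced` and `unmapped` rows, which are the ones where the label needs a human check.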

maximusunc commented 3 months ago

PR open for the normalization of the input ids. Still working on the output ids. https://github.com/NCATSTranslator/Tests/pull/99

sierra-moxon commented 2 months ago

I believe the critical part of this ticket was handled in Guppy, and the remaining work will be done in Hammerhead. Please correct me if I'm wrong, of course.

sierra-moxon commented 2 months ago

From TAQA: what are we going to do with the Q/A pairs that do not have good output ids? Headed toward:

- put all the ids through NN (done)
- input ids fixed (done)
- output ids were trickier (bad in lots of ways)
  - NN issues; what is specified in the asset is a parent class (and Translator is returning subclasses)
- relay session in the works
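To make the parent-class case concrete, here is a rough sketch of a check that separates "the expected id came back exactly" from "only a subclass of the expected id came back". It is not how the test harness works; the `parents` mapping (child id to its direct parent ids), the function names, and the `EXAMPLE:*` ids are hypothetical stand-ins for whatever hierarchy source would actually be used.

```python
def ancestors(curie, parents):
    """Collect every ancestor of `curie` by walking the child -> parents map."""
    seen = set()
    stack = list(parents.get(curie, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def classify_output(expected_id, returned_ids, parents):
    """Return 'exact', 'subclass', or 'missing' for one QA pair output."""
    if expected_id in set(returned_ids):
        return "exact"
    for returned in returned_ids:
        # The asset names a parent class; a result that descends from it
        # is a near miss rather than a plain failure.
        if expected_id in ancestors(returned, parents):
            return "subclass"
    return "missing"


# Hypothetical hierarchy: EXAMPLE:child -> EXAMPLE:mid -> EXAMPLE:parent.
example_parents = {
    "EXAMPLE:child": ["EXAMPLE:mid"],
    "EXAMPLE:mid": ["EXAMPLE:parent"],
}

print(classify_output("EXAMPLE:parent", ["EXAMPLE:parent"], example_parents))
print(classify_output("EXAMPLE:parent", ["EXAMPLE:child"], example_parents))
print(classify_output("EXAMPLE:parent", ["EXAMPLE:other"], example_parents))
```

Reporting `subclass` separately would let those pairs be triaged on their own, instead of being counted together with outputs that are missing altogether.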