NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Wording around MVP2 and the inferred reasoning #748

Closed sandrine-muller-research closed 4 months ago

sandrine-muller-research commented 7 months ago

Do we mean that the compound affects the regulation of the gene, or simply that it can have an indirect effect on the expression of the gene? What exactly do we mean by regulation? As a tester/user, I was surprised by some of the answers to the question "What genes may be upregulated by compound X?": regulation is a specific term, and I can see lots of "affects_activity_or_abundance_of", which is different from "regulates".

sandrine-muller-research commented 7 months ago

This is a good example, from the query "What genes may be upregulated by potassium ion?": [image]

The evidence shows: [image]

In this example, K+ is not upregulating ATP4A: ATP4A is catalyzing the reaction of ATP to ADP, releasing a K+. So while ATP4A affects the abundance of K+, ATP4A is not upregulated by K+.

sandrine-muller commented 7 months ago

This is also a question for O&O and for testing in general: (1) when someone asks what gene is upregulated by chemical X, it should mean that they are looking for genes whose expression is increased due to chemical X. (2) In reality, the tool returns the answer to "what gene is affected by chemical X", which is a different question, and it is not clear that the user understands that. (3) The user will expect genes upregulated by chemical X to be at the top at least (so, in this example, before the channel subunits or the enzymes that catalyze the reaction). (4) Currently, the test assets I curated in the test asset sheet list as top answers the top answers to "what gene is affected by chemical X", which are usually wrong for "what gene is upregulated by chemical X" if we really stick to the meaning of regulation.

sierra-moxon commented 7 months ago

Per our brief discussion in Slack, assigning to Andy for next steps, e.g., do we need to align the verbiage on MVP2 and MVP3 in the UI to reflect the kinds of edges returned?

Genomewide commented 7 months ago

@sandrine-muller I get caught by this phrase, too. I even got called out by ChatGPT for saying that an agonist increased expression. That said, I think there are other issues with your example than the wording of the query.

For the query, I agree that 'regulation' is too specific. I think that 'increasing the activity' of a gene has some jargon definitions that some would say suggest a direct interaction with the protein's activity, but I also think it is general enough that it is the best description of what is happening.

However, your example of potassium is not a UI issue. What you described as the problem is bigger and could possibly be taken care of in the scoring. But really that example does not fit any query we offer. However, I don't think it could be determined based on the predicates. So, to me, the process of assigning the predicates should be reviewed as well.

sandrine-muller commented 7 months ago

@Genomewide if we had "increases the activity of" instead of "regulates", IMO ATP4A could be an OK result (to be dealt with through proper scoring), because the protein will have more activity than if K+ were not around (here activity is not expression but literally activity). If we keep "regulates", I would be inclined to consider this result wrong.

sandrine-muller-research commented 6 months ago

Per @cbizon's comment on Slack: activity or abundance is the query, and that is what the QA pairs should be testing.

gprice1129 commented 4 months ago

This is finished. Closing.