NCATSTranslator / minihackathons


Clarify Semantics of CIViC data provided by SPOKE #296

Open mbrush opened 2 years ago

mbrush commented 2 years ago

Spin-off of a tangential question that arose in #240, namely:

What are the plain-language semantics of the assertions represented in SPOKE Edges created from CIViC records? And what SPO structure is used to capture this? I gather from above that they relate a subject gene to an object chemical/drug? Using what predicate?

Rather than copy all discussion from that ticket, please pick up the thread on this topic at this comment, and continue discussion below. The concluding thought from @brettasmi in that ticket, to work from here, was:

It sounds like a re-evaluation (of the current SPOKE representation) is necessary and the time is right to get something more accurate into Biolink. I will defer to @karthiksoman, who is much better with the modeling of SPOKE than I am.

Ball is in your court, @karthiksoman. We can discuss here, or on a modeling call (Monday Helpdesk, Tues EPC, or Fri predicates are all possible venues).

mbrush commented 2 years ago

In the SPOKE documentation, these CIViC-based associations are framed as "Compound - affects - (mutant)Gene" Associations, and grouped alongside associations based on ClinicalTrials.gov and GDSC. When we address the question of CIViC statement semantics, we should also look at the semantics of the associations based on ClinicalTrials.gov and GDSC data.

karthiksoman commented 2 years ago

Hi Matt. Thanks for the feedback. I discussed this with Sergio and he thinks that our choice of words for the predicate (AFFECTS) may not be accurate. We are open to your suggestions, hopefully something shorter than “Gene X has variants associated with sensitivity/resistance to”. The predicate will be easy to modify (the mapping itself is correct). I think the direction of the edge will also change (to Gene --> Compound instead of Compound --> Gene).
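
To make the proposed change concrete, here is a rough sketch of the two edge shapes discussed above. It is only an illustration: the `Edge` class, the `reverse_edge` helper, the example identifiers, and the placeholder predicate name are all assumptions made for this sketch. None of them are actual SPOKE or Biolink terms, and the final predicate is still open for discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Minimal subject-predicate-object triple (hypothetical, not SPOKE's schema)."""
    subject: str
    predicate: str
    obj: str


def reverse_edge(edge: Edge, new_predicate: str) -> Edge:
    """Swap subject and object, attaching a predicate that fits the new direction."""
    return Edge(subject=edge.obj, predicate=new_predicate, obj=edge.subject)


# Current framing per this thread: Compound - AFFECTS - (mutant) Gene.
# The identifiers are made-up placeholders, not real CIViC records.
current = Edge(subject="Compound:EXAMPLE_DRUG", predicate="AFFECTS", obj="Gene:EXAMPLE_GENE")

# Placeholder only: a stand-in for whatever shorter predicate gets agreed on.
PLACEHOLDER_PREDICATE = "HAS_VARIANTS_ASSOCIATED_WITH_RESPONSE_TO"

# Proposed framing: Gene --> Compound, with the underlying mapping unchanged.
proposed = reverse_edge(current, PLACEHOLDER_PREDICATE)

print(current)
print(proposed)
```

Only the direction and the predicate label change in this sketch. The gene and compound that are paired stay the same, which matches the note above that the mapping itself is correct.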