ga4gh / va-spec

An information model for representing variant annotations.

Competency Questions for Evidence/Provenance Modeling #27

Open mbrush opened 5 years ago

mbrush commented 5 years ago

One of the big modeling tasks we have not yet discussed in detail is representing the evidence and provenance information supporting a given variant annotation statement. We superficially considered and documented some requirements in this space as part of the VA type requirements effort here. What Javi and I would like to do next is collect a rich set of competency questions (CQs) for each VA type, which will be used to inform its evidence and provenance model.
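To make the target of this modeling concrete, here is a minimal Python sketch of the kind of structure in question: an annotation assertion carried alongside the evidence and provenance behind it. This is purely illustrative; all class and field names below are invented for this example and are not drawn from the VA-Spec model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    # One piece of evidence supporting a statement (illustrative fields only)
    source: str        # e.g. a publication ID or dataset accession
    description: str   # what the evidence shows
    strength: str      # e.g. "strong", "moderate", "weak"

@dataclass
class VariantAnnotationStatement:
    # A variant annotation assertion plus the evidence/provenance behind it
    variant: str               # identifier for the annotated variant
    assertion: str             # e.g. "pathogenic for hereditary breast cancer"
    agent: str                 # who made the assertion (provenance)
    date_asserted: str         # when it was made (provenance)
    method: str                # how it was produced, e.g. "ACMG guidelines"
    evidence: List[Evidence] = field(default_factory=list)
```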

Here we would like to ask the 'owners' and 'supporters' of each VA type who led the initial requirements work to also help with these CQ efforts, drawing on the expertise they have accrued for their VA types and their familiarity with the data and the needs of their users. An updated list of VA types and owners/supporters for this task is here.

We anticipate this work will take just 1-2 hours per VA type, and we will provide details and assistance on upcoming calls. Please respond here or email the VA list if you have any questions, suggestions, or concerns. Thanks all!

mbrush commented 5 years ago

A bit more on CQs:

For some examples, see the CQ bank we assembled here for a project about modeling temporal aspects of cancer (CQs are in Section III), or the document here for a project about BRCA variant pathogenicity interpretation modeling.

Please keep in mind that for our task we are specifically after CQs related to the evidence and provenance (E/P) information behind an annotation/assertion. The docs linked above may contain some examples of such CQs, but many of their CQs are unrelated to E/P.

mbrush commented 5 years ago

It may help to think of three high-level categories of CQs:

  1. Discovery CQs are simple queries that directly return data from a dataset, requiring no calculation or analysis. These aim to return annotations with specific features ("Find annotations that..."). Here we consider the perspective of a user searching for annotations of a given type, and what aspects of evidence or provenance they would want to search/facet/filter on.

  2. Descriptive CQs are also simple, but aim to return specified features of a known annotation ("For this particular annotation, what is its...?"). Here we consider the perspective of a user looking to use a particular annotation they have in hand, and what aspects of evidence or provenance would help them understand, trust, and apply it appropriately.

  3. Analysis CQs present more complex research questions and use cases that require calculation, statistical analysis, or other methods to be applied to the data to generate an answer (e.g. "Researchers from what institution have provided the most publications cited as evidence for pathogenic BRCA2 interpretations over the past 10 years?").

The CQ corpus here organizes its queries according to these three categories.
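To make the mechanical differences between these categories concrete, here is a rough Python sketch continuing the hypothetical VariantAnnotationStatement/Evidence classes from the earlier comment. All identifiers, variant names, and values below are invented for illustration; none come from a real dataset.

```python
from collections import Counter

# Continuing the earlier sketch: a toy dataset of invented records.
statements = [
    VariantAnnotationStatement(
        variant="example-variant-1",
        assertion="pathogenic for hereditary breast cancer",
        agent="ExampleLab",
        date_asserted="2019-06-01",
        method="ACMG guidelines",
        evidence=[Evidence("PMID:00000001", "segregation data", "strong"),
                  Evidence("PMID:00000002", "functional assay", "moderate")],
    ),
]

# 1. Discovery CQ ("Find annotations that..."): a simple filter, e.g.
#    annotations asserted under ACMG guidelines citing >= 2 evidence items.
discovered = [s for s in statements
              if s.method == "ACMG guidelines" and len(s.evidence) >= 2]

# 2. Descriptive CQ ("For this particular annotation, what is its...?"):
#    a lookup on one known record: who asserted it, when, on what evidence.
s = statements[0]
print(s.agent, s.date_asserted, [e.source for e in s.evidence])

# 3. Analysis CQ: requires aggregation across the dataset, not just
#    filtering, e.g. which agent has made the most pathogenic assertions.
counts = Counter(s.agent for s in statements
                 if "pathogenic" in s.assertion.lower())
print(counts.most_common(1))
```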

For our purposes, Discovery and Descriptive CQs are easier to produce and should be the focus of this effort, but please also report any Analysis CQs you come up with.