ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Mapping variables to ontologies / controlled vocabularies #513

Open AAMargolin opened 8 years ago

AAMargolin commented 8 years ago

We had an active discussion on yesterday's G2P call about an issue that we decided should be taken up as a DWG-level discussion.

In the G2P schema we follow the convention used by the metadata and other task teams, where variables are flexibly defined as arrays of OntologyTerms. For example, suppose the evidence variable in a G2P association is declared as below (not quite how we do it, but close):

array<OntologyTerm> evidence;

For a given type of query, we want to define the specific OntologyTerms that are returned in the evidence statement (see https://github.com/ga4gh/g2p-team/issues/10 for related discussion).

For simplicity, say that for a query on cancer clinical genomics datasets, following the convention developed by @obigriffith, one of the terms in the resulting evidence statement is a variable evidenceType, which can take one of three values: predictive, prognostic, or diagnostic.

How do we want to represent this mapping of variables to such a controlled vocabulary? Do we leave it out of the schema, document the allowable values of each variable, and perhaps develop server-side checks? Is there a more automated or elegant way? When do we want to use ontologies rather than simpler controlled vocabularies, and how do we represent each?
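To make the "document the values and check on the server side" option concrete, here is a minimal Python sketch. It is only an illustration, not part of the GA4GH schemas: the names EvidenceType and validate_evidence_type are hypothetical, and the three values are the ones listed above.

```python
from enum import Enum


class EvidenceType(Enum):
    """Hypothetical controlled vocabulary for the evidenceType variable."""
    PREDICTIVE = "predictive"
    PROGNOSTIC = "prognostic"
    DIAGNOSTIC = "diagnostic"


def validate_evidence_type(value: str) -> EvidenceType:
    """Return the matching CV member, or raise ValueError listing the allowed values."""
    try:
        return EvidenceType(value)
    except ValueError:
        allowed = ", ".join(member.value for member in EvidenceType)
        raise ValueError(
            f"evidenceType must be one of: {allowed}; got {value!r}"
        ) from None
```

A server could call validate_evidence_type on each incoming or outgoing evidence statement. The drawback, as raised in the question, is that the allowable values then live in server code and documentation rather than in the schema itself.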

@sarahhunt, @diekhans, @bwalsh, please weigh in based on our discussion on the call if I didn't explain this quite right or if you have ideas for possible solutions.

Thanks, Adam

helenp commented 8 years ago

Some best practice suggestions:

  1. Define every CV used, even if it is local to your implementation. If you cannot cleanly reference something external, I would define all terms in the schema.
  2. Check for ontologies that represent the CV. The closest I found was in the NCI Thesaurus, e.g. http://bioportal.bioontology.org/ontologies/NCIT?p=classes&conceptid=http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FThesaurus.owl%23C28282. The matches are not great; a definition for each term could help refine them and often sharpens the differentia between the terms.
  3. If you need new terms, consider adding them to an existing ontology, e.g. the Evidence Code Ontology, though it's not clear from the issue text what these terms are evidence for.
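
As a rough sketch of suggestions 1 and 2, a locally defined CV could carry a definition for every term plus an optional reference to an external ontology class once a good match is found. Everything here is an assumption for illustration: CvTerm, EVIDENCE_TYPE_CV, and unmapped_terms are hypothetical names, the definitions are placeholder wording rather than agreed text, and no external identifiers are filled in because none were settled in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CvTerm:
    """One controlled-vocabulary term with a local definition."""
    label: str
    definition: str
    # Identifier of a matching external ontology class, if one has been found.
    external_id: Optional[str] = None


# Placeholder definitions; illustrative wording only, not agreed text.
EVIDENCE_TYPE_CV: Dict[str, CvTerm] = {
    "predictive": CvTerm("predictive", "Placeholder: evidence about response to a treatment."),
    "prognostic": CvTerm("prognostic", "Placeholder: evidence about likely disease outcome."),
    "diagnostic": CvTerm("diagnostic", "Placeholder: evidence supporting a diagnosis."),
}


def unmapped_terms(cv: Dict[str, CvTerm]) -> List[str]:
    """Return the labels of terms that have no external ontology reference yet."""
    return sorted(label for label, term in cv.items() if term.external_id is None)
```

Calling unmapped_terms(EVIDENCE_TYPE_CV) lists the terms that still need either a mapping to an existing ontology or a request for a new term, which is the situation suggestion 3 addresses.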