clingen-data-model / clingen-interpretation

Allele (variant) interpretation model and API for ClinGen

RegionAllelesOutcome #179

Closed by cbizon 3 years ago

cbizon commented 6 years ago

For many of our ValueSets that could have either Yes or No outcomes (like null allele), we only made a value for the Yes version, and used the NOT qualifier to make the No outcome.

For RegionAllelesOutcome, however, we have a value set with both the Yes and No terms.

Was this on purpose, or an oversight? I think the latter, and I think we should change it to conform. I'm going to write the documentation as if we are making it only have the Yes outcome, so either we need to change the value set or we need to change the docs.
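To make the two options concrete, here is a minimal sketch of how they relate. The `Outcome` class, its `negated` field (standing in for the NOT qualifier), and the `normalize` helper are hypothetical names for illustration only, not part of the ClinGen model; the two RO identifiers are the overlap terms named later in this thread.

```python
from dataclasses import dataclass

# Terms for the Yes and No outcomes, as named later in this thread.
OVERLAPS = "RO:0002526"          # 'overlaps sequence of'
DOES_NOT_OVERLAP = "RO:0002527"  # 'does not overlap sequence of'


@dataclass(frozen=True)
class Outcome:
    """Hypothetical outcome record: a value plus an optional NOT qualifier."""
    value: str
    negated: bool = False  # assumption: stands in for the NOT qualifier


def normalize(outcome: Outcome) -> Outcome:
    """Rewrite a negated Yes outcome as the explicit No term.

    Option A (single-valued set): Outcome(OVERLAPS, negated=True)
    Option B (two-valued set):    Outcome(DOES_NOT_OVERLAP)
    Outcomes that are not a negated Yes are returned unchanged.
    """
    if outcome.value == OVERLAPS and outcome.negated:
        return Outcome(DOES_NOT_OVERLAP)
    return outcome
```

Under this reading the two encodings carry the same information, so the choice is mostly about consistency with the other value sets.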

mbrush commented 6 years ago

Here is a case where ontology terms exist that map to the negated assertion - see rows 220-221 of the spreadsheet. So it is up to us whether we use these as values for the affirmative and the negation, or go with a single value in the value set that can be negated with a NOT qualifier.

Another consideration for this value set in particular is that RegionAlleles is an assertion about the positional relationship between an allele and a region. While there are currently only two possible values (bounded or not), there may be more if we want to support additional nuance/precision here (e.g. see other RO relations related to the relative position of sequences).

Given this, it would be perfectly fine with me to create a value set here with two values, and possibly more in the future, if we want this structure to be as flexible and accommodating as possible. (This may be something to think about for the other booleans we converted to a single-valued value set with the possibility of negation.)

Let's discuss on the next call.

mbrush commented 6 years ago

Agreed on 4-26-18 CG-SEPIO call to propose a broader value set here, based on relations in the RO ontology.

Minimally we should have the following two values from the original value set:

  1. RO:0002526 'overlaps sequence of' (allele overlaps but is not necessarily completely subsumed by the region)
  2. RO:0002527 'does not overlap sequence of'

Additional values to consider:

  1. RO:0002525 'is subsequence of' (allele is completely covered/subsumed by the region)
  2. RO:0002529 'is downstream of sequence of'
  3. RO:0002528 'is upstream of sequence of'
  4. RO:0002515 'sequentially adjacent to' (immediately next to, without committing to whether it is upstream or downstream)

The direction of the statement relates the allele of interest to the region of interest (allele -> region).
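As a rough illustration of how these six values could relate an allele to a region, here is a minimal sketch. It assumes half-open `(start, end)` coordinates on the forward strand, so that "upstream" means lower coordinates, and it treats the upstream, downstream, and adjacent relations as applying only when the two sequences do not overlap. Those choices, and the `relations` function itself, are assumptions made for illustration, not RO's formal definitions or anything in the ClinGen model.

```python
# CURIE -> label, as listed above.
RO_LABELS = {
    "RO:0002526": "overlaps sequence of",
    "RO:0002527": "does not overlap sequence of",
    "RO:0002525": "is subsequence of",
    "RO:0002529": "is downstream of sequence of",
    "RO:0002528": "is upstream of sequence of",
    "RO:0002515": "sequentially adjacent to",
}


def relations(allele, region):
    """Return the set of CURIEs that hold in the allele -> region direction.

    Both arguments are (start, end) half-open intervals on the forward
    strand (an assumption of this sketch).
    """
    a_start, a_end = allele
    r_start, r_end = region
    if a_start >= a_end or r_start >= r_end:
        raise ValueError("intervals must be non-empty with start < end")

    held = set()
    if a_start < r_end and r_start < a_end:
        held.add("RO:0002526")  # the two sequences share at least one position
        if r_start <= a_start and a_end <= r_end:
            held.add("RO:0002525")  # allele lies entirely within the region
    else:
        held.add("RO:0002527")  # no shared positions
        if a_end <= r_start:
            held.add("RO:0002528")  # allele ends before the region starts
        else:
            held.add("RO:0002529")  # allele starts after the region ends
        if a_end == r_start or a_start == r_end:
            held.add("RO:0002515")  # the two intervals touch
    return held
```

For example, `relations((10, 20), (5, 30))` yields both the overlap and the subsequence terms. Several values can hold at once, so a record would either pick the most specific one or allow more than one value.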

There are many other properties in RO to consider if even more nuance is desired: http://www.ontobee.org/ontology/RO?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_0002514

larrybabb commented 6 years ago

Next step is to add these 6 concepts to the RegionAllelesOutcome bound value set in the sheets and integrate with example records that would be impacted.

larrybabb commented 6 years ago

@mbrush has this been done officially yet? I can update the sheets if so.