clingen-data-model / clingen-interpretation

Allele (variant) interpretation model and API for ClinGen

Develop a strong set of UserLabel examples #184

Closed: larrybabb closed this issue 3 years ago

larrybabb commented 6 years ago

@mbrush requested some examples of users wanting to provide their own labels for DomainEntities or, more precisely, value set concepts.

We should come up with a few scenarios with our domain experts to make sure these are reasonable and realistic, and add them to our documentation as well.

bpow commented 6 years ago

I notice that userLabel is still a property on DomainEntity. As we discussed before, that has the potential to "pollute" the object associated with a DomainEntity's IRI (which may be a very specific IRI from a ValueSet) with those UserLabels, and it is questionable whether the "scope" or "lifetime" of a UserLabel is properly handled there.

When we discussed this on the teleconference, I gave a few proposals, which I called the "thesaurus" proposal (the enclosing Statement has a list of UserLabels that should be applied to nodes referred to by its properties) and the "neighboring node" proposal (a UserLabel is added as a sibling in a list referred to by a property).

Both of these proposals would make the DomainEntity the object (range) of an RDF triple involving the UserLabel, rather than the subject (domain) of such a triple. In other words, the DomainEntity is a (reused) value in the relationship, rather than having UserLabel be a property of DomainEntity.

Concrete examples:

What we have now (UserLabel directly on DomainEntity):

id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: CGEX-TYPE:0001
    type: Ascertainment
    userLabel:
      id: CG-EX:UsrLabl002
      type: UserLabel
      description: Case group is composed of cases from six cohorts from the GENIUS
        T2D consortium.  Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
    userLabel:
      id: CG-EX:UsrLabl008
      type: UserLabel
      label: ExAC ascertainment method
      description: Using ExAC as control group since they are unlikely to have this condition
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls

Pros:

Cons:

If the IRI is used elsewhere, the userLabel would be considered attached there as well in the RDF graph. Sure, someone just reading the JSON would see that the UserLabel attached to SEPIO:0000332 applies only to this part of the message. But if this were part of a larger message in which SEPIO:0000332 was used again, then once the JSON-LD framing APIs or conversion to RDF were applied, this UserLabel would become a property of SEPIO:0000332 everywhere!
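To make the pollution concern concrete, here is a minimal Python sketch, assuming a drastically simplified flattening of nested JSON objects into (subject, predicate, object) triples. It is not the JSON-LD to RDF algorithm (no @context, no expansion; keys are used directly as predicates), and `to_triples`, `user_labels_of`, and the second record `CG-EX:AllFreqXYZ` are hypothetical. It only shows that once two messages are merged into one graph, a userLabel nested under SEPIO:0000332 is indistinguishable from a global property of that IRI.

```python
from itertools import count

# Counter for minting blank-node ids for objects that have no "id".
_fresh = count()


def to_triples(node, triples):
    """Add (subject, predicate, object) triples for a nested dict to `triples`.

    Simplified stand-in for JSON-LD -> RDF conversion (hypothetical helper):
    keys become predicates, nested dicts become linked nodes, and a dict
    without "id" gets a fresh blank-node id. Returns the node's id.
    """
    subject = node.get("id") or f"_:n{next(_fresh)}"
    for key, value in node.items():
        if key == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            obj = to_triples(v, triples) if isinstance(v, dict) else v
            triples.add((subject, key, obj))
    return subject


def user_labels_of(graph, iri):
    """Every object linked to `iri` via userLabel, anywhere in the graph."""
    return {o for s, p, o in graph if s == iri and p == "userLabel"}


# Trimmed from the current-model example above.
message_a = {
    "id": "CG-EX:AllFreq028",
    "type": "AlleleFrequency",
    "ascertainment": {
        "id": "SEPIO:0000332",
        "type": "Ascertainment",
        "userLabel": {
            "id": "CG-EX:UsrLabl008",
            "type": "UserLabel",
            "label": "ExAC ascertainment method",
        },
    },
}

# A hypothetical, unrelated record that reuses the same value-set IRI
# and never mentions any user label.
message_b = {
    "id": "CG-EX:AllFreqXYZ",
    "type": "AlleleFrequency",
    "ascertainment": {"id": "SEPIO:0000332", "type": "Ascertainment"},
}

graph = set()  # an RDF graph is a set of triples
to_triples(message_a, graph)
to_triples(message_b, graph)

# Nothing in the merged graph records that the label was meant only for
# message_a: it now reads as a property of SEPIO:0000332 for message_b too.
print(user_labels_of(graph, "SEPIO:0000332"))  # {'CG-EX:UsrLabl008'}
```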

The "thesaurus" model:

id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: _:b0
    type: Ascertainment
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
userLabels:
  - id: _:b1
    labelFor: _:b0
    type: UserLabel
    description: Case group is composed of cases from six cohorts from the GENIUS
      T2D consortium.  Case status is defined per-cohort.
  - id: _:b2
    labelFor: SEPIO:0000332
    type: UserLabel
    description: Using ExAC as control group since they are unlikely to have this condition
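Under the thesaurus model, a consumer can resolve labels by indexing the Statement's `userLabels` list on `labelFor`. The minimal Python sketch below assumes the field names from the example above; the helper names `thesaurus` and `label_triples` are hypothetical. It also shows the triple shape behind the "range rather than domain" point: each UserLabel is the subject of a `labelFor` triple, and the labeled value (`_:b0`, SEPIO:0000332) only ever appears as its object.

```python
from collections import defaultdict

# Trimmed from the thesaurus example above.
statement = {
    "id": "CG-EX:CaseCtrl030",
    "type": "CaseControl",
    "caseGroupFrequency": {
        "id": "CG-EX:AllFreq029",
        "ascertainment": {"id": "_:b0", "type": "Ascertainment"},
    },
    "controlGroupFrequency": {
        "id": "CG-EX:AllFreq028",
        "ascertainment": {"id": "SEPIO:0000332", "type": "Ascertainment"},
    },
    "userLabels": [
        {"id": "_:b1", "labelFor": "_:b0", "type": "UserLabel",
         "description": "Case group is composed of cases from six cohorts "
                        "from the GENIUS T2D consortium.  Case status is "
                        "defined per-cohort."},
        {"id": "_:b2", "labelFor": "SEPIO:0000332", "type": "UserLabel",
         "description": "Using ExAC as control group since they are "
                        "unlikely to have this condition"},
    ],
}


def thesaurus(stmt):
    """Index a Statement's userLabels by the node they label.

    The labels are only valid within `stmt`; nothing is written onto the
    labeled nodes themselves (hypothetical helper).
    """
    index = defaultdict(list)
    for user_label in stmt.get("userLabels", []):
        index[user_label["labelFor"]].append(user_label)
    return dict(index)


def label_triples(stmt):
    """The triples the userLabels contribute: the label is the subject."""
    return [(ul["id"], "labelFor", ul["labelFor"])
            for ul in stmt.get("userLabels", [])]


labels = thesaurus(statement)
print(labels["SEPIO:0000332"][0]["description"])
print(label_triples(statement))
# [('_:b1', 'labelFor', '_:b0'), ('_:b2', 'labelFor', 'SEPIO:0000332')]
```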

Pros:

The "neighboring node" model:

id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
  - id: _:b0
    type: Ascertainment
  - id: _:b1
    labelFor: _:b0
    type: UserLabel
    description: Case group is composed of cases from six cohorts from the GENIUS
      T2D consortium.  Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
  - id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
  - id: _:b2
    labelFor: SEPIO:0000332
    type: UserLabel
    description: Using ExAC as control group since they are unlikely to have this condition
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
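With the neighboring-node model, the ascertainment property holds a list mixing the actual value with its UserLabel siblings, so a consumer has to separate them before it can use the value. A minimal Python sketch, assuming UserLabels are recognized by `type: UserLabel` and linked by `labelFor` as in the example above; `split_neighbors` is a hypothetical helper name.

```python
def split_neighbors(values):
    """Split a property's values into the value(s) and sibling UserLabels.

    Returns (entities, labels_by_target), where labels_by_target maps a
    value's id to the UserLabel nodes whose labelFor points at it.
    """
    if not isinstance(values, list):
        values = [values]

    def is_label(v):
        return isinstance(v, dict) and v.get("type") == "UserLabel"

    entities = [v for v in values if not is_label(v)]
    labels_by_target = {}
    for v in values:
        if is_label(v):
            labels_by_target.setdefault(v["labelFor"], []).append(v)
    return entities, labels_by_target


# The ascertainment list from controlGroupFrequency above.
ascertainment = [
    {"id": "SEPIO:0000332", "type": "Ascertainment",
     "label": "ExAc ascertainment method"},
    {"id": "_:b2", "labelFor": "SEPIO:0000332", "type": "UserLabel",
     "description": "Using ExAC as control group since they are "
                    "unlikely to have this condition"},
]

entities, labels_by_target = split_neighbors(ascertainment)
print([e["id"] for e in entities])  # ['SEPIO:0000332']
print(list(labels_by_target))       # ['SEPIO:0000332']
```

The filtering step is what every consumer of a labeled property would need to repeat, since a property that logically holds one value becomes a list.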

Pros:

Cons:

bpow commented 6 years ago

My preference is for the "thesaurus" proposal, by the way...

mbrush commented 6 years ago

Thanks for writing these up Bradford. I will have more to say later, but for now just a couple clarifications.

  1. Did you and Chris meet about the proposal to eliminate the DomainEntity types for each value set and instead type value terms in the data (e.g. SEPIO:0000332) using a single high-level type (e.g. something like 'Descriptor', 'Value', 'Coded Value', 'Semantic Value', ...)?
    I ask because the examples above continue to use the more specific DomainEntity types for typing value set values (e.g. 'Ascertainment'), and I wondered whether this reflects a decision on y'all's part to keep them for now.

  2. Can you clarify the uses for userLabels, and the things you want to avoid declaring about a reused IRI in these UserLabel objects? My initial understanding was that these are solely meant to define a user-preferred label for an existing value term IRI, but from the examples above there may be other info hanging from them, e.g. descriptions that explain or clarify the use of a value in the context of a particular data record.
    I ask because some of the UserLabel objects in the examples above don't provide a user-preferred label, and I had assumed this was their primary/only function (e.g. _:b1 and _:b2 at the end of the thesaurus example).

bpow commented 6 years ago

Chris and I did meet and are approaching agreement on how to handle those DomainEntities. Re-evaluating UserLabels was part of that discussion, since if we go that route (replacing the former CodeableConcepts with simple IRIs), we would need to figure out how user labels fit into such a model.

The other reason the examples above use the older style is that they are based on examples currently in the documentation.

I'm actually not sure how the overall group feels about allowing UserLabels to include a clarifying description in addition to a simpler/shorter label. I drew these examples from things already in the sheets documents, and there were some there. Perhaps @larrybabb or @cbizon can comment.

bpow commented 6 years ago

There is some significant overlapping discussion in #185 (and/or things mentioned there that should be here).

@mbrush suggested that instead of moving the UserLabel up to the enclosing object to control its scope, we could add a "scope" property to UserLabel that defines the scope, and just use blank nodes when creating a domain entity on the fly. For instance:

id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: _:b0
    type: Ascertainment
    label: GENIUS T2D cases
    description: Case group is composed of cases from six cohorts from the GENIUS
      T2D consortium.  Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
    userLabel:
      type: UserLabel
      label: ExAC ascertainment method
      scope: CG-EX:AllFreq028
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
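A minimal Python sketch of how a consumer might honor the proposed `scope` property, assuming `scope` holds the id of the enclosing node within which the label applies (as in `scope: CG-EX:AllFreq028` above); `effective_label` and the out-of-scope id `CG-EX:AllFreqXYZ` are hypothetical. The ad hoc value `_:b0` needs no UserLabel at all, since its label and description live directly on the blank node.

```python
def effective_label(value, context_id):
    """Label to display for `value` when it appears under node `context_id`.

    A userLabel is honored only if its scope matches the enclosing node;
    otherwise the value's own label is used (hypothetical helper).
    """
    user_label = value.get("userLabel")
    if isinstance(user_label, dict) and user_label.get("scope") == context_id:
        return user_label.get("label")
    return value.get("label")


# The two ascertainment values from the example above.
genius_cases = {"id": "_:b0", "type": "Ascertainment",
                "label": "GENIUS T2D cases"}
exac = {
    "id": "SEPIO:0000332",
    "type": "Ascertainment",
    "label": "ExAc ascertainment method",
    "userLabel": {"type": "UserLabel",
                  "label": "ExAC ascertainment method",
                  "scope": "CG-EX:AllFreq028"},
}

print(effective_label(exac, "CG-EX:AllFreq028"))          # ExAC ascertainment method
print(effective_label(exac, "CG-EX:AllFreqXYZ"))          # ExAc ascertainment method
print(effective_label(genius_cases, "CG-EX:AllFreq029"))  # GENIUS T2D cases
```

In this sketch the userLabel is still nested under SEPIO:0000332, so after conversion to RDF it would still hang off that IRI; the scope only tells a consumer when to apply it.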

Pros:

Cons:

There are also other comments in #185 proposing further specification of UserLabel: UserLabel.label should be 1..1, the meaning of UserLabel.description should be described better, and it should be made clear that a UserLabel can optionally have a contribution to record provenance.
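A minimal Python sketch of checking the label constraint, assuming UserLabels are plain dicts shaped like the examples above; `validate_user_label` is a hypothetical helper, and the 0..1 cardinality for description is an assumption rather than something decided in #185. Under a 1..1 label rule, description-only UserLabels such as _:b1 and _:b2 in the thesaurus example would be rejected, which matches the gap mbrush pointed out.

```python
def validate_user_label(node):
    """Return a list of constraint violations for a UserLabel node.

    Encodes the #185 suggestion that label is 1..1. Treating description
    as an optional single string (0..1) is an assumption; an optional
    contribution (provenance) is accepted but not checked here.
    """
    errors = []
    if node.get("type") != "UserLabel":
        errors.append("type must be 'UserLabel'")
    label = node.get("label")
    if label is None:
        errors.append("label is required (1..1)")
    elif not isinstance(label, str) or not label.strip():
        errors.append("label must be a single non-empty string (1..1)")
    description = node.get("description")
    if description is not None and not isinstance(description, str):
        errors.append("description must be a string if present (assumed 0..1)")
    return errors


# _:b2 from the thesaurus example has a description but no label.
b2 = {"id": "_:b2", "labelFor": "SEPIO:0000332", "type": "UserLabel",
      "description": "Using ExAC as control group since they are "
                     "unlikely to have this condition"}
print(validate_user_label(b2))  # ['label is required (1..1)']

fixed = dict(b2, label="ExAC as controls")  # hypothetical label text
print(validate_user_label(fixed))  # []
```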

bpow commented 6 years ago

Per recent discussions, we decided to go with the thesaurus model. I think this means we would need a userLabel attribute on Statement, which would provide the scope for the label.

bpow commented 6 years ago

It seems that the UserLabels currently in the examples are of the "create a blank node to describe an element that extends a value set" variety, in which case they should just be implemented as such nodes, according to our current thinking.

We should also have examples of userLabels that modify other (existing) entities.