larrybabb closed 3 years ago
I notice that userLabel is still a property on DomainEntity, but as we discussed before, that has the potential to "pollute" the object associated with an IRI for a DomainEntity (which may be a very specific IRI from a ValueSet) with those UserLabels-- and it is questionable whether the "scope" or "lifetime" of a UserLabel is properly handled there.
When we discussed this on the teleconference, I gave a few proposals, which I called the "thesaurus" proposal (enclosing Statement has a list of UserLabels that should be applied to nodes referred by its properties) and the "neighboring node" proposal, where a userLabel would be added as a sibling in a list referred to by a property.
Both of these proposals would make the DomainEntity part of the range of an RDF triple involving the UserLabel (its object), rather than part of the domain (its subject)... In other words, the DomainEntity is a (reused) value in the relationship, rather than having UserLabel be a property of DomainEntity.
id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: CGEX-TYPE:0001
    type: Ascertainment
    userLabel:
      id: CG-EX:UsrLabl002
      type: UserLabel
      description: Case group is composed of cases from six cohorts from the GENIUS
        T2D consortium. Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
    userLabel:
      id: CG-EX:UsrLabl008
      type: UserLabel
      label: ExAC ascertainment method
      description: Using ExAC as control group since they are unlikely to have this condition
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
Pros:
In other words, if the IRI is used elsewhere, the userLabel would be considered attached there as well in the RDF graph. Sure, someone just reading the JSON would see that the UserLabel attached to SEPIO:0000332 is just for this part of the message. But if this were part of a larger message where SEPIO:0000332 was used again, then once the JSON-LD framing APIs or conversion to RDF were applied, this UserLabel would be a property of SEPIO:0000332 everywhere!
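The merging described above can be illustrated with a minimal sketch in plain Python, without any JSON-LD library. The `to_triples` helper, the enclosing message id, and the second frequency (`CG-EX:AllFreq999`) are hypothetical; the sketch assumes only that conversion to RDF identifies a node by its `id`, so whatever is stated on one occurrence of an IRI holds for every occurrence.

```python
def to_triples(node, triples=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Hypothetical helper: nodes are identified only by their "id", which
    approximates how RDF conversion merges repeated uses of an IRI.
    """
    if triples is None:
        triples = set()
    subject = node["id"]
    for key, value in node.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            triples.add((subject, key, value["id"]))
            to_triples(value, triples)
        else:
            triples.add((subject, key, value))
    return triples


# Two frequencies in one message reuse SEPIO:0000332; only the first
# occurrence carries a userLabel.
message = {
    "id": "CG-EX:Msg001",  # hypothetical enclosing message
    "first": {
        "id": "CG-EX:AllFreq028",
        "ascertainment": {
            "id": "SEPIO:0000332",
            "userLabel": {
                "id": "CG-EX:UsrLabl008",
                "label": "ExAC ascertainment method",
            },
        },
    },
    "second": {
        "id": "CG-EX:AllFreq999",  # hypothetical second use of the same IRI
        "ascertainment": {"id": "SEPIO:0000332"},
    },
}

triples = to_triples(message)

# The graph holds a single SEPIO:0000332 node, so the label is attached to
# it no matter which frequency refers to it.
labels = {o for (s, p, o) in triples if s == "SEPIO:0000332" and p == "userLabel"}
users = sorted(s for (s, p, o) in triples if p == "ascertainment" and o == "SEPIO:0000332")
print(labels)
print(users)
```

Both frequencies end up pointing at a node that carries the UserLabel, even though only one of them declared it.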
id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: _:b0
    type: Ascertainment
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
userLabels:
- id: _:b1
  labelFor: _:b0
  type: UserLabel
  description: Case group is composed of cases from six cohorts from the GENIUS
    T2D consortium. Case status is defined per-cohort.
- id: _:b2
  labelFor: SEPIO:0000332
  type: UserLabel
  description: Using ExAC as control group since they are unlikely to have this condition
Pros:
id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
  - id: _:b0
    type: Ascertainment
  - id: _:b1
    labelFor: _:b0
    type: UserLabel
    description: Case group is composed of cases from six cohorts from the GENIUS
      T2D consortium. Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
  - id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
  - id: _:b2
    labelFor: SEPIO:0000332
    type: UserLabel
    description: Using ExAC as control group since they are unlikely to have this condition
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
Pros:
Cons:
My preference is for the "thesaurus" proposal, by the way...
Thanks for writing these up Bradford. I will have more to say later, but for now just a couple clarifications.
Did you and Chris meet about the proposal to eliminate the DomainEntity types for each value set, and to type value terms in the data (e.g. SEPIO:0000332) using a single high-level type (e.g. something like 'Descriptor', 'Value', 'Coded Value', 'Semantic Value', ...)?
Asking because the examples above continue to use the more specific DomainEntity types for typing value set values (e.g. 'Ascertainment'), and wondered if this reflected a decision on y'all's part to keep them for now?
Can you clarify the uses for userLabels, and the things you want to avoid declaring about a re-used IRI in these UserLabel objects? My initial understanding was that these are solely meant to define a user-preferred label for an existing value term IRI, but from the examples above there may be other info hanging from them, e.g. 'descriptions' that explain/clarify the use of a value in the context of a particular data record.
I ask because some of the UserLabel objects in the examples above don't provide a user-preferred label, and I had assumed this was their primary/only function (e.g. _:b1 and _:b2 at the end of the thesaurus example).
Chris and I did meet and are approaching agreement on how to handle those DomainEntities. Re-evaluating UserLabels was part of that discussion. It is related because if we go that route (replacing the formerly-codeableconcepts with simple IRIs), then we would need to figure out how user labels would fit in such a model.
The other reason the examples above use the older style is because they are based on examples currently in the documentation.
I'm actually not sure how the overall group feels about having the ability to have UserLabels include a clarifying description in addition to a simpler/shorter label-- I drew these examples from things already in the sheets documents, and there were some there. Perhaps @larrybabb or @cbizon can comment.
There is some significant overlapping discussion in #185 (and/or things mentioned there that should be here).
@mbrush suggested that instead of bringing up the UserLabel to the enclosing object to control scope, we could instead add a "scope" property to UserLabel that defines the scope, and that we could just use blank nodes when creating a domain entity on the fly. So, for instance:
id: CG-EX:CaseCtrl030
type: CaseControl
oddsRatio: 9.1361
confidenceLevel: 99
confidenceIntervalLower: 6.1425
confidenceIntervalUpper: 13.5889
canonicalAllele: CAR:CA253316
condition: CG-EX:GenCond048
caseGroupFrequency:
  id: CG-EX:AllFreq029
  type: AlleleFrequency
  ascertainment:
    id: _:b0
    type: Ascertainment
    label: GENIUS T2D cases
    description: Case group is composed of cases from six cohorts from the GENIUS
      T2D consortium. Case status is defined per-cohort.
  allele: CAR:CA253316
  alleleCount: 32
  alleleNumber: 4076
controlGroupFrequency:
  id: CG-EX:AllFreq028
  type: AlleleFrequency
  ascertainment:
    id: SEPIO:0000332
    type: Ascertainment
    label: ExAc ascertainment method
    userLabel:
      type: UserLabel
      label: ExAC ascertainment method
      scope: CG-EX:AllFreq028
  allele: CAR:CA253316
  alleleCount: 105
  alleleNumber: 121336
description: Treating ExAC data as controls
Pros:
Cons:
There are also other comments in #185 regarding further specifications of UserLabel (UserLabel.label should be 1..1, better description of what the UserLabel.description should mean, clarifying that UserLabel can optionally have contribution to include provenance).
Per recent discussions, we decided to go with a thesaurus model. I think this means we would need to have a userLabel attribute on Statement which would provide the scope for the label.
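As a rough sketch of what that scoping could look like to a consumer, the lookup below resolves a UserLabel for an IRI only within the Statement that declares it. It assumes the `userLabels` list and `labelFor` property from the thesaurus example above; the `user_label_for` function and the second statement (`CG-EX:CaseCtrl031`) are hypothetical and not part of the model.

```python
def user_label_for(statement, iri):
    """Return the UserLabel that `statement` declares for `iri`, or None.

    Hypothetical helper: labels are read only from the enclosing
    statement, so they never leak to other uses of the same IRI.
    """
    for user_label in statement.get("userLabels", []):
        if user_label.get("labelFor") == iri:
            return user_label
    return None


statement = {
    "id": "CG-EX:CaseCtrl030",
    "type": "CaseControl",
    "userLabels": [
        {
            "id": "_:b2",
            "type": "UserLabel",
            "labelFor": "SEPIO:0000332",
            "description": "Using ExAC as control group since they are "
                           "unlikely to have this condition",
        },
    ],
}

# Hypothetical second statement that uses the same IRI but declares no labels.
other_statement = {"id": "CG-EX:CaseCtrl031", "type": "CaseControl"}

scoped = user_label_for(statement, "SEPIO:0000332")
unscoped = user_label_for(other_statement, "SEPIO:0000332")
print(scoped["id"] if scoped else None)
print(unscoped)
```

The same IRI resolves to a UserLabel in the first statement and to nothing in the second, which is the scoping the thesaurus model is meant to provide.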
It seems like the UserLabels currently in the examples are of the "create a blank node to describe an element that extends a valueset" variety, in which case they should just be implemented as such nodes according to our current thinking.
We should also have examples of userLabels that modify other (existing) entities.
@mbrush requested some examples of users wanting to provide their own labels for DomainEntities or, more precisely, value set concepts.
We should come up with a few scenarios with our domain experts to make sure these are reasonable and realistic, and add them to our documentation as well.