ga4gh / gks-common

Common classes and schemas used by all GKS specifications (ie. VR, VA, etc..)
Apache License 2.0
2 stars 0 forks source link

Update to Entity class in common-source.yam #18

Closed mbrush closed 1 week ago

mbrush commented 3 weeks ago
mbrush commented 3 weeks ago

The added 'identifiers' property is based on the concepts of 'ids' vs 'identifiers' in FHIR - a generally useful pattern to enable capturing a locally unique logical id and globally unique identifiers for the Entity provided by external systems. Most community schema / standards have a way to represent this distinction.

mbrush commented 3 weeks ago

An Update and Tangential Thought: It is possible that the Mapping pattern that Larry and Alex defined in gks-common (and discuss here) could be adapted to accommodate the 'identifiers' use case in a different and more expressive way. If so, the identifiers property may not be needed here.

As defined, however, the Mapping pattern seems to be tailored for capturing concept-level identifier mappings (e.g. for domain entities like a Disease, Gene, or Drug). The 'identifiers' pattern proposed here is more about providing other identifiers for instances in a data set (e.g. a specific patient, document, statement/assertion, etc).

I think one solution is to keep the Mappings pattern specifically for concept level mappings of domain entity like things - per Larry's proposal in Discussion#14. And use the 'identifiers' pattern proposed here for more instance level mappings/xrefs. This is consistent with the distinction made by FHIR in how CodeableConcepts and Identifiers are used. And fits with Larry's proposal to push the mappings property into DomainEntity.

mbrush commented 3 weeks ago

IGNORE THIS UNLESS WE DON'T LIKE THE PROPOSAL LAID OUT ABOVE: A less clean approach IMO, but maybe one worth noting, could be used if we don't want to concern ourselves with this concept vs instance distinction when defining how mappings/ide equivalencies are recorded - and use one approach (e.g. the Mapping pattern) for both. In this case, we would consider moving the mappings property up the hierarchy to live in Entity - as there are things besides Domain Entities that may have mappings / other identifiers we want to reference (e.g. Documents, Methods, Statements). And we should test/demonstrate the Mapping pattern for concepts and instances to confirm that it is suitable.

One possible issue with using Mappings to capture other identifiers for instance-level things is that Mappings use Coding and code - which are defined to hold terms from code sets / terminologies. Instance-level mappings would need to capture database identifier/accession numbers - not terminological codes/concepts. A simple fix here is to not use 'Codings' in the Mappings object. Just define a set of properties that can be used for mapping concept or instance level identifiers. e.g.

Mapping:
- related_identifier: string      # the identifier form the other system (could be database id/accession, or a ontology code)
- label: string                    # the name of the entity / concept in the other system
- identifier_system: string        # the name / uri of the other system
- mapping_relation: enum [exactMatch, narrowMatch,broadMatch, relatedMatch]

. . . such a structure would be sufficient for representing concept-level identifier mappings (e.g. clinvar condition id to mondo term id), or instance-level identifier mappings (e.g. clinvar publication id to pubmed id).

larrybabb commented 2 weeks ago

@mbrush my recommendation is to close this PR and not merge it. Leave the question of adding identifiers to the discussion forum until we determine whether or not we want this. I don't think this should hang out as a PR until this is a more urgent concern.

If you concur then I will leave it to you to Close with comment

larrybabb commented 1 week ago

Spoke with @mbrush we are closing this and we will re-address whether an Identifier attribute ultimately will be added to the core-im or not at a later date.