ga4gh / va-spec

An information model for representing variant annotations.

Align high level class structure between core-source and va-spec models #110

Open mbrush opened 1 year ago

mbrush commented 1 year ago

I reorganized the high level class structure in va-spec to support the ValueEntity vs ExtensibleEntity distinction, and to better align with what is in the core-source model. But alignment is not yet complete, and some elements needed for VA are missing from the core-source representation.

This issue compares the current high level class structure of the gks-metaschema and va-spec models, to facilitate the alignment needed before va-spec can drop these classes and re-use what is in core-source.

Diagram 1: The current core-source upper level class hierarchy.

Note that not all classes are shown as boxes in the diagram - some concrete subclasses are listed in the bottom section of abstract class boxes such as DomainEntity and ExtensibleEntity.

image

Classes in red are those that I suspect should be moved/re-organized, as proposed in Diagram 2 below.

Diagram 2: The core-source hierarchy, amended to address what I suspect may be oversights.

image

Specific amendments made here:

Diagram 3: The va-spec upper class hierarchy

. . . how I would refactor things in the VA IM to support the ValueEntity vs ExtensibleEntity distinction, and better align with what is in the core-source model. (But as noted, this is not yet fully aligned with the organization of high level classes in the metaschema per the diagrams above.)

image

Key differences / features of VA high level class organization to consider/align

Questions / Issues with this Model: Note that some issues arise with this model when considering how the classes partitioned under ExtensibleEntity (e.g. Coding) may be used within ValueEntities (e.g. Proposition, if this is treated as a ValueEntity). Some general questions to think about that might inform our thinking here:

larrybabb commented 1 year ago

There's a lot to unpack here. But here are my thoughts once we come around to discussing this...

  1. ValueObjectDescriptor - this class does not fit the semantics of UtilityEntity as I look at the other subclasses in that set. It may be a special EntityDescriptor class that directly descends from ExtensibleEntity? In any case, the interesting thing about Descriptors is that they all create wrappers for ValueObjects which can be extended, identified and tied to a given authority's record. So there could also be provenance, recordmetadata and even a method for a given ValueObjectDescriptor (IMO). Descriptors are really a kind of record-level statement about a ValueObject (again IMO).
  2. I agree that Proposition does not have to be a ValueObject, but I still feel quite strongly that all of the attributes of a given concrete Proposition MUST be required. We can discuss. It is also worth noting that @ahwagner and I are coming around to the idea that, while the Proposition is an incredibly important and useful semantic that provides the Definitional representation of a Statement, it does not necessarily need to be a separate class. We need to come up with a way of specifying our Statements such that the embedded Definition that computationally and precisely represents the basis behind the Statement is super-transparent to implementers. It may be best to keep it as a separate class for just that purpose, but every Statement will have one and only one Proposition and those Propositions will be tightly constrained with a full complement of required fields.

I hear your argument about optional fields. These are fine on classes that are not able to be computationally precise. We want to really try to achieve the notion of interoperability which is confounded IMO by flexibility in how data is represented. Optional fields, while necessary, should be segregated from the truly interoperable substructures if at all possible and reasonable.
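
To make the required-vs-optional split concrete, here is a minimal Python sketch of the idea, not a proposed schema. The field names (`subject`, `predicate`, `object_`, `description`, `extensions`) are hypothetical and chosen only for illustration; the point is that the Proposition carries only required attributes, while optional decoration lives on the enclosing Statement, which holds exactly one Proposition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    """Hypothetical Proposition: every attribute is required."""
    subject: str
    predicate: str
    object_: str

    def __post_init__(self):
        # Reject missing or empty values so the definition stays precise.
        for name in ("subject", "predicate", "object_"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"Proposition.{name} is required")


@dataclass
class Statement:
    """Hypothetical Statement: exactly one Proposition plus optional fields."""
    proposition: Proposition            # required, one and only one
    description: Optional[str] = None   # optional decoration
    extensions: dict = field(default_factory=dict)
```

Under these assumptions, constructing a `Proposition` with an empty attribute fails immediately, whereas a `Statement` can omit `description` and `extensions` without affecting the embedded definition.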

mbrush commented 1 year ago

April 2023 Update: ClinGen/VICC are no longer pursuing the descriptor-based approach to value object representation in their initial implementation models. Value objects and descriptor objects will be collapsed into a single object - folding together non-essential decoration and essential identifying information. For objects where we want to compute identifiers, a separate specification will indicate the subset of fields to be used for this purpose.
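
As a rough illustration of computing an identifier from a specified subset of fields on a collapsed object, consider the Python sketch below. Everything in it is an assumption made for the example: the `IDENTITY_FIELDS` table, the `Variant` type and its field names are hypothetical, and plain SHA-256 over sorted-key JSON stands in for whatever digest and serialization the actual specification would define.

```python
import hashlib
import json

# Hypothetical "separate specification": for each class, the subset of
# fields that contribute to the computed identifier.
IDENTITY_FIELDS = {
    "Variant": ("location", "state"),
}


def compute_identifier(obj: dict) -> str:
    """Digest only the identifying fields; decoration is ignored."""
    keys = IDENTITY_FIELDS[obj["type"]]
    identifying = {"type": obj["type"]}
    identifying.update({k: obj[k] for k in keys})
    # Sorted keys and fixed separators give a stable serialization.
    canonical = json.dumps(identifying, sort_keys=True, separators=(",", ":"))
    # SHA-256 is an illustrative stand-in, not the real algorithm.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this sketch, two objects that differ only in non-identifying decoration (for example a `label` or `extensions` entry) produce the same identifier, while a change to any field listed in `IDENTITY_FIELDS` produces a different one.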

Given this development, we no longer need to make a class-level distinction between Value Objects and Extensible Entities, as in the diagrams above, and as in the current GKS foundation/core-source model. Every class should now be extensible. This IMO simplifies our high level class structure, and moves us past many of the concerns / alignment issues documented above in this ticket.

A much simpler aligned high-level class structure would look roughly as below:

image

Notes / Rationale: