ga4gh / va-spec

An information model for representing variant annotations.

Modeling 'Genetic Condition' (as a Domain Entity) #24

Open mbrush opened 5 years ago

mbrush commented 5 years ago

Creating this ticket to begin collecting requirements and considerations for modeling "Genetic Conditions". This is the name we have given to the concept of "A disease or set of one or more co-occurring phenotypic features, typically controlled by a single gene or locus with a defined inheritance pattern." At the highest level, we should also consider distinctions between concepts such as Disease, Phenotype, and Trait, and their relationship to each other and to the notion of a Genetic Condition. But here we consider more practical matters.

Specifically, "Genetic Conditions" fill the descriptor slot in many VA types (e.g. Variant Pathogenicity and Variant Oncogenicity Interpretations), and the qualifier slot in others (e.g. Therapeutic Response Interpretations). In these contexts we are concerned with representing "types" of Conditions, as opposed to specific instances that describe Conditions as they manifest in a particular patient.
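To make the descriptor/qualifier distinction concrete, here is a minimal Python sketch. All class and field names (`Condition`, `VariantPathogenicityInterpretation`, `TherapeuticResponseInterpretation`, `descriptor`, `qualifier`, etc.) are hypothetical and are not taken from the VA spec, and the `EXAMPLE:` identifiers are placeholders rather than real ontology terms. The point is only that one Condition *type* (not a patient instance) can fill either slot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Condition:
    """A *type* of condition (hypothetical fields), not a patient-level instance."""
    label: str
    disease: Tuple[str, ...] = ()       # ontology term IDs (placeholders here)
    phenotype: Tuple[str, ...] = ()     # co-occurring phenotypic features
    gene: Optional[str] = None          # associated gene or locus, if known
    inheritance: Optional[str] = None   # e.g. "autosomal dominant"


@dataclass
class VariantPathogenicityInterpretation:
    variant: str
    descriptor: Condition   # the Condition the variant is pathogenic *for*
    classification: str


@dataclass
class TherapeuticResponseInterpretation:
    variant: str
    descriptor: str         # e.g. "sensitivity" or "resistance"
    therapy: str
    qualifier: Condition    # the Condition qualifies the context of the response


# Usage: the same Condition type appears as a descriptor in one annotation
# and as a qualifier in another.
cond = Condition(label="Example condition",
                 disease=("EXAMPLE:disease-1",),
                 inheritance="autosomal dominant")
path = VariantPathogenicityInterpretation(variant="EXAMPLE:var-1",
                                          descriptor=cond,
                                          classification="pathogenic")
resp = TherapeuticResponseInterpretation(variant="EXAMPLE:var-1",
                                         descriptor="sensitivity",
                                         therapy="EXAMPLE:drug-1",
                                         qualifier=cond)
```

Note that nothing patient-specific (severity, age of onset) appears on `Condition`, in line with the type-level focus of this ticket.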

While we could simply rely on existing disease and phenotype ontologies to provide terms for these descriptors, some driver projects (e.g. ClinGen) have provided compelling use cases for allowing richer and more flexible representations of Genetic Conditions that require defining an object model. The Condition representation defined in ClinGen's Variant Interpretation model may serve as a straw-man starting point for what this might look like. Requirements from this effort include:

Again, the goal here is not to model 'instances' of conditions that affect a specific patient, so patient-specific features like 'severity' or 'age of onset' may not be relevant (unless they are a distinguishing aspect of a class of disease). The CDPC / Phenopackets work is developing models for describing phenotypes and diseases from this patient-level perspective.

Finally, there are three key subtypes of conditions/diseases that may require specializations to support different types of variant annotations, and serve the needs of different disease research communities:

  1. Mendelian Condition: These are caused by single-gene germline mutations. They are relevant for Variant Pathogenicity Interpretations, which describe variants that alone are capable of causing disease.
  2. Cancer: These are caused by a combination of several somatic mutations, some of which activate oncogenes, others that inactivate tumor-suppressor genes, and likely others that modify or contribute to the progression of the disease in other ways. They are relevant for Variant Oncogenicity Interpretations and other somatic interpretation types.
  3. Common Disease: These are multigenic, with contributions from many variants/genes. They are relevant for Condition Risk and Polygenic Risk annotation types, as these describe variants that may predispose an individual to a disease but alone may not be 'causal'.
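One way to keep these three subtypes distinguishable is a simple tag on the Condition plus a lookup of which annotation types each subtype serves. The Python below is a sketch under that assumption only: `ConditionKind`, `RELEVANT_ANNOTATIONS`, and `supports` are hypothetical names, and the annotation-type strings are transcribed from the list above rather than taken from VA-spec identifiers. "Other somatic interpretation types" for Cancer are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class ConditionKind(Enum):
    """The three condition subtypes discussed in this thread (hypothetical tag)."""
    MENDELIAN = "mendelian"        # single-gene, germline
    CANCER = "cancer"              # several somatic mutations
    COMMON_DISEASE = "common"      # multigenic, many small contributions


# Hypothetical mapping from subtype to the annotation types it is relevant for.
RELEVANT_ANNOTATIONS = {
    ConditionKind.MENDELIAN: frozenset({"VariantPathogenicityInterpretation"}),
    ConditionKind.CANCER: frozenset({"VariantOncogenicityInterpretation"}),
    ConditionKind.COMMON_DISEASE: frozenset({"ConditionRisk", "PolygenicRisk"}),
}


@dataclass(frozen=True)
class Condition:
    label: str
    kind: ConditionKind


def supports(condition: Condition, annotation_type: str) -> bool:
    """Return True if this Condition's subtype fits the given annotation type."""
    return annotation_type in RELEVANT_ANNOTATIONS[condition.kind]
```

A tag like this is the lightest option; full subclasses would only be needed if the subtypes end up carrying different fields.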
mbrush commented 5 years ago

Some Questions:

mbrush commented 4 years ago

A straw man proposal based on the SEPIO-ClinGen Variant Pathogenicity Interpretation Model (link) is drawn up in the va-spec modeling spreadsheet here.

mbrush commented 4 years ago

Coordinate with the Phenopackets representation of Disease: https://phenopackets-schema.readthedocs.io/en/latest/disease.html . . . but this is a model of a disease instance (i.e. a patient's manifestation of a specific disease), whereas we need to model types of conditions.

mbaudis commented 4 years ago

@mbrush I suggest you directly reference this through its SchemaBlocks representation, and go to Phenopackets if you need upstream changes (which then also would be versioned in {S}[B]).

Nice {S}[B] use case - happy to help!

mbrush commented 3 years ago

We may want to be able to associate each disease or phenotype that characterizes a Condition with its own specific inheritance pattern or associated gene(s). The model as currently specified here would not support this.

An alternative proposal would support this by adding a 'hasComponent' field, through which a compound Condition could reference the more fundamental Conditions that comprise it, as nested Condition objects within the larger compound one. Within a given Condition, the cardinality of the disease and phenotype fields would then become 0..1.
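The compound alternative can be sketched as a recursive structure. In this Python sketch, `has_component` stands in for the proposed 'hasComponent' field; the other field names, the `leaves` helper, and the `EXAMPLE:`/`GENE1` identifiers are hypothetical placeholders. Each simple Condition carries at most one disease and one phenotype (0..1) together with its own inheritance pattern and genes, which is what the original flat model could not express.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    """Hypothetical compound-Condition sketch (not the VA-spec schema)."""
    disease: Optional[str] = None        # 0..1 ontology term (placeholder IDs)
    phenotype: Optional[str] = None      # 0..1 ontology term
    inheritance: Optional[str] = None    # applies to this Condition only
    genes: List[str] = field(default_factory=list)
    has_component: List["Condition"] = field(default_factory=list)

    def leaves(self) -> List["Condition"]:
        """Flatten nested components into the fundamental Conditions.

        A Condition with no components is its own leaf; fields set directly
        on a compound Condition are not included in the result.
        """
        if not self.has_component:
            return [self]
        out: List["Condition"] = []
        for component in self.has_component:
            out.extend(component.leaves())
        return out


# Usage: two fundamental Conditions, each with its own inheritance and gene,
# grouped into one compound Condition.
part_a = Condition(disease="EXAMPLE:disease-A",
                   inheritance="autosomal recessive", genes=["GENE1"])
part_b = Condition(phenotype="EXAMPLE:phenotype-B",
                   inheritance="autosomal dominant", genes=["GENE2"])
compound = Condition(has_component=[part_a, part_b])
```

The trade-off is the one weighed in the decision here: nesting adds expressiveness per component at the cost of a more complex model than the simple v0 shape.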

Decision: For v0 go with original simple model, test this, collect feedback/requirements, and evolve to more complex model if needed.