Open ahwagner opened 5 years ago
An obvious consequence of this proposed feature model is that we'll need additional logic or a genomic coordinate reference table to support translating genomic queries to match transcript or protein interpretations.
An alternative model where we store these interpretations as genomic ranges (e.g. CIViC-style) would be confusing with protein changes as `ref` and `alt`. This model could work, however, by creating a comprehensive set of genomic features corresponding to a protein change, and associating each with the corresponding interpretation.
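A rough sketch of that alternative, in case it helps the discussion: each protein change maps to a set of genomic ranges, and a genomic query is matched by overlap. Everything here is a placeholder made up for illustration (the `GenomicRange` type, the gene name, the coordinates, the interpretation text), and the sketch assumes inclusive `start`/`stop` coordinates on a single reference assembly.

```python
from typing import Dict, List, NamedTuple


class GenomicRange(NamedTuple):
    """Hypothetical genomic feature; start and stop are assumed inclusive."""
    seq_id: str
    start: int
    stop: int


def overlaps(a: GenomicRange, b: GenomicRange) -> bool:
    """True when both ranges are on the same sequence and share a position."""
    return a.seq_id == b.seq_id and a.start <= b.stop and b.start <= a.stop


# Placeholder data: one protein change expanded into the genomic ranges
# that could produce it. Names and coordinates are invented.
FEATURES_BY_PROTEIN_CHANGE: Dict[str, List[GenomicRange]] = {
    "GENE1 p.V100E": [
        GenomicRange("chrA", 1000, 1000),  # e.g. a single-base substitution
        GenomicRange("chrA", 1000, 1001),  # e.g. a two-base substitution
    ],
}

INTERPRETATION_BY_PROTEIN_CHANGE: Dict[str, str] = {
    "GENE1 p.V100E": "placeholder interpretation text",
}


def find_interpretations(query: GenomicRange) -> List[str]:
    """Return interpretations whose genomic features overlap the query."""
    hits = []
    for protein_change, ranges in FEATURES_BY_PROTEIN_CHANGE.items():
        if any(overlaps(query, r) for r in ranges):
            hits.append(INTERPRETATION_BY_PROTEIN_CHANGE[protein_change])
    return hits
```

Note that overlap alone is coarser than an allele-level match: a query at the same position with a different alternate allele would also hit, so a real implementation would likely need to compare alleles as well.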
For now, I think I prefer the alternate model. From above:

genomic features optionally include:
- `seq_id`, `start`, `stop` fields required.
- `seq_id` is an unambiguous chromosome ~, transcript, or protein~ identifier

I think that @mbrush or @larrybabb may have some important insights here that we can incorporate on the ground floor.
I propose we use the `molecular profile` container type for all sequence features, allowing us to aggregate genomic variants under a flexible parent type that can incorporate other variant types as well (e.g. expression, methylation, wild-type, other functional effects).
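To make the container idea concrete, here is a minimal sketch of what such a parent type might look like. The class and field names (`MolecularProfile`, `SequenceVariant`, `ExpressionChange`, `components`) are assumptions for illustration only, not an existing schema, and only two of the variant types mentioned above are shown.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class SequenceVariant:
    """Hypothetical genomic variant located by seq_id, start, and stop."""
    seq_id: str
    start: int
    stop: int


@dataclass(frozen=True)
class ExpressionChange:
    """Hypothetical non-sequence component, e.g. decreased expression."""
    gene: str
    direction: str  # assumed vocabulary: "increased" or "decreased"


Component = Union[SequenceVariant, ExpressionChange]


@dataclass
class MolecularProfile:
    """Container that aggregates one or more components of any type."""
    components: List[Component] = field(default_factory=list)

    def of_type(self, cls: type) -> List[Component]:
        """Return only the components that are instances of cls."""
        return [c for c in self.components if isinstance(c, cls)]


# Usage with invented placeholder values.
profile = MolecularProfile([
    SequenceVariant("chrA", 1000, 1000),
    ExpressionChange("GENE1", "decreased"),
])
```

Other component types (methylation, wild-type, other functional effects) could be added to the `Component` union without changing the container itself.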
@ahwagner I'm not up to speed on `molecular profile`. Can you share docs or info on how that is currently defined?
The GA4GH VR team is trying to home in on a concept that may be similar. I think we should bring this up for discussion on the VR call with @reece and @mbrush. Also, @rrfreimuth has been taking a deep dive into the conceptual modeling of sequence features using the SO definitions. I think @rrfreimuth will have some valuable insights into getting this sorted out consistently across your project, GA4GH and HL7.
That sounds great, would love to hear thoughts from @rrfreimuth and others addressing this. I think the only description of `molecular profile` available is from the Jax CKB glossary:

Molecular Profile: Consists of one or more variants, encompassing any type of molecular data, and is designed to handle complex genomic signatures
Precise features may lean on work done by the VMC group: https://github.com/ga4gh/vmc-python/tree/master/notebooks
In VR discussion today, the notion of an interpretation scope as an alternative name for molecular profile was mentioned (to avoid overloading the term with the patient-centric scope). Pushback to interpretation scope was the use of "interpretation" in the name, when the concept describes not the interpretation itself but the sequence feature description associated with the interpretation.
Will update here with the GA4GH working documentation of this bucket concept.
The JAX-CKB molecular profile is described in detail in a recently published manuscript:
Thank you for sharing... It would be great if we could use this to do some detailed modeling around the complex molecular profile, if nothing else to understand it better. The fusions in particular. I'm trying (with my limited understanding) to determine whether "EML4-ALK" is a single concept or whether it varies based on the variants it may or may not have in a given instance of the fusion. (sorry if that sounds totally ignorant)
Maybe we could dissect fig 3 sometime and appreciate the modeling nuances and challenges with representing fusions and decreased expression and region deletions, etc...
Requirements for sequence (genomic / proteomic) features describing associations detailed in #1.
Proposal: features must include:
genomic features optionally include:
- `seq_id`, `start`, `stop` fields required.
- `seq_id` may be an unambiguous chromosome, transcript, or protein identifier
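One possible reading of the proposal above, sketched in code: a location is optional on a feature, but when any of `seq_id`, `start`, or `stop` is given, all three are required. The `Location` type and `parse_location` helper are hypothetical names, and the `start <= stop` check is an added assumption rather than something stated in the proposal.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

LOCATION_FIELDS = ("seq_id", "start", "stop")


@dataclass(frozen=True)
class Location:
    """Hypothetical location on a chromosome, transcript, or protein."""
    seq_id: str
    start: int
    stop: int


def parse_location(record: Mapping[str, Any]) -> Optional[Location]:
    """Return a Location, or None when the record carries no location.

    Raises ValueError when only some of the three fields are present,
    or when start is greater than stop (an assumption of this sketch).
    """
    present = [f for f in LOCATION_FIELDS if record.get(f) is not None]
    if not present:
        return None  # location is optional
    missing = [f for f in LOCATION_FIELDS if f not in present]
    if missing:
        raise ValueError("location is missing required fields: " + ", ".join(missing))
    start, stop = int(record["start"]), int(record["stop"])
    if start > stop:
        raise ValueError("start must not be greater than stop")
    return Location(str(record["seq_id"]), start, stop)
```

The sketch deliberately does not check what kind of identifier `seq_id` holds, since the thread has not settled whether transcript and protein identifiers are allowed.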