cancervariants / metakb

Central repository for the VICC metakb web application
MIT License

sequence features #6

Open ahwagner opened 5 years ago

ahwagner commented 5 years ago

Requirements for sequence (genomic / proteomic) features describing associations detailed in #1.

Proposal: features must include:

genomic features optionally include:

ahwagner commented 5 years ago

An obvious consequence of this proposed feature model is that we'll need additional logic or a genomic coordinate reference table to translate genomic queries so that they match transcript or protein interpretations.

An alternative model where we store these interpretations as genomic ranges (e.g. CIViC-style) would be confusing when the ref and alt are protein changes. This model could work, however, by creating a comprehensive set of genomic features corresponding to a protein change, and associating each with the corresponding interpretation.
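To make the "comprehensive set of genomic features" idea concrete, here is a minimal sketch at the codon level only. It assumes the standard genetic code and a known reference codon; the function name `codon_alternatives` is hypothetical, and mapping codon offsets to genomic coordinates (transcript choice, strand) is deliberately left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_alternatives(ref_codon, alt_aa):
    """List every codon encoding alt_aa, with the base changes from ref_codon.

    Each change is (offset_in_codon, ref_base, alt_base). Hypothetical helper,
    not part of metakb.
    """
    results = []
    for codon, aa in CODON_TABLE.items():
        if aa != alt_aa:
            continue
        changes = [
            (i, ref_codon[i], codon[i])
            for i in range(3)
            if ref_codon[i] != codon[i]
        ]
        results.append((codon, changes))
    return results


# Illustrative input: a valine codon (GTG) changed to glutamate (E).
alternatives = codon_alternatives("GTG", "E")
for codon, changes in alternatives:
    print(codon, changes)
```

Each returned codon would become one genomic feature linked to the same interpretation, so a single protein change can fan out into several nucleotide-level records.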

For now, I think I prefer the alternative model. From above, genomic features optionally include:

ahwagner commented 5 years ago

I think that @mbrush or @larrybabb may have some important insights here that we can incorporate on the ground floor.

ahwagner commented 5 years ago

I propose we use the molecular profile container type for all sequence features, allowing us to aggregate genomic variants under a flexible parent type that can incorporate other variant types as well (e.g. expression, methylation, wild-type, other functional effects).
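A rough sketch of what such a container could look like, purely for discussion. The class names, field names, and `kind` values are all assumptions rather than an agreed schema, and the gene labels are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Feature:
    """One component of a profile (hypothetical shape)."""
    kind: str         # e.g. "genomic_variant", "expression", "methylation", "wild_type"
    gene: str         # placeholder gene label
    description: str  # free-text description of the feature


@dataclass
class MolecularProfile:
    """Flexible parent type aggregating one or more features."""
    label: str
    components: List[Feature] = field(default_factory=list)

    def kinds(self) -> List[str]:
        # Distinct feature kinds present in this profile, sorted for stable output.
        return sorted({f.kind for f in self.components})


profile = MolecularProfile(
    label="example profile",
    components=[
        Feature("genomic_variant", "GENE_A", "missense change"),
        Feature("expression", "GENE_B", "decreased expression"),
    ],
)
print(profile.kinds())
```

Keeping `kind` as an open string rather than a fixed enum is one way to let new variant types in without a schema change, at the cost of weaker validation.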

larrybabb commented 5 years ago

@ahwagner I'm not up to speed on molecular profile. Can you share docs or info on how that is currently defined?

The GA4GH VR team is trying to home in on a concept that may be similar. I think we should bring this up for discussion on the VR call with @reece and @mbrush. Also, @rrfreimuth has been taking a deep dive into the conceptual modeling of sequence features using the SO definitions. I think @rrfreimuth will have some valuable insights into getting this sorted out consistently across your project, GA4GH and HL7.

ahwagner commented 5 years ago

That sounds great, would love to hear thoughts from @rrfreimuth and others addressing this. I think the only description of molecular profile available is from the JAX-CKB glossary:

Molecular Profile: Consists of one or more variants, encompassing any type of molecular data, and is designed to handle complex genomic signatures

ahwagner commented 5 years ago

Precise features may lean on work done by the VMC group: https://github.com/ga4gh/vmc-python/tree/master/notebooks

In the VR discussion today, the notion of an "interpretation scope" was mentioned as an alternative name for molecular profile (to avoid overloading the term with the patient-centric scope). The pushback on "interpretation scope" was the use of "interpretation" in the name, when the concept describes not the interpretation itself but the sequence feature description associated with the interpretation.

Will update here with the GA4GH working documentation of this bucket concept.

ahwagner commented 5 years ago

The JAX-CKB molecular profile is described in detail in a recently published manuscript:

https://www.nature.com/articles/s41698-018-0073-y

larrybabb commented 5 years ago

Thank you for sharing... It would be great if we could use this to do some detailed modeling around the complex molecular profile, if nothing else to understand it better. The fusions in particular. I'm trying (with my limited understanding) to determine whether "EML4-ALK" is a single concept or whether it varies based on the variants it may or may not have in a given instance of the fusion. (sorry if that sounds totally ignorant)

Maybe we could dissect Fig. 3 sometime and appreciate the modeling nuances and challenges with representing fusions, decreased expression, region deletions, etc.