Salient elements of gene fusions

ahwagner commented 4 years ago

Creating this issue to explicitly capture and discuss the notion of minimal data elements in gene fusions, and how these inform the representation of those variants.

ahwagner commented 4 years ago

First, some background from the project page:

NTRK Alterations in Pan-Cancer Adult and Pediatric Malignancies (publication)
NTRK curation elements worksheet (excel sheet)
HGVS Fusion Specification (accepted proposal)
- Marilyn Li’s recommended addition
dbVar SV representation (webpage)
GA4GH VR fusion requirements meta-thread (GitHub issue)
Internal fusion specification used at NCI (GitHub comment from NCI presentation for VICC)

ahwagner commented 4 years ago

Second, quoting a comment from @TanskaAnnna in the VR fusions meta-thread:

Hi All! As a medical scientist I would find it beneficial to have two forms of fusion nomenclature: a short one and a long one. I think the short one should include the information that is crucial for the clinical utility of the fusion. The longer form would include all details that help to identify the exact location of the fusion in the genome. I would expect the longer form to go in the supplementary data of the medical report, and the shorter version to go in the main report comment together with the clinical utility of the fusion. Looking at the NTRK fusions curation elements table: the longer version of a fusion description could include all the information presented in that table. The short nomenclature could perhaps be limited to: genome version, RefSeq transcripts, gene names, their positions (3' or 5'), exons and functional domain information. It should be clearly stated whether the fusion causes loss or gain of function, as this information is crucial for the treatment decision. Additional information about resistance mutations should be added if applicable. I’m really interested to hear your thoughts on this idea.

Whether we should comment on a fusion being “in-frame” or not depends on the context. From my experience, when I was dealing with DNA-seq results I would not report a structural variant as a fusion (especially for new fusion genes) if it was out of frame and I could not confirm it by RNA methods. However, reading frame information could be beneficial, for example, when the DNA-seq result was out of frame but the RNA results showed that the final product was in frame (for instance in an exon-skipping situation).

Moving forward on this, we should distinguish DNA-focused and RNA-focused nomenclature. Some labs haven’t got RNA sequencing methods in place. Also, for some technologies it's hard to determine the exact breakpoint of a fusion. When a breakpoint is in an intron, the most important question is which exons are fused to each other; the exact genomic position has less clinical utility. Nevertheless, when we're dealing with a breakpoint in exonic sequence, the exact nucleotide position is crucial to determine the functionality of a fusion.

When it comes to RNA nomenclature, Li's proposal looks nice and practical, and I would definitely like to add the gene name to both forms. The suggestions of Ordulu et al. (2014) are good but complex, and when I reviewed this nomenclature in my lab it seemed that this level of complexity is not practical for clinical reporting.
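To make the proposed short form easier to discuss, here is a minimal Python sketch of the elements listed in the quoted comment (genome version, RefSeq transcripts, gene names, 5'/3' position, exons, functional domains, functional effect, resistance mutations). All class names, field names, and the rendered text layout are hypothetical, and the gene names and accessions are placeholders rather than real identifiers; this is not a nomenclature proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FusionPartner:
    """One side of the fusion (hypothetical structure)."""
    gene: str            # gene name
    transcript: str      # RefSeq transcript accession (placeholder values below)
    junction_exon: int   # exon adjacent to the fusion junction


@dataclass
class ShortFusionDescription:
    """Short-form elements named in the comment above (field names are assumptions)."""
    genome_version: str
    five_prime: FusionPartner
    three_prime: FusionPartner
    retained_domains: List[str] = field(default_factory=list)
    functional_effect: Optional[str] = None  # e.g. "gain of function"
    resistance_mutations: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render a plain-text line; the layout is illustrative only."""
        five, three = self.five_prime, self.three_prime
        text = (
            f"5' {five.gene} exon {five.junction_exon} ({five.transcript}) fused to "
            f"3' {three.gene} exon {three.junction_exon} ({three.transcript}); "
            f"genome {self.genome_version}"
        )
        if self.retained_domains:
            text += "; retained domains: " + ", ".join(self.retained_domains)
        text += "; effect: " + (self.functional_effect or "not stated")
        if self.resistance_mutations:
            text += "; resistance mutations: " + ", ".join(self.resistance_mutations)
        return text


# Placeholder example; GENE_A/GENE_B and the accessions are not real identifiers.
example = ShortFusionDescription(
    genome_version="GRCh38",
    five_prime=FusionPartner("GENE_A", "NM_000000.0", 5),
    three_prime=FusionPartner("GENE_B", "NM_111111.1", 15),
    retained_domains=["kinase domain"],
    functional_effect="gain of function",
)
short_text = example.summary()
```

Anything not captured here (exact breakpoints, read support, and so on) would belong to the long form under this split.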

d-cameron commented 4 years ago

The DNA and RNA representations need to be treated as fundamentally distinct data elements. This is critical since there is not a 1-1 relationship between DNA breakpoints and RNA fusion transcripts but a many-to-many relationship. That is, a fusion can involve multiple breakpoints, and a breakpoint can result in multiple fusion transcripts.

Predicting RNA fusions from DNA breakpoints is decidedly non-trivial. The only implementation I'm aware of is LINX.

Whilst many tools/pipelines are unable to handle fusions resulting from complex DNA rearrangements, pan-cancer these account for around 16% of driver fusions, so they are fairly important.

My recommendation is that a DNA nomenclature be used/created that defines the relevant sequence of breakpoints, and that the RNA nomenclature be reused to define the fusion product. This allows for a clear and clean separation between 'these are the rearrangements' and 'this is the RNA impact'. This approach cleanly handles out-of-frame fusions, as you can specify the rearrangements in the DNA and come to the conclusion that there are no resultant functional gene fusions (thus nothing to report on the RNA side).
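As a rough illustration of that separation, the sketch below keeps DNA breakpoints and RNA fusion transcripts as separate identifiers joined by a many-to-many link table. The class, its methods, and the identifiers (`bp1`, `txA`, ...) are all hypothetical; it does not predict transcripts from breakpoints, it only records links that some upstream analysis is assumed to have produced.

```python
from collections import defaultdict
from typing import Dict, List, Set


class FusionLinks:
    """Many-to-many links between DNA breakpoints and RNA fusion transcripts."""

    def __init__(self) -> None:
        self.breakpoints: Set[str] = set()
        self._bp_to_tx: Dict[str, Set[str]] = defaultdict(set)
        self._tx_to_bp: Dict[str, Set[str]] = defaultdict(set)

    def add_breakpoint(self, bp_id: str) -> None:
        """Register a DNA breakpoint, whether or not it yields any transcript."""
        self.breakpoints.add(bp_id)

    def link(self, bp_id: str, tx_id: str) -> None:
        """Record that a breakpoint contributes to a fusion transcript."""
        self.add_breakpoint(bp_id)
        self._bp_to_tx[bp_id].add(tx_id)
        self._tx_to_bp[tx_id].add(bp_id)

    def transcripts_for(self, bp_id: str) -> List[str]:
        return sorted(self._bp_to_tx.get(bp_id, set()))

    def breakpoints_for(self, tx_id: str) -> List[str]:
        return sorted(self._tx_to_bp.get(tx_id, set()))

    def rna_findings(self) -> List[str]:
        """Everything reportable on the RNA side; may be empty."""
        return sorted(self._tx_to_bp)


links = FusionLinks()
# A complex rearrangement: two breakpoints together give one transcript.
links.link("bp1", "txA")
links.link("bp2", "txA")
# One breakpoint giving two transcripts.
links.link("bp3", "txB")
links.link("bp3", "txC")
# A rearrangement with no functional product: DNA side only, nothing on the RNA side.
links.add_breakpoint("bp4")
```

With this layout, `bp4` still appears in the DNA description while contributing nothing to `rna_findings()`, which mirrors the out-of-frame case described above.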

ahwagner commented 3 years ago

On the 2/10/21 call, we started arranging the salient elements into top-level categories:

Gene Fusion Event (DNA change / mechanism)
Gene Fusion Product Description (RNA Description)
Functional Characterization (Reading frame, preserved domains, etc)
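One way to picture these three categories is as a record with three optional parts, so that a lab with only DNA or only RNA data can still fill in what it has. This is a sketch under that assumption; every class and field name is hypothetical and nothing here was agreed on the call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FusionEvent:
    """Gene Fusion Event: the DNA change / mechanism."""
    mechanism: str                                  # free text in this sketch
    breakpoints: List[str] = field(default_factory=list)


@dataclass
class FusionProduct:
    """Gene Fusion Product Description: the RNA description."""
    five_prime_gene: str
    three_prime_gene: str
    description: Optional[str] = None               # e.g. an RNA-level nomenclature string


@dataclass
class FunctionalCharacterization:
    """Reading frame, preserved domains, etc."""
    in_frame: Optional[bool] = None                 # None means not assessed
    preserved_domains: List[str] = field(default_factory=list)


@dataclass
class GeneFusionRecord:
    """Top-level container; each category may be absent."""
    event: Optional[FusionEvent] = None
    product: Optional[FusionProduct] = None
    characterization: Optional[FunctionalCharacterization] = None

    def populated_categories(self) -> List[str]:
        """Names of the categories that have been filled in."""
        names = ["event", "product", "characterization"]
        return [n for n in names if getattr(self, n) is not None]


# RNA-only example with placeholder gene names.
rna_only = GeneFusionRecord(
    product=FusionProduct("GENE_A", "GENE_B"),
    characterization=FunctionalCharacterization(in_frame=True),
)
```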
