Closed ahwagner closed 5 months ago
First, some background from the project page:
Second, quoting a comment from @TanskaAnnna in the VR fusions meta-thread:
Hi All! As a medical scientist I would find it beneficial to have two forms of fusion nomenclature: a short one and a long one. I think the short one should include the information that is crucial for the clinical utility of the fusion. The longer form would include all details that help to identify the exact localisation of the fusion in the genome. I would see the longer form included in the supplementary data of the medical report, and the shorter version included in the main report comment together with the clinical utility of the fusion. Looking at the NTRK fusions curation elements table: the longer version of a fusion description could include all information presented in that table. The short nomenclature could perhaps be limited to: genome version, RefSeq transcripts, gene names, their positions (3' or 5'), exons and functional domain information. It should be clearly stated whether the fusion causes loss or gain of function, as this information is crucial for the treatment decision. Additional information about resistance mutations should be added if applicable. I’m really interested to hear your thoughts on this idea.
Whether we should comment on the fusion being “in-frame” or not depends on the context. From my experience, when I was dealing with DNA-seq results I would not report a structural variant as a fusion (especially for new fusion genes) if it was out of frame and I could not confirm it by RNA methods. However, reading frame information could be beneficial, for example, when the DNA-seq result was out of frame but RNA results showed that the final product was in frame (for instance in an exon-skipping situation).
Moving forward on this, we should distinguish DNA- and RNA-focused nomenclature. Some labs haven’t got RNA sequencing methods in place. Also, for some technologies it's hard to determine the exact breakpoint of a fusion. When a breakpoint is in an intron, the most important question is which exons are fused to each other; the exact genomic position has less clinical utility. Nevertheless, when we're dealing with a breakpoint in exonic sequence, the exact nucleotide position is crucial to determine the functionality of a fusion.
When it comes to RNA nomenclature, Li's proposal looks nice and practical, and I would definitely like to add the gene name to both forms. The suggestions of Ordulu et al. (2014) are good but complex, and when I reviewed this nomenclature in my lab it seemed that this complexity is not applicable to clinical reporting.
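To make the short-form / long-form idea above concrete, here is a minimal Python sketch of one record rendered two ways. Everything in it is an assumption for illustration: the class names, the field names, the choice of which fields count as long-form only, and the placeholder gene, transcript and breakpoint values are hypothetical and do not come from the curation elements table or from any existing specification.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class Partner:
    """One fusion partner (hypothetical field names)."""
    gene: str                 # gene name
    transcript: str           # RefSeq transcript accession (placeholder below)
    position: str             # "5'" or "3'"
    junction_exon: int        # exon at the fusion junction
    domains: List[str] = field(default_factory=list)  # retained functional domains


# Fields assumed here to belong only in the long (supplementary) form.
LONG_FORM_ONLY = ("in_frame", "genomic_breakpoints")


@dataclass
class FusionReport:
    genome_build: str
    five_prime: Partner
    three_prime: Partner
    functional_effect: str    # e.g. "gain of function" or "loss of function"
    resistance_mutations: List[str] = field(default_factory=list)
    in_frame: Optional[bool] = None                   # None when frame is unknown
    genomic_breakpoints: List[str] = field(default_factory=list)

    def long_form(self) -> Dict:
        """All captured elements, for supplementary data."""
        return asdict(self)

    def short_form(self) -> Dict:
        """Clinically crucial elements only, for the main report comment."""
        return {k: v for k, v in asdict(self).items() if k not in LONG_FORM_ONLY}


# Placeholder values only; none of these identifiers are real.
example = FusionReport(
    genome_build="GRCh38",
    five_prime=Partner("GENE_A", "NM_PLACEHOLDER_A", "5'", 5),
    three_prime=Partner("GENE_B", "NM_PLACEHOLDER_B", "3'", 12, ["kinase domain"]),
    functional_effect="gain of function",
    in_frame=True,
    genomic_breakpoints=["chrA:pos1", "chrB:pos2"],
)
```

Calling `example.short_form()` drops the frame and genomic breakpoint fields, while `example.long_form()` keeps every element. Where the line between the two forms should sit is exactly the open question in this thread, so `LONG_FORM_ONLY` is a placeholder for that decision.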
The DNA and RNA representations need to be treated as fundamentally distinct data elements. This is critical since there is not a 1-1 relationship between DNA breakpoints and RNA fusion transcripts but a many-to-many relationship. That is, a fusion can involve multiple breakpoints, and a breakpoint can result in multiple fusion transcripts.
Predicting RNA fusions from DNA breakpoints is decidedly non-trivial. The only implementation I'm aware of is LINX.
Whilst many tools/pipelines are unable to handle fusions resulting from complex DNA rearrangements, pan-cancer these account for around 16% of driver fusions, so they are fairly important.
My recommendation is that a DNA nomenclature be used/created that defines the relevant sequence of breakpoints, and that the RNA nomenclature be reused to define the fusion product. This allows for a clear and clean separation between 'these are the rearrangements' and 'this is the RNA impact'. This approach cleanly handles out-of-frame fusions, as you can specify the rearrangements in the DNA and come to the conclusion, based on them, that there are no resultant functional gene fusions (thus nothing to report on the RNA side).
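The separation recommended above, with a many-to-many link between DNA breakpoints and RNA fusion transcripts, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the `supporting_breakpoints` field and all identifiers are hypothetical, and the description strings stand in for whichever DNA and RNA nomenclatures are eventually chosen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Breakpoint:
    """DNA side: one rearrangement breakpoint (hypothetical structure)."""
    id: str
    description: str          # would hold the DNA-level nomenclature string


@dataclass
class FusionTranscript:
    """RNA side: one fusion product, linked to the breakpoints that produce it."""
    id: str
    description: str          # would hold the RNA-level nomenclature string
    supporting_breakpoints: List[str] = field(default_factory=list)  # Breakpoint ids


@dataclass
class FusionCase:
    breakpoints: List[Breakpoint]
    transcripts: List[FusionTranscript] = field(default_factory=list)

    def transcripts_for(self, breakpoint_id: str) -> List[FusionTranscript]:
        """All fusion transcripts that a given breakpoint contributes to."""
        return [t for t in self.transcripts
                if breakpoint_id in t.supporting_breakpoints]

    def breakpoints_for(self, transcript_id: str) -> List[Breakpoint]:
        """All breakpoints involved in a given fusion transcript."""
        wanted = {bp_id
                  for t in self.transcripts if t.id == transcript_id
                  for bp_id in t.supporting_breakpoints}
        return [b for b in self.breakpoints if b.id in wanted]


# Complex rearrangement: tx1 needs two breakpoints; bp2 also yields tx2.
complex_case = FusionCase(
    breakpoints=[Breakpoint("bp1", "DNA description 1"),
                 Breakpoint("bp2", "DNA description 2")],
    transcripts=[FusionTranscript("tx1", "RNA description 1", ["bp1", "bp2"]),
                 FusionTranscript("tx2", "RNA description 2", ["bp2"])],
)

# Out-of-frame event: the rearrangement is recorded, nothing on the RNA side.
out_of_frame_case = FusionCase(breakpoints=[Breakpoint("bp3", "DNA description 3")])
```

Keeping the link as a list of breakpoint ids on each transcript lets both directions of the relationship be queried without either side embedding the other, and an empty `transcripts` list expresses "rearrangement observed, no functional fusion to report".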
On the 2/10/21 call, we started arranging the salient elements into top-level categories:
Creating this issue to explicitly capture and discuss the notion of minimal data elements in gene fusions, and how these inform the representation of those variants.