Interpreting results and prioritizing alterations

tjbencomo commented 10 months ago

Hi,

I have some questions on how to interpret the output of the splicing module and prioritize identified alterations.

First, I want to confirm that any alterations reported by the splicing module are in fact deletions, suggesting a loss of 1+ exons within the region defined in the intron column in the cancer.introns file.

Second, I have run the splicing module on 100+ tumor samples and would like to look for recurrent alterations while avoiding false positives. In the STAR-Fusion module there are metrics like FFPM that are used to screen out fusions with low expression. Is there an equivalent metric here - do you suggest a minimum threshold for uniq_mapped?

Finally, after visually examining some candidate alterations, I have realized some of the alterations affect exons that are not present in all isoforms of the affected gene. When the missing exon is only present in some of the transcripts, is there a good way to determine whether the alteration is relevant? My first thought is to only consider alterations relevant if they affect transcripts that are canonical or have strong expression in that sample. Does this make sense?

Thanks!

brianjohnhaas commented 10 months ago

Hi,

The ctat-splicing module isn't focused on exon deletions, only on identifying those spliced introns that are enriched or cancer-specific according to our database of TCGA and GTEx splicing patterns. There are known cases such as in EGFR where the splicing patterns result from intra-gene deletions, but this isn't necessarily the case across our database, as it's derived from RNA-seq analysis and not DNA-seq analysis.

Much of this continues to be a research exercise and I don't have minimum thresholds for uniq_mapped. It just reports where the currently defined cancer-enriched splicing patterns are observed across the TCGA cohorts.
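For the recurrence question, one option is to tally how many samples report each intron after applying a cutoff of your own choosing. The following is only a minimal sketch under stated assumptions: it assumes each sample's cancer.introns output is a tab-delimited table with `intron` and `uniq_mapped` columns, and `recurrent_introns` and `min_uniq` are hypothetical names introduced for illustration. The cutoff value in the demo is arbitrary and is not a recommended threshold.

```python
import csv
import io
from collections import Counter


def recurrent_introns(sample_tables, min_uniq=0):
    """Count how many samples report each intron.

    sample_tables: dict of sample name -> file-like object holding that
                   sample's cancer.introns table (assumed tab-delimited).
    min_uniq:      hypothetical user-chosen cutoff on uniq_mapped;
                   no minimum is recommended by the tool itself.
    """
    counts = Counter()
    for handle in sample_tables.values():
        seen = set()  # count each intron at most once per sample
        for row in csv.DictReader(handle, delimiter="\t"):
            if int(row["uniq_mapped"]) >= min_uniq:
                seen.add(row["intron"])
        counts.update(seen)
    return counts


# Toy data: the first row reuses the documentation example,
# the second row is invented to show the effect of the cutoff.
header = "intron\tstrand\tgenes\tuniq_mapped\tmulti_mapped\n"
demo = {
    "sampleA": io.StringIO(
        header + "chr7:55200414-55202516\t+\tEGFR^ENSG00000146648.14\t114\t0\n"
    ),
    "sampleB": io.StringIO(
        header + "chr7:55200414-55202516\t+\tEGFR^ENSG00000146648.14\t2\t0\n"
    ),
}

counts = recurrent_introns(demo, min_uniq=5)
for intron, n_samples in counts.most_common():
    print(intron, n_samples)
```

In practice you would pass open file handles for your 100+ samples instead of the in-memory demo tables, and compare the tallies at a few different cutoffs to see how sensitive the recurrent set is to the choice.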

Hope this helps,

~b

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

tjbencomo commented 10 months ago

Hi Brian,

Thanks for the response. I think I'm still confused about the definition of "cancer-specific spliced intron". Using the example provided in the documentation:

intron                    strand  genes                    uniq_mapped  multi_mapped  TCGA_sample_counts                               GTEx_sample_counts  variant_name
chr7:55200414-55202516    +       EGFR^ENSG00000146648.14  114          0             GBM:1:0.59,LGG:1:0.19                            NA                  EGFRvIVb

What exactly does it mean that the splicing module identified the intron chr7:55200414-55202516? Does it mean that this intron segment is detected in RNA-Seq reads when we would expect it to be absent because it should have been spliced out of the mRNA sequence? And does the "cancer-specific" part come from the fact that this intron is also present in TCGA cancer samples when it shouldn't be there?

Apologies for the confusion.

-Tomas

brianjohnhaas commented 10 months ago

It’s actually detecting spliced introns in samples. In this example, the intron is spliced in some tumor samples but not in normal samples (GTEx).
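To make the example row concrete, here is a small sketch that pulls apart the `intron` and sample-count fields and applies the "seen in tumors, absent from GTEx" reading described above. The helper names are hypothetical, and the interpretation of each `TCGA_sample_counts` entry as cohort, number of samples, and a percentage is an assumption that should be checked against the documentation.

```python
def parse_intron(intron):
    """Split 'chr:start-end' into (chrom, start, end)."""
    chrom, span = intron.split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def parse_sample_counts(field):
    """Parse e.g. 'GBM:1:0.59,LGG:1:0.19' into {cohort: (count, value)}.

    Assumption: each entry is cohort:sample_count:percentage.
    'NA' is treated as "not observed" and yields an empty dict.
    """
    if field == "NA":
        return {}
    parsed = {}
    for entry in field.split(","):
        cohort, n_samples, value = entry.split(":")
        parsed[cohort] = (int(n_samples), float(value))
    return parsed


# Values taken from the documentation example row.
chrom, start, end = parse_intron("chr7:55200414-55202516")
tcga = parse_sample_counts("GBM:1:0.59,LGG:1:0.19")
gtex = parse_sample_counts("NA")

# The spliced intron is seen in tumor cohorts but not in normal tissue.
tumor_only = bool(tcga) and not gtex
print(chrom, start, end, sorted(tcga), tumor_only)
```

The point of the check is that the row reports a splice junction supported by reads in the sample, and the TCGA and GTEx columns say where that same junction has been seen before.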

NCIP / CTAT-SPLICING

Interpreting results and prioritizing alterations #8
