d3b-center / pbta-splicing

Splicing analysis across the PBTA

Updated analysis: splicing impact functional sites #207

Closed: naqvia closed this 10 months ago

naqvia commented 11 months ago

What analysis module should be updated and why?

We want to update the splicing-impact-on-functional-sites module to include all HGGs, independent of control samples, so we can assess the impact more comprehensively.

What changes need to be made? Please provide enough detail for another participant to make the update.

Parsing and filtering in the 01 script, plus all related downstream plotting scripts.

What input data should be used? Which data were used in the version being updated?

rMATS output and the histologies TSV.

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

@naqvia

naqvia commented 10 months ago

I had to put a lot of work and thought into this and ran a variety of scenarios to see what makes the most sense biologically. Straightforward differential splicing by z-score per sample, followed by subsetting to HGGs, did not recover CLK1 exon 4 splicing. This is because CLK1 splicing may not be specific enough to any one tumor type and is extremely heterogeneous across brain tumor types (i.e., the high standard deviation keeps the z-score below 2). So I tried several methods with different filtering and subsetting criteria.
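To illustrate why a per-sample z-score can miss a heterogeneous event, here is a minimal Python sketch. It assumes PSI values for a single event are held in a plain dict keyed by sample ID, uses the sample standard deviation, and applies the |z| >= 2 cutoff mentioned above. The helper names and the sample data are hypothetical and are not the module's actual code.

```python
from statistics import mean, stdev


def psi_zscores(psi_by_sample):
    """Z-score each sample's PSI for one event against the whole cohort (hypothetical helper)."""
    values = list(psi_by_sample.values())
    mu = mean(values)
    sd = stdev(values)  # sample SD; assumes at least two samples
    if sd == 0:
        return {sample: 0.0 for sample in psi_by_sample}
    return {sample: (psi - mu) / sd for sample, psi in psi_by_sample.items()}


def significant(zscores, threshold=2.0):
    """Keep only samples whose |z| meets the cutoff (2, per the discussion above)."""
    return {sample: z for sample, z in zscores.items() if abs(z) >= threshold}


# Hypothetical data: a homogeneous cohort with one outlier vs. a highly variable cohort.
homogeneous = {f"S{i}": 0.8 for i in range(1, 10)}
homogeneous["S10"] = 0.3
heterogeneous = {f"T{i}": (0.1 if i % 2 else 0.9) for i in range(1, 11)}

print(sorted(significant(psi_zscores(homogeneous))))  # ['S10']
print(significant(psi_zscores(heterogeneous)))  # {}
```

In the variable cohort every |z| is about 0.95, so nothing passes the cutoff even though PSI ranges from 0.1 to 0.9. Note also that with n samples the largest attainable |z| under the sample SD is (n - 1)/sqrt(n), so cohorts of five or fewer samples can never reach 2.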

The following method showed CLK1 exon 4 differential splicing: subset samples to HGGs, then compute differential splicing within that HGG cohort. So we are now measuring differential splicing within HGGs. Indeed, we now get a recurrent preference for exon 4 skipping. We also get other kinases and genes, but the overall conclusions are similar. This method could be ported and modularized for any histology, and it is independent of control samples.
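A minimal sketch of "subset first, then score within the cohort", under the same assumptions as a simple z-score approach: PSI per sample in a dict, a separate sample-to-histology mapping standing in for the histologies TSV, and the sample standard deviation. The function names, histology labels, and data are hypothetical.

```python
from statistics import mean, stdev


def psi_zscores(psi_by_sample):
    """Z-score each sample's PSI for one event against the given samples."""
    values = list(psi_by_sample.values())
    mu, sd = mean(values), stdev(values)  # sample SD; needs >= 2 samples
    if sd == 0:
        return {s: 0.0 for s in psi_by_sample}
    return {s: (psi - mu) / sd for s, psi in psi_by_sample.items()}


def cohort_zscores(psi_by_sample, histology_by_sample, cohort):
    """Subset to one histology first, then z-score within that subset only (hypothetical)."""
    subset = {s: psi for s, psi in psi_by_sample.items()
              if histology_by_sample.get(s) == cohort}
    return psi_zscores(subset)


# Hypothetical data: 10 HGGs (one with low PSI) plus 10 highly variable non-HGG tumors.
psi = {f"HGG{i}": 0.8 for i in range(1, 10)}
psi["HGG10"] = 0.3
psi.update({f"OTHER{i}": (0.1 if i % 2 else 0.9) for i in range(1, 11)})
histology = {s: ("HGG" if s.startswith("HGG") else "other") for s in psi}

z_all = psi_zscores(psi)["HGG10"]  # scored against all tumors
z_hgg = cohort_zscores(psi, histology, "HGG")["HGG10"]  # scored within HGGs only
print(round(z_all, 2), round(z_hgg, 2))  # -0.97 -2.85
```

The same HGG sample falls well short of the cutoff when scored against all tumors but passes it when scored within the HGG cohort, which mirrors the behavior described above. Because `cohort` is just a parameter, the same function works for any histology.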

This may be the most appropriate and robust comparison because this module focuses on HGGs, and it removes the limitation of having only one control for comparison. Another important reporting difference: we no longer have dPSI, only a PSI that is classified as either a skipping or an inclusion event. This will change the downstream scripts and code slightly. The reason is that the rMATS output we now use was run on single samples (not pairs, as with the control vs. HGG tumor comparison), so we are now dealing with PSIs from individual tumor samples. Hence, an event is labeled as skipping if the individual tumor PSI is significantly lower than the mean PSI (meaning the other tumor samples show more exon inclusion), and as inclusion if the tumor PSI is significantly higher than the mean PSI. Will stack PRs for each script. cc @jharenza
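The labeling rule above can be sketched as follows. This is a minimal illustration, not the PR scripts: it assumes "significantly" means a z-score of at least 2 in magnitude against the cohort mean and sample standard deviation, with the tumor itself included in the cohort. `label_event` is a hypothetical name, and the PSI values are made up.

```python
from statistics import mean, stdev


def label_event(tumor_psi, cohort_psis, threshold=2.0):
    """Label one tumor's event relative to its cohort (hypothetical helper).

    Lower-than-mean PSI -> "skipping" (other tumors include the exon more);
    higher-than-mean PSI -> "inclusion"; otherwise None (not significant).
    """
    mu, sd = mean(cohort_psis), stdev(cohort_psis)  # assumed: sample SD, tumor included
    if sd == 0:
        return None
    z = (tumor_psi - mu) / sd
    if z <= -threshold:
        return "skipping"
    if z >= threshold:
        return "inclusion"
    return None


cohort = [0.8] * 9 + [0.3]  # hypothetical PSIs for one exon across an HGG cohort
print(label_event(0.3, cohort))  # skipping
print(label_event(0.8, cohort))  # None
```

Unlike dPSI from a paired control comparison, the output here is a categorical label per tumor and event, which is why downstream plotting scripts would need to change.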

naqvia commented 10 months ago

Completed and merged