Open VitorAguiar opened 2 years ago
Hi Vitor,
This may be an edge case which we will try to fix in future versions. For now, if you already know which junction/cluster you are interested in. You can simply manually define clusters (I believe) by overriding the _refined file.
Best, Yang
On Mon, Nov 21, 2022 at 3:46 PM Vitor @.***> wrote:
Hi, thanks for developing leafcutter!
I'm trying to understand the impact of the option -p in the intron clustering step regarding the generation of "_refined" from "_pooled" (using leafcutter_cluster_regtools.py).
On a differential splicing analysis setting (cases vs. controls), I've been clustering junctions from a heterogeneous group of samples (i.e. different datasets, tissues, varying sequencing depth, etc), and comparing that with clustering on a per dataset/tissue basis. With the latter approach, I see a nice exon inclusion event in a gene of interest, matching my expectation from a differential transcript usage analysis.
However, when I do clustering on the larger and heterogeneous set of samples, I first get in the "*_pooled" file an enormous cluster with 495 junctions that includes my event of interest, but extends beyond the gene. While some junctions have only 3 reads, others have >170,000.
Then, if I use -p 0.00001, I get in the "*_refined" file a smaller cluster with 14 junctions, including my splicing event of interest, with the number of reads per junction ranging from 98 to 170,000. But the leafviz plot is noisy since I include many junctions with few supporting reads, and the delta PSI is small since it's relative to the cluster.
However, if I use -p 0.05, I lose the entire cluster in "*_refined", even though my event of interest is supported by >96,000 reads. Is that a desired behavior?
I'd love to hear from @davidaknowles https://github.com/davidaknowles, @goldenflaw https://github.com/goldenflaw et al.
Thank you! Vitor
— Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/221, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCPASP6BUBFLVCFH4NDWJPUT7ANCNFSM6AAAAAASHC6L6Y . You are receiving this because you were mentioned.Message ID: @.***>
Hi, thanks for developing leafcutter!
I'm trying to understand the impact of the option -p in the intron clustering step regarding the generation of "*_refined" from "*_pooled" (using leafcutter_cluster_regtools.py).
On a differential splicing analysis setting (cases vs. controls), I've been clustering junctions from a heterogeneous group of samples (i.e. different datasets, tissues, varying sequencing depth, etc), and comparing that with clustering on a per dataset/tissue basis. With the latter approach, I see a nice exon inclusion event in a gene of interest, matching my expectation from a differential transcript usage analysis.
However, when I do clustering on the larger and heterogeneous set of samples, I first get in the "*_pooled" file an enormous cluster with 495 junctions that includes my event of interest, but extends beyond the gene. While some junctions have only 3 reads, others have >170,000.
Then, if I use -p 0.00001, I get in the "*_refined" file a smaller cluster with 14 junctions, including my splicing event of interest, with the number of reads per junction ranging from 98 to 170,000. But the leafviz plot is noisy since I include many junctions with few supporting reads, and the delta PSI is small since it's relative to the cluster.
However, if I use -p 0.05, I lose the entire cluster in "*_refined", even though my event of interest is supported by >96,000 reads. Is that a desired behavior?
I'd love to hear from @davidaknowles, @goldenflaw et al.
Thank you! Vitor