davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0
207 stars, 115 forks

Positive splicing events are not being detected #228

Open ym-chen opened 1 year ago

ym-chen commented 1 year ago

Hi, I am using leafcutter to analyze a batch of samples. One sample has a splicing event in the region chr7:116411552-116415165, and the other samples are negative for it. However, I can't find any splicing event in that region in the leafcutter result files. I would like to understand how leafcutter processes this data. I ran leafcutter with the following parameters:

leafcutter_cluster.py -m 2 -M 1 -l 500000
leafcutter_ds.R --min_samples_per_intron=1 --min_samples_per_group=0 --timeout=300 --min_coverage=0

For the first dataset, I can find the junction in positive.junc, and I can also find the intron cluster in perind_numers.counts.gz. The zeros in perind_numers.counts.gz are the junction counts of the negative samples (11 negative samples in total). But in DAS_cluster_significance.txt the cluster is filtered out. It seems that some parameter is causing this. How should I change my parameters?

## positive.junc
chr7    116411708       116414934       .       7       -
chr7    116411708       116411902       .       4       -
chr7    116411708       116414934       .       2       +

## perind_numers.counts.gz
chr7:116411708:116411903:clu_63826_NA 0 0 0 0 0 4 0 0 0 0 0 0
chr7:116411708:116414935:clu_63826_NA 0 0 0 0 0 9 0 0 0 0 0 0

## DAS_cluster_significance.txt
chr7:clu_63826_NA       <=1 sample with coverage>0      NA      NA      NA      NA      MET
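The status string above, `<=1 sample with coverage>0`, suggests the cluster was skipped because only one sample has any reads in it. A minimal sketch for checking this directly from the count rows follows. It is not leafcutter's own code: the helper name `samples_with_coverage` is hypothetical, and it assumes each row is an intron ID of the form `chrom:start:end:cluster` followed by one integer count per sample, as in the rows shown above.

```python
def samples_with_coverage(count_lines):
    """Return {"chrom:cluster": number of samples whose summed count is > 0}."""
    totals = {}
    for line in count_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        parts = fields[0].split(":")
        if len(parts) != 4:
            # Not an intron row (for example a header line of sample names).
            continue
        chrom, _start, _end, cluster = parts
        counts = [int(x) for x in fields[1:]]
        key = f"{chrom}:{cluster}"
        previous = totals.get(key, [0] * len(counts))
        # Sum counts per sample across all introns of the same cluster.
        totals[key] = [a + b for a, b in zip(previous, counts)]
    return {key: sum(1 for t in sums if t > 0) for key, sums in totals.items()}


rows = [
    "chr7:116411708:116411903:clu_63826_NA 0 0 0 0 0 4 0 0 0 0 0 0",
    "chr7:116411708:116414935:clu_63826_NA 0 0 0 0 0 9 0 0 0 0 0 0",
]
print(samples_with_coverage(rows))  # {'chr7:clu_63826_NA': 1}
```

For the two rows above, only the sixth sample has non-zero counts, which is consistent with the reported status.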

For the second dataset, I can find the junction in positive.junc, but I can't find the intron cluster in positive.junc.Batch.sorted.gz or perind_numers.counts.gz. I set --minreads=1, so why is it still not clustered?

## positive.junc
chr7    116412043       116414934       .       1       +
chr7    116411708       116414934       .       11      -
chr7    116411708       116414934       .       11      +
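One thing that can be checked independently of leafcutter is which junctions in the file share an endpoint on the same strand. The sketch below does only that. The helper name `shared_splice_sites` is hypothetical, the column order (chrom, start, end, name, read count, strand) is taken from the rows above, and treating strands separately is an assumption, not something confirmed in this thread.

```python
from collections import defaultdict


def shared_splice_sites(junc_lines):
    """Group junctions by (chrom, strand, side, position); keep sites used by >1 junction."""
    by_site = defaultdict(list)
    for line in junc_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        chrom, start, end, _name, reads, strand = fields[:6]
        junction = (int(start), int(end), int(reads))
        by_site[(chrom, strand, "start", int(start))].append(junction)
        by_site[(chrom, strand, "end", int(end))].append(junction)
    return {site: juncs for site, juncs in by_site.items() if len(juncs) > 1}


juncs = [
    "chr7    116412043       116414934       .       1       +",
    "chr7    116411708       116414934       .       11      -",
    "chr7    116411708       116414934       .       11      +",
]
for site, members in shared_splice_sites(juncs).items():
    print(site, members)
```

On these three rows, the only shared site is the end coordinate 116414934 on the `+` strand, used by the 1-read and the 11-read junction; the `-` strand junction shares no endpoint with another junction on its strand.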

This is very important to me, and I hope you can help.

goldenflaw commented 1 year ago

I believe there have been other reports that some of the parameters in leafcutter do not work as intended, including when the minimum number of reads is reduced too much. We are working on rewriting leafcutter in Python 3 to add functionality and bug fixes, but we only expect to release it late this year.


ym-chen commented 1 year ago

I checked the log file but still can't find the exact reason. I may do more tests in the future. Thanks for your reply.
