davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0

Which pipeline should I use to prepare data? #118

Closed QuanLG closed 4 years ago

QuanLG commented 4 years ago

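Hi, I recently noticed that there are two pipelines for preparing data for leafcutter, as shown in the figures below, and I don't know which one is more suitable. Also, for the pipeline on the website, when I run 'leafcutter_cluster.py', should I set the '--strand' parameter to True?

[image: pipeline1] https://user-images.githubusercontent.com/58326238/69926292-d82fba80-14ee-11ea-9ccd-5a7d16e3e08c.png

[image: pipeline2] https://user-images.githubusercontent.com/58326238/69926298-dfef5f00-14ee-11ea-9c9a-05d0ebdb5b38.png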

goldenflaw commented 4 years ago

The regtools version speeds up the pipeline considerably. You can use the strand option if you care about strandedness, but it is optional.

Best, Yang


QuanLG commented 4 years ago

I used regtools to extract junctions from the BAM files produced by HISAT2, then ran 'leafcutter_cluster_regtools.py' to cluster them, but I got an error, for example:

scanning /mnt/workShop/08_AS_5/leafcutter/2_4_out/test_regtools/bam/R19018652-20190417-NGS-1-SP1904141846_combined.sorted.bam.junc2...
scanning /mnt/workShop/08_AS_5/leafcutter/2_4_out/test_regtools/bam/R19028051-20190618-NGS-1-SP1906071999_combined.sorted.bam.junc2...
scanning /mnt/workShop/08_AS_5/leafcutter/2_4_out/test_regtools/bam/R19028051-20190618-NGS-1-SP1906072001_combined.sorted.bam.junc2...
scanning /mnt/workShop/08_AS_5/leafcutter/2_4_out/test_regtools/bam/R19034797-20190801-NGS-1-SP1907152069_combined.sorted.bam.junc2...
scanning /mnt/workShop/08_AS_5/leafcutter/2_4_out/test_regtools/bam/R19036417-20190815-NGS-1-SP1905021937_combined.sorted.bam.junc2...
Parsing...
GL000200.1:?..Traceback (most recent call last):
  File "/home/opt/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in <module>
    main(options, libl)
  File "/home/opt/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main
    pool_junc_reads(libl, options)
  File "/home/opt/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads
    clu = cluster_intervals(read_ks)[0]
  File "/home/opt/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals
    current = E[0]
IndexError: list index out of range

goldenflaw commented 4 years ago

Please see if there are weird chromosomes without any junctions.

Best, Yang
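
One way to run the check suggested above is to count junction records per chromosome in each junction file and look for contigs with very few entries. The following is a minimal sketch, assuming the junction files are tab- or space-delimited with the chromosome name in the first column (as in BED-like output); the function names and the `min_junctions` threshold are hypothetical and not part of leafcutter.

```python
from collections import Counter
from typing import Iterable, List


def count_junctions_per_chrom(lines: Iterable[str]) -> Counter:
    """Count junction records per chromosome (first column of each line)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED-style header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        counts[chrom] += 1
    return counts


def sparse_chroms(counts: Counter, min_junctions: int = 5) -> List[str]:
    """Return chromosomes with fewer than min_junctions records, sorted by name."""
    return sorted(c for c, n in counts.items() if n < min_junctions)
```

To apply it to a file, pass an open file handle, for example `count_junctions_per_chrom(open(path))`, where `path` is one of the junction files listed in the log. A chromosome with no junctions at all will not appear in the counts, so it can help to compare the result against the list of contigs in the reference.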


QuanLG commented 4 years ago

Thanks, Yang. After I removed the 'GL00*' chromosomes, 'leafcutter_cluster_regtools.py' was able to cluster. I would also like to know whether I need to filter reads with samtools first and then use regtools to find the junction reads.
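
For reference, dropping the 'GL00*' contigs from a junction file before clustering could be sketched as below. This is only an illustration, assuming the chromosome name is the first whitespace-separated field of each line; `drop_contigs`, `filter_junc_file`, and the default prefix are hypothetical names chosen here, not leafcutter or regtools functions.

```python
from typing import Iterable, Iterator, Tuple


def drop_contigs(lines: Iterable[str],
                 prefixes: Tuple[str, ...] = ("GL",)) -> Iterator[str]:
    """Yield only the lines whose chromosome does not start with any prefix."""
    for line in lines:
        fields = line.split()
        # Lines with no fields (blank lines) are passed through unchanged.
        if fields and fields[0].startswith(prefixes):
            continue
        yield line


def filter_junc_file(src: str, dst: str,
                     prefixes: Tuple[str, ...] = ("GL",)) -> None:
    """Write a copy of the junction file src to dst without the unwanted contigs."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(drop_contigs(fin, prefixes))
```

Writing to a new file rather than editing in place keeps the original junction files available in case the filter turns out to be too aggressive.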

goldenflaw commented 4 years ago

This entirely depends on your application. If you think that some reads need to be filtered out, you should do that first.

Best, Yang


QuanLG commented 4 years ago

I used 'leafcutter_cluster_regtools.py' to cluster, then 'leafcutter_ds.R' for the differential splicing analysis and 'prepare_results.R' to collect the results. I extracted the cluster information from the 'RData' file as follows and found a problem:

"clu9280+" 4 "chr6:29912393-29912840" "HLA-A" "cryptic" 0.000108
"clu14342-" 21 "chr12:125396517-125398114" "." "annotated" 0.000149
"clu17618?" 25 "chr16:22545157-22547336" "." "annotated" 0.000158
"clu6660+" 3 "chr8:59465862-59477581" "SDCBP" "annotated" 0.000158
"clu637+" 9 "chr17:7737668-7748219" "KDM6B" "cryptic" 0.000221
"clu17821-" 7 "chr20:57477802-57557196" "." "annotated" 0.000257
"clu11498+" 37 "chr14:22573674-23016447" "TRAV24" "cryptic" 0.000471
"clu504+" 18 "chr20:61444552-61444913" "." "annotated" 0.000532
"clu13587+" 3 "chr7:80290526-80293722" "CD36" "annotated" 0.000538
"clu10122-" 10 "chr11:62293215-62299267" "." "annotated" 0.000674
"clu847+" 5 "chr17:41158985-41165064" "IFI35" "cryptic" 0.000837
"clu14202-" 16 "chr12:109017303-109017664" "." "annotated" 0.000953
"clu3608-" 8 "chr22:23237766-23264978" "." "annotated" 0.000953
"clu18512+" 4 "chr11:32605475-32611093" "EIF3M" "annotated" 0.00193
"clu1175+" 8 "chr17:79413585-79476998" "." "annotated" 0.00233
"clu14676+" 5 "chr1:43159194-43162851" "YBX1" "cryptic" 0.00251
"clu11103-" 10 "chr14:106236323-106237006" "IGHG3" "cryptic" 0.00251
"clu18016-" 18 "chr2:70017058-70031641" "." "annotated" 0.00294
"clu11104-" 5 "chr14:106303493-106306703" "IGHD" "cryptic" 0.00427
"clu3852-" 28 "chr1:665184-675509" "." "annotated" 0.00446
"clu462+" 3 "chr20:48892272-48894028" "RP11-290F20.3" "cryptic" 0.00446
"clu9873?" 10 "chr12:125396517-125398114" "." "annotated" 0.00521
"clu1765-" 5 "chr17:62777661-62781340" "PLEKHM1P" "cryptic" 0.00523

jackhump commented 4 years ago

Dear QuanLG, apologies for the delay.

The "." appears in the gene column when none of the junctions in the cluster match any known gene in your annotation.
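
To illustrate the idea, here is a simplified sketch of how a gene label with a "." fallback could be assigned to a cluster. It is not the code used by 'prepare_results.R'. It assumes, purely for illustration, that junctions are matched to the annotation by exact coordinates; the `cluster_gene_label` function and the `annotated` lookup table are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# A junction is identified here by (chromosome, start, end).
Junction = Tuple[str, int, int]


def cluster_gene_label(junctions: Iterable[Junction],
                       annotated: Dict[Junction, str]) -> str:
    """Return the gene name(s) matched by a cluster's junctions, or "." if none match."""
    genes = sorted({annotated[j] for j in junctions if j in annotated})
    # No junction matched the annotation, so fall back to the placeholder.
    return ",".join(genes) if genes else "."
```

With this logic, a cluster whose junctions are all absent from the lookup table gets ".", while a cluster with at least one matching junction gets the corresponding gene name.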