davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0
203 stars 113 forks source link

leafcutter_cluster.py error while executing #153

Closed hothri closed 4 years ago

hothri commented 4 years ago

Hi, I am getting below error while trying to run leafcutter_cluster_regtools.py with the test data. File "/data/leafcutter_cluster_regtools.py", line 49 print ln ^ SyntaxError: Missing parentheses in call to 'print'. Did you mean print(print ln)?

Thank you, Hothri

jackhump commented 4 years ago

Hi Hothri,

I'm afraid that all Python scripts in Leafcutter are still in python2.

hothri commented 4 years ago

Hi Thank you for replying. When I am using python2.7.15 I am getting below error.

scanning /data//RNA.NA06984_CEU.chr1.bam.junc... Traceback (most recent call last): File "/data//leafcutter_cluster_regtools.py", line 497, in main(options, libl) File "/data//leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/data/leafcutter_cluster_regtools.py", line 47, in pool_junc_reads chrom, A, B, dot, counts, strand, rA,rb, rgb, blockCount, blockSize, blockStarts = lnsplit ValueError: need more than 6 values to unpack leafcutter_cluster.o1236120 (END)

jackhump commented 4 years ago

hmm, are you using regtools-derived junction files for leafcutter_cluster_regtools.py?

hothri commented 4 years ago

I am using leafcutter_cluster_regtools.py

jackhump commented 4 years ago

yes I can see that from your error message, but how did you generate the /RNA.NA06984_CEU.chr1.bam.junc file? That error will occur if the junc files are not from Regtools.

hothri commented 4 years ago

I used bam2junc.sh to generate .junc files

jackhump commented 4 years ago

leafcutter_cluster_regtools.py only works with junctions created by regtools.

This is our fault for not being clear in the documentation. @davidaknowles this is something we really need to sort out!

hothri commented 4 years ago

Could you please point me to the current regtools scripts. I found these scripts from the script folder.

jackhump commented 4 years ago

see here: https://github.com/davidaknowles/leafcutter/blob/54235a4c5e5946c926f1d0f869db56eb87f78e42/example_data/worked_example.sh

hothri commented 4 years ago

Hi, This time I followed the correct instructions and I am using the right tools. I am getting a new error now. chr11:-..chr6_ssto_hap7:-..chr10:-..chr17_gl000205_random:-..Traceback (most recent call last): File "/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-hpc/clustering/leafcutter_cluster_regtools.py", line 534, in <module> main(options, libl) File "/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-hpc/clustering/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-hpc/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-hpc/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals current = E[0]

Thank you, Hothri

jackhump commented 4 years ago

It looks like you have non-standard chromosomes in there, please add the --checkchrom flag to your leafcutter_cluster_regtools.py command.

If that still doesn't work then consider removing any chromosome that doesn't follow chr1, chr2,chr3, etc format.

hothri commented 4 years ago

Hi, After using --checkchrom I am getting IndexError: list index out of range. I am using STAR v2.5.3a and using hg19 reference. STAR --genomeDir ${GENOME} \ --readFilesCommand gunzip -c \ --readFilesIn "$mySampName"_R1.fastq.gz \ --sjdbGTFfile ${GENOME}/gencode.v19.annotation.gtf \ --outSAMstrandField intronMotif \ --outSAMtype BAM SortedByCoordinate \ --outFileNamePrefix ${mySampName%.fastq} \ --twopassMode Basic \ --runThreadN ${NSLOTS}

jackhump commented 4 years ago

Can you show us the error message from leafcutter_cluster_regtools.py?

hothri commented 4 years ago

leaf_cutter_error.txt Please find the error report file

jackhump commented 4 years ago

ok, firstly remove any files that are not .junc files from your test_juncfiles.txt file.

Then can you check through one of your junc files to see what chromosomes are being called? the --checkchrom flag should take care of this, but the IndexError: list index out of range error you're seeing is normally caused by a non-standard chromosome name in the .junc file.

Another possibility is the stranding of the junctions. Are the RNA-seq libraries stranded or unstranded? You can specify this when you create your junctions using the -s flag in regtools junctions extract - see here: https://regtools.readthedocs.io/en/latest/commands/junctions-extract/

hothri commented 4 years ago

Hi, There are no other files in test_juncfiles.txt. and RNA seq libraries are stranded. I am using below command to extract .junc files. same as you mentioned in your worked_example.sh script regtools junctions extract -a 8 -s 1 -m 50 -M 500000 "$bamfile" -o "$bamfile".junc

jackhump commented 4 years ago

try -s 2 and -s 0 to see if you get the same error.

hothri commented 4 years ago

I tried both the options and I am getting the same error.

jackhump commented 4 years ago

can you tell me what chromosome names are in your junc files?

Something like:

awk '{print $1}' file.junc | sort | uniq 

if there's anything there that doesn't follow either 1,2,3,4 or chr1,chr2,chr3, try removing those lines from the file.

hothri commented 4 years ago

Chr.txt Please find the attached text file. I dont see any such lines

jackhump commented 4 years ago

This is what I mean: "chr17_ctg5_hap1", "chr17_gl000205_random", etc.

Please remove all lines from the junction file where the chromosome is not chr1,chr2,... chrX,chrY,chrM.

jackhump commented 4 years ago

Hi hothri,

I had a deeper look into this with another user and I believe your error may be due to the stranding parameter of regtools junction extract.

Can you check for me whether any of the lines in your junc files contain "?" in column 6? I believe these ambiguously stranded junctions are the issue.

Was your RNA-seq library strand-specific? If so you can specify this in the regtools command with -s 1 or -s 2 . Here, 1 = first-strand/RF, 2 = second-strand/FR.

I've just made a pull request to fix leafcutter_cluster_regtools so that the ambiguously stranded junctions are ignored by default.

Apologies for taking so long.

hothri commented 4 years ago

Hi Jack,

Thank you for getting back. Yes, I can see "?" in column 6. RNA-seq libraries are stranded so I used -s 1.
junc_col6.txt

hothri commented 4 years ago

Hi, I re done the analysis with leafcutter_cluster_regtools_py3.py and it processed .junc files successfully. Now I am getting a different error during differential splicing analysis. I am copying error message below.

Running differential splicing analysis... Error in differential_splicing(counts, numeric_x, confounders = confounders, : unused arguments (init = opt$init, seed = opt$seed) Execution halted Activating leafcutter conda environment..... activate does not accept more than one argument: ['/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-New-Script/scripts/ds_plots.R', '-e', '/data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/Morris_test/gencode.v35.exons.txt', '/data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/Morris_test/NormalvsTumor_perind_numers.counts.gz', '/data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/FASTQ/groups_file.txt', 'leafcutter_ds_cluster_significance.txt', '-f', '0.05']

jackhump commented 4 years ago

can you show me the script you used to run that? It looks like an error in the arguments you're using.

hothri commented 4 years ago

Hi, Please find my script below. `module load singularity/3.5.3

IMG=/data/containers/leafcutter/leafcutter_0.2.simg

for bamfile in "$FASTQ"/*.bam; do if [ ! -f "$bamfile".junc ]; then echo "Converting $bamfile to $bamfile.junc" $IMG samtools index "$bamfile" $IMG regtools junctions extract -a 8 -s 1 -m 50 -M 500000 "$bamfile" -o "$bamfile".junc else echo "Skipping $bamfile" fi done

if [ -e test_juncfiles.txt ]; then rm test_juncfiles.txt; fi for bamfile in "$FASTQ"/*.bam; do if [ -f "$bamfile".junc ]; then echo "$bamfile".junc >> test_juncfiles.txt fi done

Finds intron clusters and quantifies junction usage within them.

$IMG python $TOOL_DIR/scripts/leafcutter_cluster_regtools_py3.py -j test_juncfiles.txt -m 50 -o NormalvsTumor -l 500000

Differential splicing analysis.

$IMG $TOOL_DIR/scripts/leafcutter_ds.R --num_threads 4 $PROJECTDIR/NormalvsTumor_perind_numers.counts.gz $FASTQ/groups_file.txt

Plot pdf differentially spliced clusters.

$IMG $TOOL_DIR/scripts/ds_plots.R -e $PROJECTDIR/gencode.v35.exons.txt $PROJECTDIR/NormalvsTumor_perind_numers.counts.gz $FASTQ/groups_file.txt leafcutter_ds_cluster_significance.txt -f 0.05

Leafviz visualization shiny app

pushd /data//leafcutter_tool/leafcutter-hpc/leafviz/

Download hg19 annotation files

./data//leafcutter_tool/leafcutter-hpc/leafviz/download_human_annotation_codes.sh

Cache results for viewing

$IMG ./prepare_results.R --meta_data_file $FASTQ/groups_file.txt \ --code leafcutter $PROJECTDIR/NormalvsTumor_perind_numers.counts.gz \ $PROJECTDIR/leafcutter_ds_cluster_significance.txt \ $PROJECTDIR/leafcutter_ds_effect_sizes.txt \ annotation_codes/gencode_hg19/gencode_hg19 \ -o $PROJECTDIR/NormalvsTumor.RData popd `

jackhump commented 4 years ago

please just run the "differential splicing analysis" line, not the entire script, and report what happens.

hothri commented 4 years ago

Please find the copied `[hhy055@sdv47 Morris_test]$ $IMG /data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-New-Script/scripts/leafcutter_ds.R NormalvsTumor_perind_numers.counts.gz /data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/FASTQ/groups_file.txt -e gencode.v35.exons.txt Activating leafcutter conda environment..... activate does not accept more than one argument: ['/data/WHRI-GenomeCentre/leafcutter_tool/leafcutter-New-Script/scripts/leafcutter_ds.R', 'NormalvsTumor_perind_numers.counts.gz', '/data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/FASTQ/groups_file.txt', '-e', 'gencode.v35.exons.txt']

Warning message: package 'optparse' was built under R version 3.4.1 Loading required package: Rcpp Warning messages: 1: package 'leafcutter' was built under R version 3.4.1 2: replacing previous import 'dplyr::combine' by 'gridExtra::combine' when loading 'leafcutter' 3: replacing previous import 'dplyr::filter' by 'stats::filter' when loading 'leafcutter' 4: replacing previous import 'dplyr::lag' by 'stats::lag' when loading 'leafcutter' Loading counts from NormalvsTumor_perind_numers.counts.gz Loading metadata from /data/WHRI-GenomeCentre/hmoka/Project_STARvsKallisto/leafcutter_test/FASTQ/groups_file.txt Encoding as Normal =0, Tumor =1 Loading required package: doMC Loading required package: foreach Loading required package: iterators Loading required package: parallel Settings: $output_prefix [1] "leafcutter_ds"

$max_cluster_size [1] Inf

$min_samples_per_intron [1] 5

$min_samples_per_group [1] 3

$min_coverage [1] 20

$timeout [1] 30

$num_threads [1] 1

$exon_file [1] "gencode.v35.exons.txt"

$init [1] "smart"

$seed [1] 12345

$help [1] FALSE

Running differential splicing analysis... Error in differential_splicing(counts, numeric_x, confounders = confounders, : unused arguments (init = opt$init, seed = opt$seed) Execution halted `

jackhump commented 4 years ago

Ah I see, this is actually an installation issue due to the Stan dependencies - see #87 .

How did you install the leafcutter R package? Our current fix for installation difficulties is the following R code:

install.packages("remotes")
remotes::install_github("stan-dev/rstantools")
remotes::install_github("davidaknowles/leafcutter/leafcutter", ref = "psi_2019")
hothri commented 4 years ago

Hi, You can see the full installation recipe here: https://github.com/davidaknowles/leafcutter/issues/160

hothri commented 4 years ago

Hi, I made the required changes and its working now. Thank you for being patient and helping me.

jackhump commented 4 years ago

That's good to hear! What did you do to solve the problem?

hothri commented 4 years ago

my colleague who created leafcutter singularity altered the call to differential_splicing() so that the opt$init and opt$seed arguments weren't used.

Aynur31 commented 3 years ago

Hello , Please help me. I am having the same issue, and followed your advice , but could not get it solved.

Aynur31 commented 3 years ago

That's good to hear! What did you do to solve the problem?

Hello Jack, Can you please help me? I am having the same problem. I am following this tutorial Differential Splicing Leafcutter for bamfile inls $outdir/CON-2Aligned.sortedByCoord.out.bam; do echo Converting $bamfile to $bamfile.junc samtools index $bamfile /kuacc/users/akashgari19/apps/regtools/build/regtools junctions extract -a 8 -m 50 -M 500000 -s 1 $bamfile -o $bamfile. junc echo $bamfile.junc >> test_juncfiles.txt done After geting .junc files, I tried to cluster introns using the following code, but I got error messages (included below) $ python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o testYRIvsEU -l 500000 scanning /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.junc... Parsing... X:+..17:+..17:-..9:?..16:-..1:?..Y:-..18:+..19:?..Traceback (most recent call last): File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 497, in <module> main(options, libl) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 68, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 316, in cluster_intervals current = E[0] IndexError: list index out of range Here is my .junc file: head CON-2Aligned.sortedByCoord.out.bam.junc 1 4495058 4676205 JUNC00000001 2 - 4495058 4676205 255,0,0 2 18,54 0,181093 1 4676151 4914997 JUNC00000002 1 - 4676151 4914997 255,0,0 2 54,28 0,238818 1 4676151 4977145 JUNC00000003 1 - 4676151 4977145 255,0,0 2 54,28 0,300966 1 4773277 4774364 JUNC00000004 1 - 4773277 4774364 255,0,0 2 59,42 0,1045 1 4774430 4777602 JUNC00000005 6 - 4774430 4777602 255,0,0 2 86,78 0,3094 1 4776703 4777622 JUNC00000006 324 - 4776703 4777622 255,0,0 2 98,98 0,821 1 4777551 4782665 JUNC00000007 249 - 4777551 4782665 255,0,0 2 97,98 0,5016 1 4777564 4778798 JUNC00000008 3 - 4777564 4778798 255,0,0 2 84,51 0,1183 1 4778747 4782629 JUNC00000009 5 - 4778747 4782629 255,0,0 2 51,62 0,3820 1 4782211 4782610 JUNC00000010 3 - 4782211 4782610 255,0,0 2 90,43 0,356

Can you please help me ? Thank you very much.

Aynur31 commented 3 years ago

That's good to hear! What did you do to solve the problem?

Hello Jack, Can you please help me? I am having the same problem. I am following this tutorial Differential Splicing Leafcutter for bamfile inls $outdir/CON-2Aligned.sortedByCoord.out.bam; do echo Converting $bamfile to $bamfile.junc samtools index $bamfile /kuacc/users/akashgari19/apps/regtools/build/regtools junctions extract -a 8 -m 50 -M 500000 -s 1 $bamfile -o $bamfile. junc echo $bamfile.junc >> test_juncfiles.txt done After geting .junc files, I tried to cluster introns using the following code, but I got error messages (included below) $ python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o testYRIvsEU -l 500000 scanning /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.junc... Parsing... X:+..17:+..17:-..9:?..16:-..1:?..Y:-..18:+..19:?..Traceback (most recent call last): File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 497, in <module> main(options, libl) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 68, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 316, in cluster_intervals current = E[0] IndexError: list index out of range Here is my .junc file: head CON-2Aligned.sortedByCoord.out.bam.junc 1 4495058 4676205 JUNC00000001 2 - 4495058 4676205 255,0,0 2 18,54 0,181093 1 4676151 4914997 JUNC00000002 1 - 4676151 4914997 255,0,0 2 54,28 0,238818 1 4676151 4977145 JUNC00000003 1 - 4676151 4977145 255,0,0 2 54,28 0,300966 1 4773277 4774364 JUNC00000004 1 - 4773277 4774364 255,0,0 2 59,42 0,1045 1 4774430 4777602 JUNC00000005 6 - 4774430 4777602 255,0,0 2 86,78 0,3094 1 4776703 4777622 JUNC00000006 324 - 4776703 4777622 255,0,0 2 98,98 0,821 1 4777551 4782665 JUNC00000007 249 - 4777551 4782665 255,0,0 2 97,98 0,5016 1 4777564 4778798 JUNC00000008 3 - 4777564 4778798 255,0,0 2 84,51 0,1183 1 4778747 4782629 JUNC00000009 5 - 4778747 4782629 255,0,0 2 51,62 0,3820 1 4782211 4782610 JUNC00000010 3 - 4782211 4782610 255,0,0 2 90,43 0,356

Can you please help me ? Thank you very much. I also followed your advice to other people, and here are my chromosomes. awk '{print $1}' *.junc | sort | uniq 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 GL456210.1 GL456211.1 GL456216.1 GL456233.1 GL456370.1 GL456379.1 GL456382.1 JH584295.1 JH584299.1 JH584304.1 MT X Y

jackhump commented 3 years ago

Hi Aynur,

did you try adding --checkchrom to your python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py command?

As I've stated in a few of these issues, there's a current bug (that @davidaknowles needs to fix) where the clustering script doesn't like the presence of "?" in the strand column of the junction files.

So as a workaround try removing all lines containing "?" and rerun clustering with a junction list containing the paths to these new files.

grep -v "?" CON-2Aligned.sortedByCoord.out.bam.junc > /CON-2Aligned.sortedByCoord.out.bam.filtered.junc

The other potential cause is that you only have 1 file in your junction list. I'm not sure if clustering will work with only 1 file but I could be wrong.

Aynur31 commented 3 years ago

--checkchrom I added the flag like this and I got the same error. But when I ran the same command on one of my bam files, it worked witout error message. $ python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o --checkchrom testYRIvsEU -l 500000 scanning /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.junc... Parsing... X:+..17:+..17:-..9:?..16:-..1:?..Y:-..18:+..19:?..Traceback (most recent call last): File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 497, in main(options, libl) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 68, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 316, in cluster_intervals current = E[0] IndexError: list index out of range After doing this grep -v "?" CON-2Aligned.sortedByCoord.out.bam.junc > /CON-2Aligned.sortedByCoord.out.bam.filtered.junc , I got the same error message.

jackhump commented 3 years ago

did you update the file path in test_juncfiles.txt ?

Aynur31 commented 3 years ago

--checkchrom I added the flag like this and I got the same error. But when I ran the same command on one of my bam files, it worked witout error message. $ python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o --checkchrom testYRIvsEU -l 500000 scanning /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.junc... Parsing... X:+..17:+..17:-..9:?..16:-..1:?..Y:-..18:+..19:?..Traceback (most recent call last): File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 497, in main(options, libl) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 68, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py", line 316, in cluster_intervals current = E[0] IndexError: list index out of range After doing this grep -v "?" CON-2Aligned.sortedByCoord.out.bam.junc > /CON-2Aligned.sortedByCoord.out.bam.filtered.junc , I got the same error message.

Below is the file where I got no error messages. I do not find any differences in the command I used , and they are biological replicates. CON-1Aligned.sortedByCoord.out.bam.junc CON-1_STARpass1 ak19-2105859.out CON-1Log.final.out test_juncfiles.txt ak19-2106013.out CON-1Log.out testYRIvsEU_perind.counts.gz ak19-2106014.out CON-1Log.progress.out testYRIvsEU_perind_numers.counts.gz Con1_2pass.sh Con1_regtool.sh testYRIvsEU_pooled CON-1Aligned.sortedByCoord.out.bam CON-1SJ.out.tab testYRIvsEU_refined CON-1Aligned.sortedByCoord.out.bam.bai CON-1_STARgenome testYRIvsEU_sortedlibs [akashgari19@login03 con1_mapping]$

jackhump commented 3 years ago

I'm not sure what you mean here. The point of clustering in Leafcutter is to combine the Regtools junctions from multiple samples into a single matrix.

Your file test_juncfiles.txt should contain the paths to both files, after filtering "?" from the junctions using the code above.

Aynur31 commented 3 years ago

--checkchrom After updating test_juncfiles.txt , and adding --checkchrom , it worked for this sample, and I will try with other files. Thank you very much. I appreciate the great guide. python /kuacc/users/akashgari19/apps/leafcutter/scripts/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 --checkchrom -o testYRIvsEU -l 500000 scanning /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.filtered.junc... Parsing... X:+..16:-..18:+..15:-..13:-..Y:-..6:-..1:-..8:+..5:-..19:-..12:+..17:+..X:-..Y:+..14:+..16:+..9:-..10:+..4:+..2:+..3:+..18:-..15:+..6:+..8:-..11:-..5:+..17:-..13:+..4:-..3:-..7:+..12:-..1:+..10:-..7:-..19:+..9:+..2:-..14:-..11:+.. Wrote 123578 clusters.. Split into 1184 clusters... Sorting /kuacc/users/akashgari19/CutterLeaf_STAR_Mapping/con2_mapping/CON-2Aligned.sortedByCoord.out.bam.filtered.junc.. merging 1 junction files...

Aynur31 commented 3 years ago

I'm not sure what you mean here. The point of clustering in Leafcutter is to combine the Regtools junctions from multiple samples into a single matrix.

Your file test_juncfiles.txt should contain the paths to both files, after filtering "?" from the junctions using the code above.

I am sorry for the confusion. I meant , I did not have any problem running intron clustering for one of my samples, thats why I felt weird. But now after removing "?", it is working.

jackhump commented 3 years ago

great to hear. I'm sorry this hasn't been fixed yet, I keep asking @davidaknowles to apply my pull request.

skalava23 commented 3 years ago

@jackhump I am running the code to convert the example data bam files to junc via regtools, but I keep getting this parsing error:

This is what keeps popping up.

Converting run/geuvadis/RNA.NA06985_CEU.chr1.bam to run/geuvadis/RNA.NA06985_CEU.chr1.bam.junc

Program: regtools Version: 0.5.2 Usage: regtools junctions extract [options] indexed_alignments.bam Options: -a INT Minimum anchor length. Junctions which satisfy a minimum anchor length on both sides are reported. [8] -m INT Minimum intron length. [70] -M INT Maximum intron length. [500000] -o FILE The file to write output to. [STDOUT] -r STR The region to identify junctions in "chr:start-end" format. Entire BAM by default. -s INT Strand specificity of RNA library preparation (0 = unstranded, 1 = first-strand/RF, 2, = second-strand/FR). REQUIRED -t STR Tag used in bam to label strand. [XS]

Error parsing inputs!(2)

The samtools command is working and converting the bam to bam.bai files, but there is some error in parsing the inputs for the regtools. I am not seeing any junc files. I am still using the example data. I would appreciate any insight! Thank you

worldsofpivot commented 2 years ago

every line of my .junc file (from the example data) has a "?" how should I go about executing the leafcutter_cluster.py in this case because I cannot simply remove all lines with a "?"