davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0
203 stars 113 forks source link

error while running leafcutter_cluster_regtools.py - "IndexError: list index out of range" #176

Open Tommeri opened 3 years ago

Tommeri commented 3 years ago

Hello everyone.

I am trying to run the script on 14 different junc files, constructed using regtools with no errors (using the option -s 1). the error is: NC_000022.11:+..NT_187689.1:-..NW_003315946.1:-..NC_000014.9:-..NW_009646199.1:-..Traceback (most recent call last): File "/home/be/tomm/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in main(options, libl) File "/home/be/tomm/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/home/be/tomm/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/home/be/tomm/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals current = E[0] IndexError: list index out of range

Is there any fix to this problem yet? Thank you.

jackhump commented 3 years ago

it looks like your chromosomes aren't being recognised. What genome build are you using?

What happens when you include the --checkchrom flag?

Tommeri commented 3 years ago

it looks like your chromosomes aren't being recognised. What genome build are you using?

What happens when you include the --checkchrom flag?

Yes, that was the problem. My chromosome names were weird. I changed my genome build to your suggested GRCh38 one and it works ok. Thank you!

jaro-slamecka commented 3 years ago

Hi, I have the same issue. The --checkchrom flag took out the extra chromosomes but the error remained. I have tried mapping (STAR) with Gencode and Ensembl GRCh38 FASTA. Also, out of curiosity, what is the rationale for using --outSAMtype BAM Unsorted in the STAR run? Samtools will fail to index that, won't it? Thanks in advance for ideas!

goldenflaw commented 3 years ago

Could you check whether all your libraries have mapped successfully, with reads on each chromosome?

Best, Yang

On Thu, May 13, 2021 at 9:58 AM jaro-slamecka @.***> wrote:

Hi, I have the same issue. The --checkchrom flag took out the extra chromosomes but the error remained. I have tried mapping (STAR) with Gencode and Ensembl GRCh38 FASTA. Also, out of curiosity, what is the rationale for using --outSAMtype BAM Unsorted in the STAR run? Samtools will fail to index that, won't it? Thanks in advance for ideas!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-840618451, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCMHKHUDL6LOU5Z6R6DTNPSH7ANCNFSM4WGHCF2A .

jaro-slamecka commented 3 years ago

I checked with samtools idxstats and all libraries have reads on each chromosome

goldenflaw commented 3 years ago

Would you be able to post the entire error that you are encountering?

Best, Yang

On Thu, May 13, 2021 at 11:02 AM jaro-slamecka @.***> wrote:

I checked with samtools idxstats and all libraries have reads on each chromosome

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-840659834, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCMXG7CCGOL7IRDGYO3TNPZYTANCNFSM4WGHCF2A .

jaro-slamecka commented 3 years ago
Parsing...
X:+..20:+..13:+..20:?..Traceback (most recent call last):
  File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in <module>
    main(options, libl)
  File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main
    pool_junc_reads(libl, options)
  File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads
    clu = cluster_intervals(read_ks)[0]
  File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals
    current = E[0]
IndexError: list index out of range
goldenflaw commented 3 years ago

Are you running regtools correctly? There shouldn't be 20:? (which means that some reads have unassigned strands).

Best, Yang

On Thu, May 13, 2021 at 11:29 AM jaro-slamecka @.***> wrote:

Parsing... X:+..20:+..13:+..20:?..Traceback (most recent call last): File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in main(options, libl) File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/home/[username]/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals current = E[0] IndexError: list index out of range

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-840676435, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCIC3GOWA2H7XDHCGN3TNP45ZANCNFSM4WGHCF2A .

jaro-slamecka commented 3 years ago

I am using the parameters from the vignette: regtools junctions extract -a 8 -m 50 -M 500000 $bamfile -o $bamfile.junc

output example:

Program:    regtools
Version:    0.5.2
Minimum junction anchor length: 8
Minimum intron length: 50
Maximum intron length: 500000
Alignment: /data/user/FASTQ/sample1-star-c.bam
Output file: /data/user/FASTQ/sample1-star-c.bam.junc
goldenflaw commented 3 years ago

If I recall correctly, regtools may have updated their options, see https://regtools.readthedocs.io/en/latest/commands/junctions-extract/. You could try to re-run using the newer options paying attention to the -s flag.

Best, Yang

On Thu, May 13, 2021 at 3:52 PM jaro-slamecka @.***> wrote:

I am using the parameters from the vignette: regtools junctions extract -a 8 -m 50 -M 500000 $bamfile -o $bamfile.junc

output example:

Program: regtools Version: 0.5.2 Minimum junction anchor length: 8 Minimum intron length: 50 Maximum intron length: 500000 Alignment: /data/user/FASTQ/sample1-star-c.bam Output file: /data/user/FASTQ/sample1-star-c.bam.junc

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-840826885, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCMDCRYNBTQOP55WB6LTNQ3XHANCNFSM4WGHCF2A .

jaro-slamecka commented 3 years ago

Just looked into the script again and actually, that's what I used, I think it threw an error the first time so I have had the -s flag there all along. ~/regtools/build/regtools junctions extract -a 8 -m 50 -M 500000 -s 2 sample1-star-c.bam -o sample1-star-c.bam.junc Do you have any recommendation on how to address the reads with unassigned strand, if that's indeed the issue?

goldenflaw commented 3 years ago

Throw them out.

Best, Yang

On Thu, May 13, 2021, 18:40 jaro-slamecka @.***> wrote:

Just looked into the script again and actually, that's what I used, I think it threw an error the first time so I have had the -s flag there all along. ~/regtools/build/regtools junctions extract -a 8 -m 50 -M 500000 -s 2 sample1-star-c.bam -o sample1-star-c.bam.junc Do you have any recommendation on how to address the reads with unassigned strand, if that's indeed the issue?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-840896054, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCM6EUHHZPVRFOKWYILTNRPOFANCNFSM4WGHCF2A .

jaro-slamecka commented 3 years ago

That took care of it. There were indeed several lines with "?" in column 6 of the junc files:

cat sample1-bam.junc | awk '{print $6}' | sort | uniq

-
?
+

So I deleted those lines:

while read JUNC; do
awk '$6 != "?"' $JUNC > ${JUNC/junc/fixed.junc}
done < <( find ./ -type f -name "*junc" )

Now onto the next steps... Thanks so much for your help! Jaro

yingji15 commented 3 years ago

I run into the same problems, so I threw out those lines with "?" as suggested by Jaro

for juncfile in `ls 19078-09/*junc`; do
    echo $juncfile 
    awk '$6 != "?"' $juncfile > $juncfile.fixed.junc
    echo $juncfile.fixed.junc >> test_juncfiles.txt
done

after this I still have the same problems without '--checkchrom'

Parsing... GL000205.2:-..GL000218.1:-..15:-..3:+..GL000221.1:+..21:+..14:-..KI270727.1:-..GL000220.1:-..16:+..10:+..KI270707.1:-..Traceback (most recent call last): File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in main(options, libl) File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals current = E[0]

but after adding "--checkchrom" it works

python2 /leafcutter/clustering/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o 19078_09 -l 500000 --checkchrom

Thanks!

goldenflaw commented 3 years ago

Try using the -chrom flag or removing GL/KI chromosomes.

Best, Yang

On Thu, Jun 10, 2021 at 2:12 PM Ying @.***> wrote:

I run into the same problems, and after I throw out those lines with "?", I still have the same problems

Parsing... GL000205.2:-..GL000218.1:-..15:-..3:+..GL000221.1:+..21:+..14:-..KI270727.1:-..GL000220.1:-..16:+..10:+..KI270707.1:-..Traceback (most recent call last): File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 534, in main(options, libl) File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 14, in main pool_junc_reads(libl, options) File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 72, in pool_junc_reads clu = cluster_intervals(read_ks)[0] File "/bin/leafcutter/clustering/leafcutter_cluster_regtools.py", line 337, in cluster_intervals current = E[0]

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/176#issuecomment-858934295, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGWTCJPC5Z3MZFA7FYFAVLTSEFAHANCNFSM4WGHCF2A .

yingji15 commented 3 years ago

Thanks, Yang! The -checkchrom flag works! And thanks for pointing me to GL/KL, I did have some of those on the first column of my junc files.

bsouthey commented 3 years ago

Regarding the 'leafcutter_cluster_regtools.py' error, this due to an empty list resulting from requiring 3 reads. It has nothing to do with the chromosomes. Trivial fix provided in either of these issues: Error if the list read_ks (related to chromosome identifier) is empty #183 list index out of range #184

yingji15 commented 3 years ago

Regarding the 'leafcutter_cluster_regtools.py' error, this due to an empty list resulting from requiring 3 reads. It has nothing to do with the chromosomes. Trivial fix provided in either of these issues: Error if the list read_ks (related to chromosome identifier) is empty #183 list index out of range #184

Thanks for this information! I wonder why does 'checkchrom' solve the problem? Is it because well formated chromosomes less likely to have <3 reads?