brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License
141 stars 54 forks source link

UnboundLocalError: local variable 'toks' referenced before assignment #6

Closed superbobry closed 10 years ago

superbobry commented 10 years ago

Hello, the master version seems to be broken after 1eaa967de500f12b617847490f248c439f561093.

Can you please take a look at bwameth.py:291?

superbobry commented 10 years ago

Actually, this anomaly is related to the behaviour of nopen, discused in #7.

brentp commented 10 years ago

can you update to the latest version of toolshed and let me know if you still see it?

superbobry commented 10 years ago

I've update to the latest toolshed 0.3.7. Now I have a different trace, even more like #7:

$  bwameth.py --reference /mnt/stripe/b
wa-indexes/hg18.fa --threads 15 --prefix srx_tmp_hg18 ./SRR019600.fastq
running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG   ID:bm   SM:bm' -t 15  /mnt/stripe/bwa-indexes/hg18.fa
.bwameth.c2t '</home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bw
ameth.py c2t ./SRR019600.fastq NA'
writing to:
samtools view -bS - | samtools sort - srx_tmp_hg18
converting reads in ./SRR019600.fastq,NA
WARNING: running bwameth in single-end mode
[M::main_mem] read 1973686 sequences (150000136 bp)...
[M::mem_process_seqs] Processed 1973686 reads in 1042.692 CPU sec, 69.759 real sec
Traceback (most recent call last):
  File "/home/user/.virtualenvs/env/bin/bwameth.py", line 10, in <module>
    execfile(__file__)
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 596, in <module>
    main(sys.argv[1:])
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 581, in main
    set_as_failed=args.set_as_failed)
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 259, in bwa_mem
    as_bam(cmd, fa, prefix, calmd, set_as_failed)
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 296, in as_bam
    out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
Traceback (most recent call last):
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 596, in <module>
    main(sys.argv[1:])
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 546, in main
    sys.exit(convert_reads(args[1], args[2]))
  File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 102, in convert_reads
    out.write("".join((name, seq, "\n+\n", qual)))
IOError: [Errno 32] Broken pipe

Is there a way to debug this?

brentp commented 10 years ago

please see the latest commit.

superbobry commented 10 years ago

Thank, but surprisingly the output remained the same. Probably because the error actually happens in the out.write part. Any hint on how should I check my reads?

brentp commented 10 years ago

maybe if you check the partial bam output, you can see about where it errored and then send a subset of reads to bwa-meth.

What does

tail ./SRR019600.fastq

show?

brentp commented 10 years ago

I downloaded that fastq and am able to reproduce what you are seeing. Will post fix shortly.

superbobry commented 10 years ago

Bellow is the tail output. You probably have it already:

$ tail SRR019600.fastq
+SRR019600.7351165 EVANSGA_9:1:100:1792:1500 length=76
:0IIIIIII3E4I6IIIIIIII=III46IIIIII*IIIIIIIIII5II6II:IIIIIIF4A3DI>I%.;I$G=#0@
@SRR019600.7351166 EVANSGA_9:1:100:1792:1526 length=76
AGTTAATGGTAGAATTGAGATTAGAATTAATGATTAAGTTAGTGTAATAAATTGGTAATAGAGGATAAGTAAAATT
+SRR019600.7351166 EVANSGA_9:1:100:1792:1526 length=76
(I(IIIIIIBDIEIIIIIIIIIIIIIBBIHII=<?III+IIF1IIIIBIDGIIIH&?GII9III:@50<,G162>B
@SRR019600.7351167 EVANSGA_9:1:100:1792:1537 length=76
TGGGGAATAAAAGATTTTCATATTACGTTTTTTTGTATGTGATGTTTGTATGTAGAATTTATGGATTATTTTTTAT
+SRR019600.7351167 EVANSGA_9:1:100:1792:1537 length=76
IIIIIIIII;6GIH>IIII>IIIII=@7?IIIDFIDIIIDH0II.EII9I>I%BI61IIIGIII-3C6I3D7I6+;

The resulting BAM file seem to have only the header and no alignments.

brentp commented 10 years ago

this was occuring because I was using the csv module in reader which iterates over the SAM output. If the quality field started with ", it would mess things up. It requires an update to toolshed as well.

superbobry commented 10 years ago

It did help, thank you!

superbobry commented 10 years ago

Hmm, there're no exceptions but the whole process seems to be hanging after a certain point. Here's the log:

$ bwameth.py --reference /mnt/stripe/bwa-indexes/hg18.fa --threads 15 --prefix srx_tmp_hg18 ./SRR019600.fastq
running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG   ID:bm   SM:bm' -t 15  /mnt/stripe/bwa-indexes/hg18.fa.bwameth.c2t '</home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwam
eth.py c2t ./SRR019600.fastq NA'
writing to:
samtools view -bS - | samtools sort - srx_tmp_hg18
converting reads in ./SRR019600.fastq,NA
WARNING: running bwameth in single-end mode
converting reads in ./SRR019600.fastq,NA
WARNING: running bwameth in single-end mode
[M::main_mem] read 1973686 sequences (150000136 bp)...
[M::main_mem] read 1973686 sequences (150000136 bp)...
[M::mem_process_seqs] Processed 1973686 reads in 1255.657 CPU sec, 117.232 real sec
[M::mem_process_seqs] Processed 1973686 reads in 1251.457 CPU sec, 119.854 real sec
[M::main_mem] read 1973686 sequences (150000136 bp)...
[M::mem_process_seqs] Processed 1973686 reads in 1099.928 CPU sec, 73.542 real sec
[M::main_mem] read 1973686 sequences (150000136 bp)...
[M::mem_process_seqs] Processed 1973686 reads in 1027.303 CPU sec, 68.685 real sec
WARNING: 7351167 reads with length < 80
       : this program is designed for long reads
[M::main_mem] read 1430109 sequences (108688284 bp)...
[M::mem_process_seqs] Processed 1430109 reads in 763.489 CPU sec, 51.065 real sec
[main] Version: 0.7.9a-r786
[main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -R @RG ID:bm   SM:bm -t 15 /mnt/stripe/bwa-indexes/hg18.fa.bwameth.c2t </home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwameth
.py c2t ./SRR019600.fastq NA
[main] Real time: 894.556 sec; CPU: 4167.274 sec

Also, duplicated messages look suspicious.

Update: it seems like bwa-meth is starting two bwa processes:

$ ps aux | grep bwa
user     17692  0.0  0.0 202924 10992 pts/10   S+   11:07   0:00 /home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/bin/bwameth.py --reference /mnt/stripe/bwa-indexes/hg18.fa --threads 15 --prefix srx_tmp_hg18 ./SRR019600.fastq
user     17694  900  7.4 15387576 14726120 pts/10 Sl+ 11:07  19:03 bwa mem -T 40 -B 2 -L 10 -CM -R @RG?ID:bm?SM:bm -t 15 /mnt/stripe/bwa-indexes/hg18.fa.bwameth.c2t </home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwameth.py c2t ./SRR019600.fastq NA
user     17697  909  7.4 15387576 14767156 pts/10 Sl+ 11:07  19:15 bwa mem -T 40 -B 2 -L 10 -CM -R @RG?ID:bm?SM:bm -t 15 /mnt/stripe/bwa-indexes/hg18.fa.bwameth.c2t </home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwameth.py c2t ./SRR019600.fastq NA
user     17698  9.6  0.0 200508 10692 pts/10   S+   11:07   0:11 /home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwameth.py c2t ./SRR019600.fastq NA
user     17699  9.9  0.0 200504 10692 pts/10   S+   11:07   0:12 /home/user/.virtualenvs/env/bin/python /home/user/.virtualenvs/env/src/bwameth-head/bwameth.py c2t ./SRR019600.fastq NA

and after some time one of the bwas become a zombie:

user     17697  288  0.0      0     0 pts/10   Z+   11:07  69:29 [bwa] <defunct>
superbobry commented 10 years ago

Hello, can you reproduce the zombie issue?

superbobry commented 10 years ago

I think the problem is in this line, which silently eats the exception from the csv.reader and performs a second nopen on the fname.

Update: and here's the exception:

TypeError('quotechar must be set if quoting enabled',)

So, you should use dialect.quoting = csv.QUOTE_NONE instead of dialect.quotechar = None.

brentp commented 10 years ago

ah, you might be right. i am able to reproduce. but the .bam and .bai are still created. i will look into it.

On Mon, Jun 16, 2014 at 3:05 AM, Sergei Lebedev notifications@github.com wrote:

I think the problem is in this https://github.com/brentp/toolshed/blob/master/toolshed/files.py#L218 line, which silently eats the exception from the csv.reader and performs a second nopen on the fname.

— Reply to this email directly or view it on GitHub https://github.com/brentp/bwa-meth/issues/6#issuecomment-46155858.

mcieslik-mctp commented 9 years ago

I think I am still running into this problem (I updated both bwa-meth and toolshed to the latest github versions).

For some FASTQ pairs the process hangs without an error after some time. I noticed that it typically occurs after a new BAM temp file (i.e 0001) is created.

I have not been able to identify the what the problem with the FASTQ files is (although commas are present in quality scores of files that succeed and not in files that fail. The error is fully reproducible.

for aln in handle_reads(pair_list, set_as_failed):
    out.write(str(aln) + '\n')
    <<seems to hang here>>>
brentp commented 9 years ago

@mcieslik-mctp can you post files to reproduce?