JoseBlanca / seq_crumbs

Little sequence file utilities meant to work within Unix pipelines

Broken pipe traceback in sff_extract #2

binarybana closed this issue 11 years ago

binarybana commented 11 years ago

When running:

$ sff_extract -c zz03_A_RL1.sff | head -20

I get a good 20 lines of output and then:

An unexpected error happened.
The seq_crumbs developers would appreciate your feedback.
Please send them the error log: sff_extract.error

[Errno 32] Broken pipeTraceback (most recent call last):
  File "/usr/local/bin/sff_extract", line 117, in <module>
    sys.exit(main(extract_sff))
  File "/usr/local/lib/python2.7/dist-packages/crumbs/utils/bin_utils.py", line 60, in main
    return(funct())
  File "/usr/local/bin/sff_extract", line 89, in extract_sff
    write_seqrecords(seqs, args['out_fhand'])
  File "/usr/local/lib/python2.7/dist-packages/crumbs/seqio.py", line 54, in write_seqrecords
    SeqIO.write(seqs, fhand, file_format)
  File "/usr/local/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 426, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/local/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 254, in write_file
    count = self.write_records(records)
  File "/usr/local/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 239, in write_records
    self.write_record(record)
  File "/usr/local/lib/python2.7/dist-packages/Bio/SeqIO/QualityIO.py", line 1462, in write_record
    self.handle.write("@%s\n%s\n+\n%s\n" % (title, seq_str, qualities_str))
IOError: [Errno 32] Broken pipe

I've put sff_extract.error in a pastebin here.

JoseBlanca commented 11 years ago

Hi Jason: Can you reproduce the problem when you're not piping the sff_extract result?

binarybana commented 11 years ago

The piping is critical to the crash. Let me explain: oddly enough, I ran into the root cause of this problem in a completely different context later the same day. To cut to the chase: head -20 exits once it has read the first 20 lines, which closes the read end of the pipe. sff_extract then tries to write to this broken pipe and fails with the error above.

sed, on the other hand, will dutifully process the entire output even when it has no need of it, thus keeping the pipeline alive for sff_extract to continue writing into. So the following is nearly equivalent but works:

sff_extract -c zz03_A_RL2.sff | sed -n '1,+19p'

The problem with this is that the entire input still has to be processed! So I think sff_extract should check, between chunked writes, that the pipe it is writing to is still open.
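One way to get the behaviour suggested above is to let the write fail and treat EPIPE as a normal end of output, instead of polling the pipe. The following is only a minimal sketch of that idea, not the change made in seq_crumbs: the function name `write_until_pipe_closes` is hypothetical, and the sketch assumes SIGPIPE is ignored (Python's default), so a write to a closed pipe raises an exception rather than killing the process.

```python
import errno


def write_until_pipe_closes(chunks, out_fhand):
    """Write chunks to out_fhand, stopping quietly if the reader went away.

    Hypothetical helper for illustration. Returns the number of chunks
    handed to out_fhand before the pipe broke (or all of them).
    """
    written = 0
    try:
        for chunk in chunks:
            out_fhand.write(chunk)
            written += 1
        # Buffered handles may only report the broken pipe on flush.
        out_fhand.flush()
    except IOError as error:
        # IOError is an alias of OSError in Python 3, so this also
        # catches BrokenPipeError. Anything other than EPIPE is a real
        # failure and is re-raised unchanged.
        if error.errno != errno.EPIPE:
            raise
    return written
```

Because `chunks` can be a generator, stopping at the first failed write also stops reading the input, which is the saving over the sed workaround. A command-line tool writing to `sys.stdout` would additionally need to cope with the interpreter flushing stdout again at exit.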

JoseBlanca commented 11 years ago

Thanks for the information, I wasn't aware of that. Your explanation has been very useful. We've reproduced the bug and I think we've fixed it with commit: https://github.com/JoseBlanca/seq_crumbs/commit/17111c697335e2d849c992b2412b410303c92646

binarybana commented 11 years ago

Thanks Jose! I'll let you know when I get around to testing it, but it certainly looks like this fixes the problem.