brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License
139 stars 53 forks source link

Samtools 1.3 support #21

Closed iromeo closed 8 years ago

iromeo commented 8 years ago

In version 1.3 samtools deleted some deprecated CLI options for "samtools sort" command (see http://www.htslib.org/doc/samtools-1.3.html) :

Historically samtools sort also accepted a less flexible way of specifying the final and temporary output filenames:

samtools sort [-f] [-o] in.bam out.prefix

This has now been removed. The previous out.prefix argument (and -f option, if any) should be changed to an appropriate combination of -T PREFIX and -o FILE. The previous -o option should be removed, as output defaults to standard output

So bwa-meth.py doesn't work with latest samtools. It can fail with different kinds of errors:

  1. AssertionError
  2. Exception: bad or empty fastqs
  3. IOError: [Errno 32] Broken pipe (most likely same problem as in issue #18)

Details:

  1. AssertionError $ bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix issue18 ./foo.fastq :

    unning: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG    ID:bm   SM:bm' -t 20  /indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./foo.fastq NA'
    writing to:
    samtools view -bS - | samtools sort -m 2415919104 - issue18
    converting reads in ./foo.fastq,NA
    WARNING: running bwameth in single-end mode
    [M::main_mem] read 10 sequences (900 bp)...
    [M::mem_process_seqs] Processed 10 reads in 0.006 CPU sec, 0.002 real sec
    [main] Version: 0.7.9a-r786
    [main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -R @RG ID:bm   SM:bm -t 20 /indexes/hg38.fa.bwameth.c2t </home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./foo.fastq NA
    [main] Real time: 13.531 sec; CPU: 12.509 sec
    Traceback (most recent call last):
    File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
     main(sys.argv[1:])
    File "/home/user/anaconda/bin/bwameth.py", line 533, in main
     set_as_failed=args.set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
     as_bam(cmd, fa, prefix, calmd, set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 243, in as_bam
     assert p.wait() == 0
    AssertionError

    Where foo.fastq is fastq content suggested in #18

  2. Exception: bad or empty fastqs bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix SRR033942 ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra

    running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG   ID:bm   SM:bm' -t 20  /indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra NA'
    writing to:
    samtools view -bS - | samtools sort -m 2415919104 - SRR033942
    converting reads in ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra,NA
    WARNING: running bwameth in single-end mode
    [gzclose] buffer error
    Traceback (most recent call last):
    File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
     main(sys.argv[1:])
    File "/home/user/anaconda/bin/bwameth.py", line 533, in main
     set_as_failed=args.set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
     as_bam(cmd, fa, prefix, calmd, set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 232, in as_bam
     raise Exception("bad or empty fastqs")
    Exception: bad or empty fastqs
  3. IOError: [Errno 32] Broken pipe bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix //GSE19418_tmp/GSM491349/SRR033987_hg38 //GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz

    running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG   ID:bm   SM:bm' -t 20  //indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t /GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz NA'
    writing to:
    samtools view -bS - | samtools sort -m 2415919104 - /GSE19418_tmp/GSM491349/SRR033987_hg38
    converting reads in /GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz,NA
    WARNING: running bwameth in single-end mode
    [M::main_mem] read 2000000 sequences (200000000 bp)...
    [M::mem_process_seqs] Processed 2000000 reads in 7131.578 CPU sec, 896.316 real sec
    Traceback (most recent call last):
    File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
     main(sys.argv[1:])
    File "/home/user/anaconda/bin/bwameth.py", line 533, in main
     set_as_failed=args.set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
     as_bam(cmd, fa, prefix, calmd, set_as_failed)
    File "/home/user/anaconda/bin/bwameth.py", line 238, in as_bam
     out.write(str(aln) + '\n')
    IOError: [Errno 32] Broken pipe
    [fputs] Broken pipe

Solution, just use new API, available at least since 1.0 (http://www.htslib.org/doc/samtools-1.0.html)

My fixes: https://github.com/iromeo/bwa-meth/commit/c45210d87fd985bb7443ba2f34615645ea46c7b8 and https://github.com/iromeo/bwa-meth/commit/0a1703b9e9ccaec36b4c4b9b9346d3126218744c

brentp commented 8 years ago

Make sense to me. can you make a pull request?

iromeo commented 8 years ago

Thx!