brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License
139 stars 53 forks source link

IOError: Paired End Reads #32

Closed NicolaLady closed 1 year ago

NicolaLady commented 7 years ago

Hi Brent,

I am using bwa-meth to align paired end reads with the alignment failing due to an IO error,

I am using toolshed 0.4.6 bwa-meth-0.10 bwa 0.7.15-r1140 samtools 1.3.1

See the cmd line output below,

Kind Regards

Nicola

running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R SM:BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R' -t 8  hg19_bwameth/human_g1k_v37.fasta.bwameth.c2t '</usr/bin/python /Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R1.fastq BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R2.fastq'
writing to:
samtools view -bS - | samtools sort - BP_1207.lane7
[M::bwa_idx_load_from_disk] read 0 ALT contigs
converting reads in BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R1.fastq,BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R2.fastq
[M::process] read 634922 sequences (80000172 bp)...
[M::process] 0 single-end sequences; 634922 paired-end sequences
[M::process] read 634922 sequences (80000172 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 101430, 14, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (105, 126, 155)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (5, 255)
[M::mem_pestat] mean and std.dev: (131.80, 37.94)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 305)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (198, 445, 1766)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4902)
[M::mem_pestat] mean and std.dev: (824.00, 985.05)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 6470)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 634922 reads in 953.682 CPU sec, 119.192 real sec
Traceback (most recent call last):
  File "/usr/local/bin/bwameth.py", line 5, in <module>
    pkg_resources.run_script('bwameth==0.10', 'bwameth.py')
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 487, in run_script
    ns.clear()
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 1337, in run_script
    raise AssertionError(
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
    main(sys.argv[1:])
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 586, in main
    set_as_failed=args.set_as_failed)
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 259, in bwa_mem
    as_bam(cmd, fa, prefix, calmd, set_as_failed)
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 299, in as_bam
    out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
    main(sys.argv[1:])
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 551, in main
    sys.exit(convert_reads(args[1], args[2]))
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 102, in convert_reads
    out.write("".join((name, seq, "\n+\n", qual)))
IOError: [Errno 32] Broken pipe
brentp commented 7 years ago

can you share a small subset of the fastqs that demonstrate the problem?

NicolaLady commented 7 years ago

Yep, see below- Thanks

@HISEQ:584:C7H7KANXX:7:1101:1243:2044 1:N:0:GCCAAT
TTATTTTGGTTTCGTTTGAGGAGTTTTTTAGTTCGTTGTTGTATTATGGGATTTTTTTTTTGGGTTGGTCGAGGACGGAGTCGGTTTTTTGGGTTTGCGGGAAGGTGTGGAGGGAGAGGTATAGGT
+
BBA@BFGGGGGGGGGGGGGGGGGFFGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGBGFCGGB>CGGGGGG<GGGGGGGDEGG.>DGGGC/C>D.6CB/D/CG/C8A68@GD.9C.@B#
@HISEQ:584:C7H7KANXX:7:1101:1149:2054 1:N:0:GCCAAT
TTTTTTTTTGAGTTTAATTTTAAGTTAATTAGTTTAAGGTTTGTATAATTAATTTTTTTTAGTTTGGAGGATGTATTTGAGGGGAGTGTTTTATAGTATGGAGATATAATTATTTATTTGTGAAGA
+
CCCCCGGGGBFGGGGEGGGGGGGGGGGGGGEEFGGGBFGGGGB11=1:FG>CGGGGGGDG01=:F:/0=/>0@:@DGFGEBBD.9.C0;CFG0F00;09;==0@CDD/D@D=DGB@@B..@B7@##
@HISEQ:584:C7H7KANXX:7:1101:1336:2019 1:N:0:GCCAAT
GNTGGGATGAATTGTAATTGGAGTGTTTTTGTTTTGGTTGGTATTTTTGGGGTTGATTAGTTGTTGTTTGTGAGTTTTGGATTTAGTTGGTGTTTGTATGTGTTTTGGGTAGATCGGAAGAGCACA
@HISEQ:584:C7H7KANXX:7:1101:1243:2044 2:N:0:GCCAAT
ACAACACCGATTCCCACCTATACCTCTCCCTCCACACCTTCCCGCAAACCCAAAAAACCGACTCCGTCCTCGACCAACCCAAAAAAAAAATCCCATAATACAACAACGAACTANAAAACTCCTCAA
+
BBBCCGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGBBFCGGGGGGGGGGGGBF/ECA/CF:CE<FGGGEGGFGGGGGGBDGDDD@DGGGEEG=DD=DBEDGG6CDGG#896...96@BBE
@HISEQ:584:C7H7KANXX:7:1101:1149:2054 2:N:0:GCCAAT
AATAACCATTCATCTTCAATCTATACACCTTTCAAATACATCCTAAACCCCTAAATAAAAATAACTTCTTTTTTCCTTTCTCCTCCCCACTTCTCTCTTCACAAATAAATAATTATATCTCCATAC
+
BBBBBGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGFGGECEGGEGGGGGGGGGGGGGGGGGFGGFGE@EG0F;:CGFGGGGGGGGGGGGGGGGGG@FGGGGGFCFGGGEGGGG0FGGGGG>
@HISEQ:584:C7H7KANXX:7:1101:1336:2019 2:N:0:GCCAAT
ACCCAAAACACATACAAACACCAACTAAATCCAAAACTCACAAACAACAACTAATCAACCCCAAAAATACCAACCAAAACAAAAACACTCCAATTACAATTCATCCNNACNNNNNNNNNGAGCGTC
NicolaLady commented 7 years ago

Hi brent,

Could the issue be samtools? Can you recommend a fix for it?

Kind Regards

Nicola

Macusers-iMac:BisulfiteAlignment macuser$ cat nohup.out 
running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R_val_    SM:BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R_val_' -t 8  hg19_bwameth/human_g1k_v37.fasta.bwameth.c2t '</usr/bin/python /Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R1_val_1.fq.gz BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R2_val_2.fq.gz'
writing to:
samtools view -bS - | samtools sort - BP_1207
Macusers-iMac:BisulfiteAlignment macuser$ cat nohup.out 
running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R_val_    SM:BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R_val_' -t 8  hg19_bwameth/human_g1k_v37.fasta.bwameth.c2t '</usr/bin/python /Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R1_val_1.fq.gz BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R2_val_2.fq.gz'
writing to:
samtools view -bS - | samtools sort - BP_1207
[M::bwa_idx_load_from_disk] read 0 ALT contigs
converting reads in BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R1_val_1.fq.gz,BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R2_val_2.fq.gz
[M::process] read 681348 sequences (80000018 bp)...
[M::process] 0 single-end sequences; 681348 paired-end sequences
[M::process] read 682254 sequences (80000034 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 97908, 9, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (107, 130, 160)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 266)
[M::mem_pestat] mean and std.dev: (135.59, 40.01)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 319)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 681348 reads in 804.827 CPU sec, 100.547 real sec
Traceback (most recent call last):
  File "/usr/local/bin/bwameth.py", line 5, in <module>
    pkg_resources.run_script('bwameth==0.10', 'bwameth.py')
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 487, in run_script
    ns.clear()
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 1337, in run_script
    raise AssertionError(
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
    main(sys.argv[1:])
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 586, in main
    set_as_failed=args.set_as_failed)
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 259, in bwa_mem
    as_bam(cmd, fa, prefix, calmd, set_as_failed)
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 299, in as_bam
    out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[M::process] 0 single-end sequences; 682254 paired-end sequences
[fputs] Broken pipe
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
    main(sys.argv[1:])
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 551, in main
    sys.exit(convert_reads(args[1], args[2]))
  File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 102, in convert_reads
    out.write("".join((name, seq, "\n+\n", qual)))
IOError: [Errno 32] Broken pipe
[1]+  Exit 1                  nohup /usr/local/bin/bwameth.py --threads 8 --prefix BP_1207 --reference hg19_bwameth/human_g1k_v37.fasta BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R1_val_1.fq.gz BP_1207_RG2/BP_1207.BPR.151019.HiSeq2500.FCB.lane6.R2_val_2.fq.gz
brentp commented 7 years ago

Hi, I think this might be an issue with bwa only being able to handle read-names of a certain length. I append the original sequence to the read name and it seems there's a limit to how long that can be. I opened an issue on bwa, but haven't had a response.

NicolaLady commented 7 years ago

Ok thank you!

aloraine205 commented 7 years ago

I am having the same problem. However, I was previously able to run bwameth.py on the same file w/o getting this error.

So far as I can tell, the only change was an upgrade to samtools version 1.3.1 from 0.1.18.

aloraine205 commented 7 years ago

Switching back to samtools 0.1.18 fixed the problem.

Email me if you have any questions - you can find me at lorainelab.org.

Thank you very much Brett for this great tool. I'm liking it a lot.

brentp commented 7 years ago

Great! Thanks for figuring that out. The version of bwa-meth in master doesn't use samtools any longer. It just outputs directly to STDOUT. I'll leave this issue open as a place-holder to check on if the new samtools works with the new version and if not, why.