brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License
139 stars 53 forks source link

Test fail: both release 0.9 and git cloned #3

Closed vogelwk closed 9 years ago

vogelwk commented 10 years ago

index worked, align failed see below. python 2.7.6, Centos 6.2, setuptools 3.3,

note: lib/python2.7/site-packages/ contains only a single file: bwameth-0.09-py2.7.egg

It does not contain a directory so named, etc: bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py

ERROR Report:

bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' writing to: samtools view -bS - | samtools sort - bwa-meth /home/pi/vogelw/bin/python: can't find 'main' module in '/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py' [gzclose] buffer error cmd was:bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' Traceback (most recent call last): File "/home/pi/vogelw/bin/bwameth.py", line 5, in pkg_resources.run_script('bwameth==0.09', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1401, in run_script File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 582, in

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 579, in main

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 286, in as_bam

File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 252, in reader for toks in line_gen: File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 53, in process_iter raise ProcessException(cmd) toolshed.files.ProcessException: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'

brentp commented 10 years ago

Can you tell me the output of this:

bwa 2>&1 | grep Version
brentp commented 10 years ago

scratch the previous. can you update to the latest revision and retry? thanks for reporting.

vogelwk commented 10 years ago

Here you go (now with v3.4 of setup tools) also tried tried v 2.01 with a clean install of python 2.7.6 earlier today:

bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' writing to: samtools view -bS - | samtools sort - bwa-meth /home/pi/vogelw/bin/python: can't find 'main' module in '/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py' [gzclose] buffer error cmd was:bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' Traceback (most recent call last): File "/home/pi/vogelw/bin/bwameth.py", line 5, in pkg_resources.run_script('bwameth==0.09', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1401, in run_script File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 582, in

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 579, in main

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem

File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 286, in as_bam

File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 252, in reader for toks in line_gen: File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 53, in process_iter raise ProcessException(cmd) toolshed.files.ProcessException: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'

brentp commented 10 years ago

can you try this once more. i hadn't actually pushed.

vogelwk commented 10 years ago

bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' writing to: samtools view -bS - | samtools sort - bwa-meth converting reads in t_R1.fastq.gz,t_R2.fastq.gz [M::main_mem] read 92726 sequences (9365326 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44942, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat] (25, 50, 75) percentile: (109, 137, 175) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307) [M::mem_pestat] mean and std.dev: (145.03, 48.64) [M::mem_pestat] low and high boundaries for proper pairs: (1, 373) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 92726 reads in 10.016 CPU sec, 0.863 real sec [main] Version: 0.7.8-r455 [main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R @RG ID:t_R SM:t_R -t 12 ref.fa.bwameth.c2t </home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz [main] Real time: 8.957 sec; CPU: 10.384 sec Traceback (most recent call last): File "/home/pi/vogelw/bin/bwameth.py", line 5, in pkg_resources.run_script('bwameth==0.10', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1394, in run_script File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 582, in main(sys.argv[1:]) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 579, in main set_as_failed=args.set_as_failed) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem as_bam(cmd, fa, prefix, calmd, set_as_failed) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 292, in as_bam pair_list = [Bam(toks) for toks in pair_list] File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 163, in init self.flag = int(self.flag) ValueError: invalid literal for int() with base 10: '0+0k 10760+0io 5pf+0w' 18.259u 0.530s 0:09.73 193.0% 0+0k 10864+0io 5pf+0w

vogelwk commented 10 years ago

samtools flagstat bwa-meth.bam 92221 + 579 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 92181 + 576 mapped (99.96%:99.48%) 92221 + 579 paired in sequencing 46118 + 289 read1 46103 + 290 read2 91789 + 0 properly paired (99.53%:0.00%) 92142 + 570 with itself and mate mapped 39 + 6 singletons (0.04%:1.04%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

brentp commented 10 years ago

Hmm, I'm not sure how it's getting this output:

self.flag = int(self.flag)
ValueError: invalid literal for int() with base 10: '0+0k 10760+0io 5pf+0w'
18.259u 0.530s 0:09.73 193.0% 0+0k 10864+0io 5pf+0w

It must be the last line from what bwa mem is putting out somehow because as your flagstat shows, you have all of the data. I don't know where that would be coming from.

I will look into this further tomorrow, but I am not able to recreate it. I even tried that exact revision of bwa.

brentp commented 10 years ago

what does

type bwa

show?

vogelwk commented 10 years ago

$ type bwa bwa is hashed (/data1/opt/bin/bwa)

brentp commented 10 years ago

Can you tell me the output of this:

bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p  ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' | tail
vogelwk commented 10 years ago

bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' | tail converting reads in t_R1.fastq.gz,t_R2.fastq.gz [M::main_mem] read 92726 sequences (9365326 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44942, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat](25, 50, 75) percentile: (109, 137, 175) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307) [M::mem_pestat] mean and std.dev: (145.03, 48.64) [M::mem_pestat] low and high boundaries for proper pairs: (1, 373) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 92726 reads in 8.495 CPU sec, 8.502 real sec [main] Version: 0.7.8-r455 HISEQ:105:C2UE1ACXX:3:2316:20506:100449 99 fchrREF 61007 60 101M = 61122 216 AATTTTTTTTTTAAGGGTGAGGTTTTTAGTTAGTTGTTGATAGTTAAGAATGGTAATTAATTTTTTGAAAGATATTGTTTGATTTTTGTTTTTATGTGGTT BBBFFFFFFFFFFF<<7BFFF7FBFFIF7BFF7BF0BB7<BF0<FFF<BFFB7BFFFFFFFFFFFF0<BBBBBBBB0<BB'<BBFFF'<BFFFBF0B007< NM:i:0 MD:Z:101 AS:i:101 XS:i:0 YS:Z:AATTTTTTTTTTAAGGGTGAGGTTTTTAGTTAGTTGTTGATAGTTAAGAATGGTAATTAATTTTTTGAAAGATATTGTTTGATTTTTGTTTTTACGTGGTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:20506:100449 147 fchrREF 61122 60 101M = 61007 -216 TTGAGTTTTTATATTTATGATGTATTATTGTAGTTATTATGTTGGTAAAGGGTTTTTTGAGTTTTTTGTTATGTTGATTTGAGTTTTAGAGTATATATTTT BBBBFFFFFFFFBFFFFFFFFFFFFFFFBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB NM:i:0 MD:Z:101 AS:i:101 XS:i:0 YS:Z:AAAATATATACTCTAAAACTCAAATCAACATAACAAAAAACTCAAAAAACCCTTTACCAACATAATAACTACAATAATACATCATAAATATAAAAACTCAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:11659:100715 99 fchrREF 27015 60 101M = 27028 114 TATGGTATTGTGTTTTGTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTT <BB<<BFFFBFFFFFIBFFIIIIIB<BFFFI<B<FFFFIFFFFFF<FFFII777BBFFFFFFFFFFFBB<BBB7777<<B70<<<<BB<BB<0<BBBBBFF NM:i:0 MD:Z:101 AS:i:101 XS:i:0 YS:Z:TATGGTATTGTGTTTTGTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:11659:100715 147 fchrREF 27028 60 101M = 27015 -114 TTTTTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTTTTTTTTTTTTATT B<<0B<BBBBB<<<'BB<0000BB<0''<070'<BBBBBBBBBBBBB<BBBBB<<B<<<<BB<0<B<<7<BB<777BB'<'B<BIIIIFFFFFFFFFFBBB NM:i:1 MD:Z:3G97 AS:i:98 XS:i:19 YS:Z:AATAAAAAAAAAAAAAATCCACCTTATACCCACAACCTCCCATACTTAAAAAAATTAAAACCCAAAAACAATTCTACCCTCACAAATACCAAAAAAAAAAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:18269:100874 83 rchrREF 93187 60 101M = 93117 -171 AACATAAACCAAAACCTTATAAATAAACAACATCTCCTACACTTCTACAAAAAAAAACAAAAACCCCAACCAAACCATTCCAAAACAAAATTCCTAAAAAC #B<FBBB7''B<00''BBB0FBBBBB<0BB07BBBBBBB<<<B7777'FFFFFIIIFBFFFBIFFFBFB0BFFFBFFFB<<IIFF0IIFFFFBBFFFFBB< NM:i:1 MD:Z:0C100 AS:i:100 XS:i:0 YS:Z:GTTTTTAGGAATTTTGTTTTGGAACGGTTTGGTCGGGGTTTTCGTTTTTTTTTGTAGAAGTGTAGGAGATGTTGTTTATTTATAAGGTTTTGGTTTATGCT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:18269:100874 163 rchrREF 93117 60 101M = 93187 171 ACCCTACAAAAACAACACAAATACAACCATAACTACAAACAACCTTACACACACCCTACTCACACTTCCACACATAAACCAAAACCTTATAAATAAACAAC BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIFIIIIIFFIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:101 AS:i:101 XS:i:0 YS:Z:ACCCTACAAAAACAACACGAATACAACCGTAACTACGAACGACCTTACGCACGCCCTACTCGCACTTCCACGCATAAACCAAAACCTTATAAATAAACAAC YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:7508:101058 99 fchrREF 29997 60 101M = 30034 138 AGGATTTTTTGAGGTGGTTGGTTATAGGTGAGGGTATTTGGGGTTTTGTTTAGAATTTGGATTGTTTGATTTGTGAGTTTTTGTGGTTTTTTTGTTATTTT BBBFFFFFFFB<BBFFFFF00BFFIF<0BBFFBFFF<BF007<BBFF7BBFB0<BBFF70<BF<BBB7<BBF7<7B<<BBFF7B00<7BBFB'0<<BFFFF NM:i:1 MD:Z:92G8 AS:i:98 XS:i:23 YS:Z:AGGATTTTTTGAGGTGGTTGGTTATAGGCGAGGGTATTTGGGGTTTTGTTTAGAATTTGGATCGTTTGATTTGTGAGTTTTTGTGGTTTTTCTGTTATTTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:7508:101058 147 fchrREF 30034 60 101M = 29997 -138 TTGGGGTTTTGTTTAGAATTTGGATTGTTTGATTTGTGAGTTTTTGTGGTTTTTTGGTTATTTTGATGTGAATATGTTAGGGGTTGAGTAGAAAGTAGTAT FB7000FBBBBBFFBBBBFFFFBBBFFFFFFBFFFFFFFFFFFIFBB7FBFFFFFFFFF<FFFFFIIFFFFFFFFFFF<IIIFFFIFFFFFFFFFFFFBBB NM:i:0 MD:Z:101 AS:i:101 XS:i:20 YS:Z:ATACTACTTTCTACTCAACCCCTAACATATTCACATCAAAATAACCGAAAAACCACAAAAACTCACAAATCAAACGATCCAAATTCTAAACAAAACCCCAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:18777:101069 83 rchrREF 62392 60 30S71M = 62393 -70 ACTAAAATTCAAACATATACTCTTCCAATCTAACTTCCTAATTAATCCCATTCCATAACAACTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATA <0BBBBBB<0BB70BB<<7'<0B70'FFB7FBB0FB7<FFFFBFBB7'BFB0'0FBFB'FFBFFBFBBFFB<B<FB<00<FBFFBB000FFFFFFFFFBBB NM:i:0 MD:Z:71 AS:i:71 XS:i:0 YS:Z:TATTAATTTTATGGGAGAAATGGGATAGAGGTAGTAGAAGCTGTTATGGAATGGGATTAATTAGGAAGTTAGATCGGAAGAGCACACGTCTGAACTCCAGT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:18777:101069 163 rchrREF 62393 60 70M31S = 62392 70 AACTTCCTAATTAATCCCATTCCATAACAACTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAAATCAAAAAAACATCATATAAAAAAAAAAT BBBFFFFF<FFFFIBFFIFIFIFBFFFFFFIFIFFBFFFBBFBBBFFIIIFI[main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100IFFFFFFFFFIIFFBB0<F<FBFBFIFBFBF<00777<B<77BBB#### NM:i:0 MD:Z:70 AS:i:70 XS:i:0 YS:Z:AACTTCCTAATTAATCCCATTCCATAACAGCTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGT YC:Z:GA -p ref.fa.bwameth.c2t </home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz [main] Real time: 10.564 sec; CPU: 8.744 sec 10.527u 0.212s 0:10.57 101.5% 0+0k 128+0io 1pf+0w

brentp commented 10 years ago

I wonder were that last line is coming from:

10.527u 0.212s 0:10.57 101.5% 0+0k 128+0io 1pf+0w

It seems to be coming from bwa mem and going to stdout.

vogelwk commented 10 years ago

Morning Brent,

That line is being generated by the csh environment 'set time=10' , in this instance it is being set by the system 'std.cshrc' . 'unset' ing it or running in Bash shell generates the same errors--with out this cpu timing line.

brentp commented 10 years ago

Morning. So you still see this:

self.flag = int(self.flag)
ValueError: invalid literal for int() with base 10: '0+0k 10760+0io 5pf+0w'

? Because that is trying to call int() on the timing data. Thanks for your persistence.

vogelwk commented 10 years ago

Yep, both in a Bash shell test and in C shell tests (w/ 'unset time').

brentp commented 10 years ago

ok, can you try this:

bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p  ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' > t.sam
tail t.sam

and

/home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz > t.fastq
tail t.fastq

and show the full output from both? after this, I don't have much left for ideas. You could also try

rm ref.fa.bwa*

and reindexing...

vogelwk commented 10 years ago

[vogelw@mus1 example]% bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' > t.sam converting reads in t_R1.fastq.gz,t_R2.fastq.gz [M::main_mem] read 92726 sequences (9365326 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44942, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat](25, 50, 75) percentile: (109, 137, 175) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307) [M::mem_pestat] mean and std.dev: (145.03, 48.64) [M::mem_pestat] low and high boundaries for proper pairs: (1, 373) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 92726 reads in 8.478 CPU sec, 8.522 real sec [main] Version: 0.7.8-r455 [main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p ref.fa.bwameth.c2t </home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz [main] Real time: 11.256 sec; CPU: 8.737 sec [vogelw@mus1 example]% tail t.sam HISEQ:105:C2UE1ACXX:3:2316:20506:100449 99 fchrREF 61007 60 101M 61122 216 AATTTTTTTTTTAAGGGTGAGGTTTTTAGTTAGTTGTTGATAGTTAAGAATGGTAATTAATTTTTTGAAAGATATTGTTTGATTTTTGTTTTTATGTGGTT BBBFFFFFFFFFFF<<7BFFF7FBFFIF7BFF7BF0BB7<BF0<FFF<BFFB7BFFFFFFFFFFFF0<BBBBBBBB0<BB'<BBFFF'<BFFFBF0B007< NM:i:0 MD:Z:10AS:i:101 XS:i:0 YS:Z:AATTTTTTTTTTAAGGGTGAGGTTTTTAGTTAGTTGTTGATAGTTAAGAATGGTAATTAATTTTTTGAAAGATATTGTTTGATTTTTGTTTTTACGTGGTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:20506:100449 147 fchrREF 61122 60 101M 61007 -216 TTGAGTTTTTATATTTATGATGTATTATTGTAGTTATTATGTTGGTAAAGGGTTTTTTGAGTTTTTTGTTATGTTGATTTGAGTTTTAGAGTATATATTTT BBBBFFFFFFFFBFFFFFFFFFFFFFFFBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB NM:i:0 MD:Z:10AS:i:101 XS:i:0 YS:Z:AAAATATATACTCTAAAACTCAAATCAACATAACAAAAAACTCAAAAAACCCTTTACCAACATAATAACTACAATAATACATCATAAATATAAAAACTCAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:11659:100715 99 fchrREF 27015 60 101M 27028 114 TATGGTATTGTGTTTTGTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTT <BB<<BFFFBFFFFFIBFFIIIIIB<BFFFI<B<FFFFIFFFFFF<FFFII777BBFFFFFFFFFFFBB<BBB7777<<B70<<<<BB<BB<0<BBBBBFF NM:i:0 MD:Z:10AS:i:101 XS:i:0 YS:Z:TATGGTATTGTGTTTTGTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:11659:100715 147 fchrREF 27028 60 101M 27015 -114 TTTTTTTTTTTGGTATTTGTGAGGGTAGAATTGTTTTTGGGTTTTAATTTTTTTAAGTATGGGAGGTTGTGGGTATAAGGTGGATTTTTTTTTTTTTTATT B<<0B<BBBBB<<<'BB<0000BB<0''<070'<BBBBBBBBBBBBB<BBBBB<<B<<<<BB<0<B<<7<BB<777BB'<'B<BIIIIFFFFFFFFFFBBB NM:i:1 MD:Z:3G97 AS:i:98 XS:i:19 YS:Z:AATAAAAAAAAAAAAAATCCACCTTATACCCACAACCTCCCATACTTAAAAAAATTAAAACCCAAAAACAATTCTACCCTCACAAATACCAAAAAAAAAAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:18269:100874 83 rchrREF 93187 60 101M 93117 -171 AACATAAACCAAAACCTTATAAATAAACAACATCTCCTACACTTCTACAAAAAAAAACAAAAACCCCAACCAAACCATTCCAAAACAAAATTCCTAAAAAC #B<FBBB7''B<00''BBB0FBBBBB<0BB07BBBBBBB<<<B7777'FFFFFIIIFBFFFBIFFFBFB0BFFFBFFFB<<IIFF0IIFFFFBBFFFFBB< NM:i:1 MD:Z:0C100 AS:i:100 XS:i:0 YS:Z:GTTTTTAGGAATTTTGTTTTGGAACGGTTTGGTCGGGGTTTTCGTTTTTTTTTGTAGAAGTGTAGGAGATGTTGTTTATTTATAAGGTTTTGGTTTATGCT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:18269:100874 163 rchrREF 93117 60 101M 93187 171 ACCCTACAAAAACAACACAAATACAACCATAACTACAAACAACCTTACACACACCCTACTCACACTTCCACACATAAACCAAAACCTTATAAATAAACAAC BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIFIIIIIFFIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:10AS:i:101 XS:i:0 YS:Z:ACCCTACAAAAACAACACGAATACAACCGTAACTACGAACGACCTTACGCACGCCCTACTCGCACTTCCACGCATAAACCAAAACCTTATAAATAAACAAC YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:7508:101058 99 fchrREF 29997 60 101M 30034 138 AGGATTTTTTGAGGTGGTTGGTTATAGGTGAGGGTATTTGGGGTTTTGTTTAGAATTTGGATTGTTTGATTTGTGAGTTTTTGTGGTTTTTTTGTTATTTT BBBFFFFFFFB<BBFFFFF00BFFIF<0BBFFBFFF<BF007<BBFF7BBFB0<BBFF70<BF<BBB7<BBF7<7B<<BBFF7B00<7BBFB'0<<BFFFF NM:i:1 MD:Z:92G8 AS:i:98 XS:i:23 YS:Z:AGGATTTTTTGAGGTGGTTGGTTATAGGCGAGGGTATTTGGGGTTTTGTTTAGAATTTGGATCGTTTGATTTGTGAGTTTTTGTGGTTTTTCTGTTATTTT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:7508:101058 147 fchrREF 30034 60 101M 29997 -138 TTGGGGTTTTGTTTAGAATTTGGATTGTTTGATTTGTGAGTTTTTGTGGTTTTTTGGTTATTTTGATGTGAATATGTTAGGGGTTGAGTAGAAAGTAGTAT FB7000FBBBBBFFBBBBFFFFBBBFFFFFFBFFFFFFFFFFFIFBB7FBFFFFFFFFF<FFFFFIIFFFFFFFFFFF<IIIFFFIFFFFFFFFFFFFBBB NM:i:0 MD:Z:10AS:i:101 XS:i:20 YS:Z:ATACTACTTTCTACTCAACCCCTAACATATTCACATCAAAATAACCGAAAAACCACAAAAACTCACAAATCAAACGATCCAAATTCTAAACAAAACCCCAA YC:Z:GA HISEQ:105:C2UE1ACXX:3:2316:18777:101069 83 rchrREF 62392 60 30S71M 62393 -70 ACTAAAATTCAAACATATACTCTTCCAATCTAACTTCCTAATTAATCCCATTCCATAACAACTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATA <0BBBBBB<0BB70BB<<7'<0B70'FFB7FBB0FB7<FFFFBFBB7'BFB0'0FBFB'FFBFFBFBBFFB<B<FB<00<FBFFBB000FFFFFFFFFBBB NM:i:0 MD:Z:71AS:i:71 XS:i:0 YS:Z:TATTAATTTTATGGGAGAAATGGGATAGAGGTAGTAGAAGCTGTTATGGAATGGGATTAATTAGGAAGTTAGATCGGAAGAGCACACGTCTGAACTCCAGT YC:Z:CT HISEQ:105:C2UE1ACXX:3:2316:18777:101069 163 rchrREF 62393 60 70M31S 62392 70 AACTTCCTAATTAATCCCATTCCATAACAACTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAAATCAAAAAAACATCATATAAAAAAAAAAT BBBFFFFF<FFFFIBFFIFIFIFBFFFFFFIFIFFBFFFBBFBBBFFIIIFIIFFFFFFFFFIIFFBB0<F<FBFBFIFBFBF<00777<B<77BBB#### NM:i:0 MD:Z:70AS:i:70 XS:i:0 YS:Z:AACTTCCTAATTAATCCCATTCCATAACAGCTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGT YC:Z:GA [vogelw@mus1 example]% /home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz > t.fastq converting reads in t_R1.fastq.gz,t_R2.fastq.gz [vogelw@mus1 example]% tail t.fastq + BBBFFFFFFFFFFFFIFFFIII<FFFFFFFFFFFIIFFFFF<FFFFFFFFFBF7BBFIFFFFFFFFFFFBFFFFFFBBBFFFFBBBBFFBBBBBF0007BF @HISEQ:105:C2UE1ACXX:3:2316:18777:101069 YS:Z:TATTAATTTTATGGGAGAAATGGGATAGAGGTAGTAGAAGCTGTTATGGAATGGGATTAATTAGGAAGTTAGATCGGAAGAGCACACGTCTGAACTCCAGT YC:Z:CT TATTAATTTTATGGGAGAAATGGGATAGAGGTAGTAGAAGTTGTTATGGAATGGGATTAATTAGGAAGTTAGATTGGAAGAGTATATGTTTGAATTTTAGT + BBBFFFFFFFFF000BBFFBF<00<BF<B<BFFBBFBFFBFF'BFBF0'0BFB'7BBFBFFFF<7BF0BBF7BFF'07B0<'7<<BB07BB0<BBBBBB0< @HISEQ:105:C2UE1ACXX:3:2316:18777:101069 YS:Z:AACTTCCTAATTAATCCCATTCCATAACAGCTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGT YC:Z:GA AACTTCCTAATTAATCCCATTCCATAACAACTTCTACTACCTCTATCCCATTTCTCCCATAAAATTAATAAAATCAAAAAAACATCATATAAAAAAAAAAT + BBBFFFFF<FFFFIBFFIFIFIFBFFFFFFIFIFFBFFFBBFBBBFFIIIFIIFFFFFFFFFIIFFBB0<F<FBFBFIFBFBF<00777<B<77BBB####

vogelwk commented 10 years ago

[vogelw@mus1 example]% bwameth.py index ref.fa converting c2t in ref.fa to ref.fa.bwameth.c2t indexing: ref.fa.bwameth.c2t [bwa_index] Pack FASTA... 0.01 sec [bwa_index] Construct BWT for the packed sequence... [BWTIncCreate] textLength=3600000, availableWord=4573648 [bwa_index] 0.71 seconds elapse. [bwa_index] Update BWT... 0.01 sec [bwa_index] Pack forward-only FASTA... 0.01 sec [bwa_index] Construct SA from BWT and Occ... 0.19 sec [main] Version: 0.7.8-r455 [main] CMD: bwa index -a bwtsw ref.fa.bwameth.c2t [main] Real time: 1.046 sec; CPU: 0.939 sec [vogelw@mus1 example]% bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' writing to: samtools view -bS - | samtools sort - bwa-meth converting reads in t_R1.fastq.gz,t_R2.fastq.gz [M::main_mem] read 92726 sequences (9365326 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44942, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat] (25, 50, 75) percentile: (109, 137, 175) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307) [M::mem_pestat] mean and std.dev: (145.03, 48.64) [M::mem_pestat] low and high boundaries for proper pairs: (1, 373) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 92726 reads in 8.769 CPU sec, 0.748 real sec [main] Version: 0.7.8-r455 [main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R @RG ID:t_R SM:t_R -t 12 ref.fa.bwameth.c2t </home/pi/vogelw/bin/python /raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz [main] Real time: 8.309 sec; CPU: 8.933 sec Traceback (most recent call last): File "/home/pi/vogelw/bin/bwameth.py", line 5, in pkg_resources.run_script('bwameth==0.10', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1394, in run_script File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 582, in main(sys.argv[1:]) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 579, in main set_as_failed=args.set_as_failed) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem as_bam(cmd, fa, prefix, calmd, set_as_failed) File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 292, in as_bam pair_list = [Bam(toks) for toks in pair_list] File "/raid1/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 163, in init self.flag = int(self.flag) ValueError: invalid literal for int() with base 10: '0+0k 0+0io 0pf+0w'

brentp commented 10 years ago

will you try again with latest commit. I had another guess. (d0c660c)

vogelwk commented 10 years ago

Brent, To make sure there wasn't anything too badly miss-configured, I created a test environment as follows: w/ gcc 4.8.2 install location: $HOME/opt

Install: python 2.7.6; then install: setuptools-3.4.3, nose-1.3.1, toolshed-0.3.6, git clone: bwa-meth

Here are the results:

% bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '<//home/pi/vogelw/bin/python /raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz' writing to: samtools view -bS - | samtools sort - bwa-meth converting reads in t_R1.fastq.gz,t_R2.fastq.gz [M::main_mem] read 92726 sequences (9365326 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44942, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat](25, 50, 75) percentile: (109, 137, 175) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307) [M::mem_pestat] mean and std.dev: (145.03, 48.64) [M::mem_pestat] low and high boundaries for proper pairs: (1, 373) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 92726 reads in 9.712 CPU sec, 0.850 real sec [main] Version: 0.7.8-r455 [main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R @RG ID:t_R SM:t_R -t 12 ref.fa.bwameth.c2t <//home/pi/vogelw/bin/python /raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz [main] Real time: 8.727 sec; CPU: 10.080 sec Traceback (most recent call last): File "//home/pi/vogelw/bin/bwameth.py", line 5, in pkg_resources.run_script('bwameth==0.10', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1394, in run_script File "/raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 581, in main(sys.argv[1:]) File "/raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 578, in main set_as_failed=args.set_as_failed) File "/raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem as_bam(cmd, fa, prefix, calmd, set_as_failed) File "/raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 292, in as_bam pair_list = [Bam(toks) for toks in pair_list] File "/raid1/home/pi/vogelw/opt/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 163, in init self.flag = int(self.flag) ValueError: invalid literal for int() with base 10: '0+0k 10744+0io 5pf+0w'

brentp commented 10 years ago

@vogelwk maybe the recent changes to bwa-meth and toolshed (be sure to update to latest from pypi) will address this.