adamewing / bamsurgeon

tools for adding mutations to existing .bam files, used for testing mutation callers
MIT License

addsnv.py OverflowError again #148

Closed marcogao2019 closed 4 years ago

marcogao2019 commented 4 years ago

Howdy, I'm hitting an issue similar to the closed #130:

INFO 2020-01-09 11:54:49,110 starting /home/.local/bin/addsnv.py called with args: /home/.local/bin/addsnv.py -v test1.txt -f in_BAM/test_bam/01593_chr7.bam -r ref/genome.fa -o out/01593_chr7_mut.bam --picardjar /mnt/c/Users/Documents/work/bin/picard-tools-1.131/picard.jar --aligner mem --seed 1234 --single
INFO 2020-01-09 11:54:49,124 created directory: addsnv_logs_01593_chr7_mut.bam
INFO 2020-01-09 11:54:49,188 haplo_chr7_140453136_140453136 creating tmp bam: addsnv.tmp/haplo_chr7_140453136_140453136.tmpbam.08b6659b-7aef-422f-b34e-f5eb0f605b3e.bam
INFO 2020-01-09 11:54:58,904 haplo_chr7_140453136_140453136 len(readlist): 12
INFO 2020-01-09 11:54:58,905 haplo_chr7_140453136_140453136 selected VAF: 0.100000
WARNING 2020-01-09 11:54:58,905 haplo_chr7_140453136_140453136 forced 3 reads.
INFO 2020-01-09 11:54:58,906 haplo_chr7_140453136_140453136 picked: 3
**** ERROR 2020-01-09 11:54:58.906688 encountered error in mutation spikein: ['chr7_140453136_140453136_0.1_T']
Traceback (most recent call last):
  File "/home/.local/lib/python2.7/site-packages/bamsurgeon-1.2-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 242, in makemut
  File "pysam/libcalignedsegment.pyx", line 2669, in pysam.libcalignedsegment.AlignedSegment.qual.set
  File "pysam/libcutils.pyx", line 38, in pysam.libcutils.qualitystring_to_array
OverflowError: unsigned byte integer is less than minimum

My pysam version is 0.15.3. Any idea what is causing this? I'd really appreciate any help.
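For readers unfamiliar with this message: it comes from storing a negative number in an unsigned-byte array. The sketch below is a pure-Python stand-in, not pysam's actual code, and it assumes the usual Phred+33 quality encoding. The function name `quality_string_to_array` is hypothetical. It only illustrates how a quality character with a code point below the offset could trigger an `OverflowError`; the thread does not establish which character or code path was responsible in this case.

```python
from array import array

PHRED_OFFSET = 33  # assumed Phred+33 (Sanger / Illumina 1.8+) encoding


def quality_string_to_array(qual, offset=PHRED_OFFSET):
    """Hypothetical stand-in for decoding a quality string to Phred scores."""
    # array('B') stores unsigned bytes (0..255). Any character whose code
    # point is below the offset yields a negative value, which is rejected.
    return array("B", [ord(c) - offset for c in qual])


# Characters at or above '!' (ASCII 33) decode normally.
print(list(quality_string_to_array("II~!")))  # [40, 40, 93, 0]

try:
    # A space is ASCII 32, so 32 - 33 = -1 cannot be stored as unsigned.
    quality_string_to_array("II I")
except OverflowError as exc:
    print("OverflowError:", exc)
```

Under this reading, the traceback means the string handed to the `qual` setter contained at least one value that decoded below zero, whatever the upstream reason.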

adamewing commented 4 years ago

Hi, are you sure it's a valid .bam file? Try running ValidateSamFile from picard to start with.

marcogao2019 commented 4 years ago

Hi, thanks for the reply. ValidateSamFile found no errors, but the addsnv.py error is still there. Any other suggestions, please? Thanks. BTW, the test run (test_snv.sh) worked well.

###########################
java -jar ../bin/picard-tools-1.131/picard.jar ValidateSamFile I=01593-10-VG_RG.sorted.bam MODE=SUMMARY
[Fri Jan 10 10:07:52 EST 2020] picard.sam.ValidateSamFile INPUT=01593-10-VG_RG.sorted.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Jan 10 10:07:52 EST 2020] Executing as admin@L-DL-4MD7462 on Linux 4.4.0-18362-Microsoft amd64; OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~16.04.1-b09; Picard version: 1.131(cd60f90fdca902499c70a4472b6162ef37f919ce_1431022382) IntelDeflater
No errors found
[Fri Jan 10 10:08:14 EST 2020] picard.sam.ValidateSamFile done. Elapsed time: 0.37 minutes.
Runtime.totalMemory()=768081920

###################

addsnv.py -v ../test1.txt -f 01593-10-VG_RG.sorted.bam -r ../ref/genome.fa -o test1_out.bam --picard /mnt/c/Documents/work/bin/picard-tools-1.131/picard.jar --aligner mem --seed 1234 --single

INFO 2020-01-10 10:29:55,002 starting /home/.local/bin/addsnv.py called with args: /home/.local/bin/addsnv.py -v ../test1.txt -f 01593-10-VG_RG.sorted.bam -r ../ref/genome.fa -o test1_out.bam --picard /mnt/c/Documents/work/bin/picard-tools-1.131/picard.jar --aligner mem --seed 1234 --single
INFO 2020-01-10 10:29:55,146 haplo_chr7_140453136_140453136 creating tmp bam: addsnv.tmp/haplo_chr7_140453136_140453136.tmpbam.0c97a3d4-5498-4599-b776-7340d97f31d1.bam
INFO 2020-01-10 10:30:10,497 haplo_chr7_140453136_140453136 len(readlist): 12
INFO 2020-01-10 10:30:10,498 haplo_chr7_140453136_140453136 selected VAF: 1.000000
INFO 2020-01-10 10:30:10,499 haplo_chr7_140453136_140453136 picked: 12
**** ERROR 2020-01-10 10:30:10.500088 encountered error in mutation spikein: ['chr7_140453136_140453136_1.0_T']
Traceback (most recent call last):
  File "/home/marcogao/.local/lib/python2.7/site-packages/bamsurgeon-1.2-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 242, in makemut
  File "pysam/libcalignedsegment.pyx", line 2669, in pysam.libcalignedsegment.AlignedSegment.qual.set
  File "pysam/libcutils.pyx", line 38, in pysam.libcutils.qualitystring_to_array
OverflowError: unsigned byte integer is less than minimum
**** ERROR 2020-01-10 10:30:10,519 no succesful mutations

marcogao2019 commented 4 years ago

Hi, I did the same thing as in #130: extracted the FASTQ and redid the alignment, and the problem was solved. Not sure why. Thanks

marcogao2019 commented 4 years ago

Hi Adam, sorry, I have to reopen this issue. Here is the error:

#################################################
INFO 2020-01-16 09:17:38,712 starting /home/.local/bin/addsnv.py called with args: /home/.local/bin/addsnv.py -v cnvfile/test2_EGFR_2369C2T.txt -f 01593-10-VG_paired_RG_chr7_55349000to55250000.sorted.bam -r /mnt/c/Users/Documents/work/BAM_engineered/ref/genome.fa -o surgeon_out/01593-10-VG_paired_RG_chr7_55349000to55250000_test2_out.bam --picard /mnt/c/Users/Documents/work/bin/picard-tools-1.131/picard.jar -d 0 --aligner mem --seed 1234 --tagreads
INFO 2020-01-16 09:17:38,720 created directory: addsnv_logs_01593-10-VG_paired_RG_chr7_55349000to55250000_test2_out.bam
INFO 2020-01-16 09:17:38,782 haplo_chr7_55249071_55249071 creating tmp bam: addsnv.tmp/haplo_chr7_55249071_55249071.tmpbam.e3df8cbb-c080-4b0f-88c4-a727c3988314.bam
INFO 2020-01-16 09:17:44,399 haplo_chr7_55249071_55249071 len(readlist): 11
INFO 2020-01-16 09:17:44,400 haplo_chr7_55249071_55249071 selected VAF: 1.000000
INFO 2020-01-16 09:17:44,400 haplo_chr7_55249071_55249071 picked: 11
**** ERROR 2020-01-16 09:17:44.401093 encountered error in mutation spikein: ['chr7_55249071_55249071_1.0_T']
Traceback (most recent call last):
  File "/home/.local/lib/python2.7/site-packages/bamsurgeon-1.2-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 242, in makemut
  File "pysam/libcalignedsegment.pyx", line 2669, in pysam.libcalignedsegment.AlignedSegment.qual.set
  File "pysam/libcutils.pyx", line 38, in pysam.libcutils.qualitystring_to_array
OverflowError: unsigned byte integer is less than minimum
**** ERROR 2020-01-16 09:17:44,405 no succesful mutations
#########################################################

I used a very small input BAM segment (hg19: chr7: 55249000-55250000; only 100 paired-end reads). ValidateSamFile showed no errors or warnings.

This is my expected SNV:

chr7 55249071 55249071 1.0 T

#######################################################
samtools mpileup 01593-10-VG_paired_RG_chr7_55349000to55250000.sorted.bam -r chr7:55249070-55249072
[mpileup] 1 samples in 1 input files
chr7 55249070 N 12 AAAAAA$AAAAAA ~~~~
chr7 55249071 N 11 CCCCCCCCCCC ~~~
chr7 55249072 N 11 GGGGGGGGGGG ~~~
#####################

Here is the log file. I confirmed these new SNVs on the UCSC genome browser.

read A00896:6:HLKKYDMXX:1:1317:25907:33833:AGAAGC+TACTCA,55249005,S CGTGGACAACCCCCACGTGTGCCGCCTGCTGGGCATCTGCCTCACCTCCACCGTGCAGCTCATCATGCAGCT
read A00896:6:HLKKYDMXX:1:1101:27299:21840:CTAGAAC+GGCCATA,55249007,S TGGACAACCCCCACGTGTGCCGCCTGCTGGGCATCTGCCTCACCTCCACCGTGCAACTCATCATGCAGCTCATGCCCTTCGGCTGCCTCC
read A00896:6:HLKKYDMXX:1:1123:11505:35415:CCTAAC+TATGGCA,55249016,F CCCACGTGTGCCGCCTGCTGGGCATCTGCCTCACCTCCACCGTGCAGCTCATCATGCAGCTCATGCCC
read A00896:6:HLKKYDMXX:1:1177:30644:3239:GACAAC+ACGCAC,55249017,F CCACGTGTGCCGCCTGCTGGGCATCTGCCTCACCTCCACCGTGCAGCTCATCATGCAGCTCATGCCCTTCGG
read A00896:6:HLKKYDMXX:1:1102:15664:23735:CCTGTG+GTGGAC,55249023,S GTGCCGCCTGCTGGGCATCTGCCTCACCTCCACCGTGCAACTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTG
read A00896:6:HLKKYDMXX:1:1158:23222:9580:GAAGTG+AAGACA,55249032,F GCTGGGCATCTGCCTCACCTCCACCGTGCAACTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTG
read A00896:6:HLKKYDMXX:1:1166:15763:15546:AAGTCCA+AGAATAG,55249033,S CTGGGCATCTGCCTCACCTCCACCGTGCAGCTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTATGTCCGGGAACACAAAGACA
read A00896:6:HLKKYDMXX:1:1113:29830:9502:CAGGTC+CCTATTG,55249034,F TGGGCATCTGCCTCACCTCCACCGTGCAACTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTGGACT
read A00896:6:HLKKYDMXX:1:1110:19000:16830:ACGAGCA+AGGATTC,55249041,F CTGCCTCACCTCCACCGTGCAGCTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTGG
read A00896:6:HLKKYDMXX:1:1104:9362:18082:TACCTG+TCAATG,55249047,S CACCTCCACCGTGCAACTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTATGTCCGGGAA
read A00896:6:HLKKYDMXX:1:1226:14127:17284:GAACCTC+TACATAG,55249057,F GTGCAGCTCATCATGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTAT
######################################

Based on the error, do you think this is caused by a pysam version issue? I also emailed you the BAM and log files yesterday. Could you please help with this issue? Thanks a lot
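To rule out a malformed variant file before suspecting the BAM, the SNV line can be sanity-checked on its own. The sketch below is not part of bamsurgeon. It assumes, from the example line and the `chr7_55249071_55249071_1.0_T` label in the error, that the columns are chromosome, start, end, VAF, and an optional alternate base. The names `SnvSpec`, `parse_snv_line`, and `spec_key` are hypothetical.

```python
from collections import namedtuple

# Assumed column layout: chrom, start, end, VAF, optional alternate base.
SnvSpec = namedtuple("SnvSpec", "chrom start end vaf alt")


def parse_snv_line(line):
    """Parse and validate one whitespace-separated SNV line."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns, got %d" % len(fields))
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])
    vaf = float(fields[3])
    alt = fields[4].upper() if len(fields) > 4 else None
    if start > end:
        raise ValueError("start is greater than end")
    if not 0.0 < vaf <= 1.0:
        raise ValueError("VAF must be in (0, 1]")
    if alt is not None and alt not in set("ACGT"):
        raise ValueError("alternate base must be one of A, C, G, T")
    return SnvSpec(chrom, start, end, vaf, alt)


def spec_key(spec):
    """Join the fields with underscores, like the label in the error log."""
    return "_".join(str(f) for f in spec if f is not None)


spec = parse_snv_line("chr7 55249071 55249071 1.0 T")
print(spec)
print(spec_key(spec))  # chr7_55249071_55249071_1.0_T
```

The line from this comment passes these checks, which is consistent with the error originating later, in the quality-string handling, rather than in the variant file.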

adamewing commented 4 years ago

Thanks, I'll have a look. How are you generating these .bams i.e. what aligner and command-line arguments?

marcogao2019 commented 4 years ago

Thanks, Adam. Nothing special. Here it is.

#################
bwa mem ref.fa r1.fa r2.fa | samtools view -Sb - | samtools sort - >01593-10-VG_paired_RG.sorted.bam
samtools view -f 1 -hb 01593-10-VG_paired_RG.sorted.bam "chr7:55249000-55250000" >01593-10-VG_paired_RG_chr7_55349000to55250000.sorted.bam
##################

Thanks

marcogao2019 commented 4 years ago

Hi Adam, the above issue was resolved by downgrading pysam from 0.15.3 to 0.14 and re-running python setup.py install. Since bamsurgeon is so sensitive to both the Python and pysam versions, it might be good to add this information to the README. Just a suggestion. However, I still have another issue and will open a new one for it. Please take a look. Thanks
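Until something like this is documented, a small guard at the top of a wrapper script can flag the version mismatch early. This is only a sketch: the threshold rests solely on this thread's report (0.14 worked, 0.15.3 failed, on Python 2.7), and `version_tuple` and `pysam_version_warning` are hypothetical helpers, not bamsurgeon functions. Versions other than those two are untested assumptions.

```python
import re

# Newest (major, minor) reported working in this thread; an assumption,
# not an officially supported range.
LAST_REPORTED_WORKING = (0, 14)


def version_tuple(version):
    """Turn a version string such as '0.15.3' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def pysam_version_warning(version):
    """Return a warning string for untested pysam versions, else None."""
    if version_tuple(version)[:2] > LAST_REPORTED_WORKING:
        return ("pysam %s is newer than %d.%d, the last version reported "
                "working here" % ((version,) + LAST_REPORTED_WORKING))
    return None


try:
    import pysam  # optional; the check is skipped when pysam is absent
    print(pysam_version_warning(pysam.__version__) or "pysam version looks fine")
except ImportError:
    print("pysam is not installed")
```

Comparing only the major and minor components keeps patch releases of 0.14 on the accepted side, which is a guess rather than something verified in this thread.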