churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

OverflowError: can't convert negative value to uint32_t #21

Open kingralph80 opened 5 years ago

kingralph80 commented 5 years ago

When I try to convert any BAM or SAM file aligned with BWA-MEM, I get the following error. g2gtools seems to work fine until it reaches about 40,000 conversions.

[g2gtools] Processed 10,000 reads, 9,116 successful, 800 failed
[g2gtools] Processed 20,000 reads, 18,257 successful, 1,580 failed
[g2gtools] Processed 30,000 reads, 27,322 successful, 2,418 failed
[g2gtools] Processed 40,000 reads, 36,378 successful, 3,272 failed

I posted debug output below. Is the -41 the issue?

kingralph80 commented 5 years ago

Debugging shows:

[g2gtools debug] ~~~~~~~~~~~~~~~~
[g2gtools debug] Converting A00742:39:HKM32DSXX:4:1662:30481:35806 10 1288218 41H35M
[g2gtools debug] PAIRED END ALIGNMENT
[g2gtools debug] Chromosome 10, in mapping tree
[g2gtools debug] Chromosome 10, in mapping tree
[g2gtools debug] CIGAR CONVERSION : 41H35M
[g2gtools debug] CIGAR CONVERSION : PHASE 1 : Converting cigar elements
[g2gtools debug] Element #1, '41H' specified, location: 1288218
[g2gtools debug] Adding 'H'
[g2gtools debug] Current CIGAR: [Cigar(code='H', length=41, start=0, end=0)]
[g2gtools debug] Element #2, '35M' specified, location: 1288259
[g2gtools debug] Chromosome 10, in mapping tree
[g2gtools debug] Mappings: Easy: IntervalMapping(from_chr='10', from_start=1288259, from_end=1288294, from_seq='G', to_chr='10', to_start=1288680, to_end=1288715, to_seq='.', same_bases='T', vcf_pos='1288346')
[g2gtools debug] Current CIGAR: [Cigar(code='H', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] AFTER PHASE 1 : [Cigar(code='H', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] CIGAR CONVERSION : PHASE 2 : Remove S if surrounded by M
[g2gtools debug] AFTER PHASE 2 : [Cigar(code='H', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] CIGAR CONVERSION : PHASE 3 : Fix element lengths
[g2gtools debug] Removing 0 length elements, if any
[g2gtools debug] AFTER PHASE 3 : [Cigar(code='H', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] CIGAR CONVERSION : PHASE 4 : Combining elements
[g2gtools debug] 0=Cigar(code='H', length=41, start=0, end=0)
[g2gtools debug] 1=Cigar(code='M', length=35, start=1288680, end=1288715)
[g2gtools debug] AFTER PHASE 4 : [Cigar(code='H', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] CIGAR CONVERSION : PHASE 5 : Fix pre and post Ms
[g2gtools debug] AFTER PHASE 5 : [Cigar(code='S', length=41, start=0, end=0), Cigar(code='M', length=35, start=1288680, end=1288715)]
[g2gtools debug] CIGAR CONVERSION : PHASE 6 : Testing length and conversion
[g2gtools debug] CIGAR SEQ LENGTH=76 != SEQ_LEN=35
[g2gtools debug] old cigar != new cigar
[g2gtools debug] CIGAR CONVERSION : 41H35M ==>
[g2gtools debug] [(4, 41), (0, 35), (4, -41)]
Traceback (most recent call last):
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/bin/g2gtools", line 4, in <module>
    __import__('pkg_resources').run_script('g2gtools==0.2.9', 'g2gtools')
  File "/home/thartwig/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/thartwig/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 132, in <module>
    G2GToolsApp()
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 99, in __init__
    getattr(self, args.command)()
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 102, in convert
    g2gtools.g2g_commands.command_convert(sys.argv[2:], self.script_name + ' convert')
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/g2g_commands.py", line 121, in command_convert
    bsam.convert_bam_file(vci_file=args.vci, file_in=args.input, file_out=args.output, reverse=args.reverse)
  File "/netscratch/dep_psl/grp_frommer/Thomas/bin/miniconda3/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/bsam.py", line 486, in convert_bam_file
    alignment_new.cigar = convert_cigar(alignment.cigar, read_chr, vci_file, alignment.seq, read1_strand, alignment.pos)
  File "pysam/libcalignedsegment.pyx", line 2651, in pysam.libcalignedsegment.AlignedSegment.cigar.set
  File "pysam/libcalignedsegment.pyx", line 2220, in pysam.libcalignedsegment.AlignedSegment.cigartuples.set
OverflowError: can't convert negative value to uint32_t
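
The traceback points at the last tuple: pysam stores each CIGAR length as an unsigned 32-bit value, so (4, -41) cannot be assigned. One plausible reading of the debug log, not confirmed against the g2gtools source, is that the hard clip in 41H35M is turned into a soft clip in phase 5. Under the SAM specification a hard clip (H) does not consume read bases while a soft clip (S) does, so the CIGAR then accounts for 76 bases against a 35-base SEQ, and the difference of -41 ends up as the trailing element. The sketch below only reproduces that arithmetic in plain Python; the helper names `query_length` and `invalid_elements` are hypothetical and are not part of g2gtools or pysam.

```python
# BAM CIGAR operation codes in numeric order (0=M, 1=I, 2=D, 3=N, 4=S, 5=H, ...),
# as listed in the SAM specification and used by pysam's cigartuples.
CIGAR_OPS = "MIDNSHP=X"

# Operations that consume bases of the read (query) sequence: M, I, S, =, X.
# Hard clips (H, code 5) are deliberately absent.
QUERY_CONSUMING = {0, 1, 4, 7, 8}


def query_length(cigartuples):
    """Number of read bases a list of (op, length) tuples accounts for."""
    return sum(length for op, length in cigartuples if op in QUERY_CONSUMING)


def invalid_elements(cigartuples):
    """Hypothetical guard: tuples with an unknown op or a negative length."""
    return [
        (op, length)
        for op, length in cigartuples
        if length < 0 or not 0 <= op < len(CIGAR_OPS)
    ]


original = [(5, 41), (0, 35)]              # 41H35M, as aligned by BWA-MEM
after_phase5 = [(4, 41), (0, 35)]          # 41S35M, as shown after phase 5
converted = [(4, 41), (0, 35), (4, -41)]   # the list that was handed to pysam

seq_len = 35  # SEQ_LEN reported in the debug output

print(query_length(original))                 # 35, matches SEQ_LEN
print(query_length(after_phase5))             # 76, matches CIGAR SEQ LENGTH
print(seq_len - query_length(after_phase5))   # -41
print(invalid_elements(converted))            # [(4, -41)]
```

A check like `invalid_elements` run before assigning the converted CIGAR would let such a read be counted as failed instead of aborting the whole conversion. Whether that is the right fix, or whether hard clips should simply be left as H, is for the maintainers to decide.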