ccgd-profile / BreaKmer

A method to identify structural variation from sequencing data in target regions

ValueError: start out of range (-199) #19

Status: Closed (bcolb closed 8 years ago)

bcolb commented 8 years ago

I'm getting a ValueError for targeted regions whose start coordinates are less than 200. I believe this is due to the 200 bp regionBuffer that is applied around each targeted region.

Here is the stack trace (with my local path removed):

Traceback (most recent call last):
  File "/PATH/breakmer/breakmer.py", line 72, in <module>
    RUN_TRACKER.run()
  File "/PATH/breakmer/breakmer/processor/analysis.py", line 143, in run
    aggResults = analyze_targets(targetAnalysisList)
  File "/PATH/breakmer/breakmer/processor/analysis.py", line 79, in analyze_targets
    if not targetRegion.find_sv_reads(): # No SV reads extracted. Exiting.
  File "/PATH/breakmer/breakmer/processor/target.py", line 552, in find_sv_reads
    self.extract_bam_reads('sv') # Extract variant reads.
  File "/PATH/breakmer/breakmer/processor/target.py", line 578, in extract_bam_reads
    self.variation.set_var_reads(sampleType, bamFile, self.chrom, self.start, self.end, self.regionBuffer)
  File "/PATH/breakmer/breakmer/processor/target.py", line 145, in set_var_reads
    self.var_reads[sampleType] = bam_handler.get_variant_reads(bamFile, chrom, start - regionBuffer, end - regionBuffer, self.params.get_param('insertsize_thresh'))
  File "/PATH/breakmer/breakmer/processor/bam_handler.py", line 234, in get_variant_reads
    reads, bamF = get_region_reads(bamFile, chrom, start, end)
  File "/PATH/breakmer/breakmer/processor/bam_handler.py", line 213, in get_region_reads
    reads = bamF.fetch(chrom, start, end)
  File "csamtools.pyx", line 1059, in pysam.csamtools.Samfile.fetch (pysam/csamtools.c:12490)
  File "csamtools.pyx", line 992, in pysam.csamtools.Samfile._parseRegion (pysam/csamtools.c:11769)
ValueError: start out of range (-199)

I'm guessing that it just needs a check somewhere for edge cases that could result in negative values.
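Something along these lines might be enough. This is only a minimal sketch: `buffered_region` is a hypothetical helper, not a function in BreaKmer, and it assumes the buffer should be subtracted from the start and added to the end (the trace shows `end - regionBuffer`, so the intended behaviour there is a guess on my part). The optional `chrom_length` argument is also an assumption.

```python
def buffered_region(start, end, region_buffer=200, chrom_length=None):
    """Expand a region by region_buffer on each side, clamped to valid coordinates.

    Hypothetical helper for illustration; not part of BreaKmer.
    """
    # Never let the buffered start drop below 0, which pysam rejects.
    buffered_start = max(0, start - region_buffer)
    # Assumption: the buffer is meant to extend the end of the region.
    buffered_end = end + region_buffer
    # Optionally cap the end at the chromosome length, if it is known.
    if chrom_length is not None:
        buffered_end = min(buffered_end, chrom_length)
    return buffered_start, buffered_end


if __name__ == "__main__":
    # A region starting at position 1 no longer yields a negative start.
    print(buffered_region(1, 500))
```

The clamped coordinates could then be passed to the fetch call in place of the raw `start - regionBuffer` value.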

mducar commented 8 years ago

Hi Brice,

It looks like you discovered a bug. Not sure when we can have a fix pushed out, but I can suggest a work-around.

Are you running BreaKmer on human samples? If so, I'm curious why you have intervals starting at the first chromosome base (which are typically N's).

This error occurs because BreaKmer expands the search space beyond the intervals you provide in order to add a small buffer. One option is to redefine your intervals to start at position 201 rather than 1. I haven't tested this, but it is worth trying.
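A rough sketch of that interval adjustment, in case it helps. It assumes a tab-delimited intervals file with the chromosome, start, and end in the first three columns; `clamp_interval_starts` is a hypothetical helper written for this example, not something shipped with BreaKmer, and the 201 threshold simply mirrors the 200 bp buffer.

```python
def clamp_interval_starts(lines, min_start=201):
    """Raise interval starts below min_start; drop intervals that end before it.

    Hypothetical helper; assumes tab-delimited lines of chrom, start, end, ...
    """
    adjusted = []
    for line in lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and header/comment lines through unchanged.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            adjusted.append(stripped)
            continue
        fields = stripped.split("\t")
        start, end = int(fields[1]), int(fields[2])
        if start < min_start:
            if end <= min_start:
                # The whole interval sits inside the buffer zone; skip it.
                continue
            fields[1] = str(min_start)
        adjusted.append("\t".join(fields))
    return adjusted


if __name__ == "__main__":
    example = [
        "chr1\t1\t500\tGENE_A\n",
        "chr1\t1000\t2000\tGENE_B\n",
    ]
    for row in clamp_interval_starts(example):
        print(row)
```

Intervals that would become empty after the shift are dropped rather than rewritten, so check that none of them are regions you actually care about.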

bcolb commented 8 years ago

Hi Matt,

These are human samples. It turns out that the regions causing this bug are part of our capture but are not relevant to our search for SVs. I am simply removing them from my bed file, which should solve the problem.

Also, I did initially modify my bed file to start at position 201, as per your suggestion, and that avoided the ValueError.

Thanks for the response and the help!