Genomicus / bedtools

Automatically exported from code.google.com/p/bedtools

slop produces garbage results with negative extension sizes #168

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The -b, -l, and -r options accept integers, including negative values.
For a user who wants to trim regions, the slop tool is therefore an obvious 
choice, e.g. run it with -b -0.25 -pct to get the central part of each region.

I found that the following problems can occur in the output:
- region start > region end
- region end < 0

$ echo -e "Zv9_NA105\t12113\t12230\tENSDART00000130536\t0\t+" | bedtools slop -l 0 -r -100000 -g /gbdb/danRer7/chromInfo.txt
Zv9_NA105   12113   -87770  ENSDART00000130536  0   +
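
Records like the one above can be flagged with a small check on the output. This is only a sketch and not part of bedtools: the function name `find_invalid_intervals` is hypothetical, and it assumes tab-separated BED lines with the start in column 2 and the end in column 3.

```python
def find_invalid_intervals(bed_lines):
    """Return (line, reasons) pairs for BED records with impossible coordinates.

    Hypothetical helper, not part of bedtools. Assumes tab-separated BED
    lines with at least three columns: chrom, start, end.
    """
    problems = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        reasons = []
        if start > end:
            reasons.append("start > end")
        if start < 0:
            reasons.append("start < 0")
        if end < 0:
            reasons.append("end < 0")
        if reasons:
            problems.append((line, reasons))
    return problems


if __name__ == "__main__":
    bad = "Zv9_NA105\t12113\t-87770\tENSDART00000130536\t0\t+"
    for record, reasons in find_invalid_intervals([bad]):
        print(record, reasons)
```

The record reported here trips two checks at once, since its end is both negative and smaller than its start.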

Expected output:
Zv9_NA105   12113   12113   ENSDART00000130536  0   +
or (probably better) removal of this region from the output, since it covers 0 
nucleotides. It may not be obvious which zero-length interval should be 
reported if both -l and -r are negative and their combined magnitude exceeds 
the length of the interval.

Solutions:
- either disallow negative values entirely (document this in the help text and 
enforce non-negative numbers at runtime),
- or handle them properly: never shrink a region below zero length, and never 
produce negative coordinates or coordinates larger than the chromosome/contig 
size.
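
The second option could look roughly like the following. This is a minimal sketch of the clamping logic, not the bedtools implementation: the function name `slop_interval` and its parameters are hypothetical, strand handling (-s) is ignored, and truncating fractional sizes with `int()` in the -pct case is an assumption. It drops intervals that end up empty, which is the "probably better" behaviour suggested above.

```python
def slop_interval(start, end, left, right, chrom_size, pct=False):
    """Extend (or, with negative sizes, trim) a 0-based half-open interval.

    Hypothetical sketch, not the bedtools code. Returns (new_start, new_end),
    or None when the resulting interval covers no nucleotides.
    """
    length = end - start
    if pct:
        # Assumption: sizes are fractions of the interval length,
        # truncated toward zero.
        left = int(left * length)
        right = int(right * length)

    new_start = start - left
    new_end = end + right

    # Never report coordinates below 0 or beyond the chromosome/contig end.
    new_start = max(0, min(new_start, chrom_size))
    new_end = max(0, min(new_end, chrom_size))

    # Never report an interval of zero or negative length.
    if new_start >= new_end:
        return None
    return new_start, new_end


if __name__ == "__main__":
    # The chromosome size of 50000 is made up for illustration.
    print(slop_interval(12113, 12230, 0, -100000, 50000))
    print(slop_interval(100, 200, -0.25, -0.25, 1000, pct=True))
```

With these rules the interval from the report is dropped instead of being written with a negative end, and trimming 25% from each side of a 100 bp interval keeps its central 50 bp.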

The same applies to bedtools flank and possibly other tools.

Original issue reported on code.google.com by balwi...@gmail.com on 6 Mar 2014 at 6:58