arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

ENH: shuffle within a window or distance #74

Open brentp opened 11 years ago

brentp commented 11 years ago

The significance testing used by ENCODE, as developed by Bickel et al. (http://arxiv.org/pdf/1101.0947.pdf) and available at http://www.encodestatistics.org/, seems (IIUC) to rely on shuffling within prescribed regions. Part of their method is defining those regions by segmentation, but BEDTools could ease this type of investigation by allowing one to specify that intervals are shuffled only within the window from which they arise, or only to within a given distance of their original position.

The latter (distance-limited shuffling) should be quite simple to implement on its own, though it would be more difficult to fit within the current framework of bedtools shuffle.

An interface could be like:

bedtools window_shuffle -w windows.bed -i input.bed -g hg18.genome > local-shuffle.bed

with -w mutually exclusive with -d, which would give the maximum distance instead:

bedtools window_shuffle -d 50000 -i input.bed -g hg19.genome > local-shuffle.bed
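
To make the two proposed modes concrete, here is a rough Python sketch of what they might do. It is not bedtools code: the function names `shuffle_within_windows` and `shuffle_within_distance` are hypothetical, intervals are plain `(chrom, start, end)` tuples in BED-style half-open coordinates, and I am assuming that windows do not overlap, that an interval not fully contained in any window is left where it is, and that distance is measured from the original start.

```python
import random
from bisect import bisect_right


def shuffle_within_windows(intervals, windows, rng=random):
    """Move each interval to a random position inside the window containing it.

    Assumes non-overlapping windows. Intervals not fully contained in a
    window are returned unchanged (an assumption of this sketch).
    """
    by_chrom = {}
    for chrom, start, end in sorted(windows):
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {c: [s for s, _ in w] for c, w in by_chrom.items()}

    out = []
    for chrom, start, end in intervals:
        length = end - start
        # Index of the last window starting at or before this interval.
        i = bisect_right(starts.get(chrom, []), start) - 1
        if i >= 0 and end <= by_chrom[chrom][i][1]:
            w_start, w_end = by_chrom[chrom][i]
            new_start = rng.randint(w_start, w_end - length)
            out.append((chrom, new_start, new_start + length))
        else:
            out.append((chrom, start, end))
    return out


def shuffle_within_distance(intervals, max_dist, chrom_sizes, rng=random):
    """Move each interval's start by at most max_dist bp, staying on the chromosome."""
    out = []
    for chrom, start, end in intervals:
        length = end - start
        lo = max(0, start - max_dist)
        hi = min(chrom_sizes[chrom] - length, start + max_dist)
        new_start = rng.randint(lo, hi)
        out.append((chrom, new_start, new_start + length))
    return out
```

The window mode corresponds to the `-w` example and the distance mode to `-d`; `chrom_sizes` plays the role of the `-g` genome file. Whether shuffled intervals may overlap each other, and how to treat intervals that straddle a window boundary, are open design choices that this sketch does not settle.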