arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

WIP: let subtract bed calculate -f based on all overlapping intervals. #37

Closed brentp closed 12 years ago

brentp commented 12 years ago

starting a pull-request for feedback.

currently, the overlap fraction is computed per b-interval. this lets it be the overall overlap fraction across all overlapping b-intervals.

arq5x commented 12 years ago

Sorry for the delay. As I understand it, the goal is to subtract from A if and only if the total overlap from all B hits meets or exceeds the -f threshold with respect to the A interval in question. This is in contrast to the current behavior, which tests the overlap of each B interval individually. Is that right?

If so, would the example below be correct?

> cat a.bed 
chr1    0    10

> cat b.bed
chr1    0    2
chr1    4    6

> bedtools subtract -A -f 0.40 -a a.bed -b b.bed
chr1    0    10

> bedtools subtract -A -f 0.20 -a a.bed -b b.bed
(A is removed)

> bedtools subtract -N -f 0.50 -a a.bed -b b.bed
chr1    0    10

> bedtools subtract -N -f 0.40 -a a.bed -b b.bed
(A is removed)
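
The four cases above can be checked with a small sketch. This is not the bedtools implementation; `overlap_bp` and `is_removed` are hypothetical helper names, and the sketch assumes BED-style 0-based half-open coordinates, an inclusive threshold (as the `-f 0.20` and `-f 0.40` cases suggest), and B intervals that do not overlap one another, so their overlaps can simply be summed.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open, as in BED
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def is_removed(a: Interval, bs: List[Interval], min_frac: float, pooled: bool) -> bool:
    """Decide whether the whole A interval would be dropped.

    pooled=False: per-B test (the -A style in the example above);
                  any single B must cover at least min_frac of A.
    pooled=True:  summed test (the -N style in the example above);
                  all B overlaps together must cover at least min_frac of A.
    Assumption: B intervals do not overlap each other, so summing
    does not double count bases.
    """
    a_len = a[2] - a[1]
    if a_len <= 0:
        return False
    overlaps = [o for o in (overlap_bp(a, b) for b in bs) if o > 0]
    if not overlaps:
        return False
    covered = sum(overlaps) if pooled else max(overlaps)
    return covered / a_len >= min_frac


# Intervals from the example: A spans 10 bp, each B covers 2 bp of it.
A = ("chr1", 0, 10)
B = [("chr1", 0, 2), ("chr1", 4, 6)]

for pooled, frac in [(False, 0.40), (False, 0.20), (True, 0.50), (True, 0.40)]:
    flag = "-N" if pooled else "-A"
    result = "removed" if is_removed(A, B, frac, pooled) else "kept"
    print(f"{flag} -f {frac:.2f}: A is {result}")
```

Each B interval alone covers 2/10 of A, while together they cover 4/10, which is why the same 0.40 threshold keeps A in the per-interval case and removes it in the summed case.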

brentp commented 12 years ago

yes, that looks correct.

arq5x commented 12 years ago

Great, that seems very useful. Could you add a couple regression tests for the new option?

brentp commented 12 years ago

this should be ready to go in.

arq5x commented 12 years ago

Lovely, thanks Brent!