daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

intersect threshold #335

Closed berkuva closed 3 years ago

berkuva commented 3 years ago

When doing an intersect operation between two BedTool objects, is there a way to change the number of base pairs that must overlap? For example, [1, 10] and [10, 20] overlap by one. Can I set a number, say 3, so that only regions that overlap by 3 or more base pairs are considered to intersect?

daler commented 3 years ago

Yes, since pybedtools wraps bedtools, you can take advantage of the bedtools functionality to do this.

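One option is to use the -f or -F arguments, which specify the minimum overlap required as a fraction of the feature in file A or file B, respectively. In pybedtools these are passed as keyword arguments, e.g. a.intersect(b, f=0.5).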
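Because -f and -F are fractions of each feature's length, a fixed base-pair threshold does not correspond to a single fraction when features differ in size. The sketch below is plain Python and does not call bedtools. It only illustrates the arithmetic, using the coordinates of the example features shown in this thread. The helper names overlap_bp and fraction_for are hypothetical, not part of pybedtools. It assumes BED-style 0-based, half-open intervals, under which 1-10 and 10-20 share no bases.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def fraction_for(a_start, a_end, min_bp):
    """Fraction of an A feature's length that min_bp bases represent.

    This is the value one would pass as -f to require at least min_bp
    bases of overlap for this particular feature (hypothetical helper).
    """
    return min_bp / (a_end - a_start)


# Coordinates of the example features that appear in this thread
feature2 = (100, 200)   # from a.bed
feature3 = (150, 500)   # from a.bed
feature5 = (155, 200)   # from b.bed

print(overlap_bp(*feature2, *feature5))  # overlap between feature2 and feature5
print(overlap_bp(1, 10, 10, 20))         # half-open intervals that only touch
print(fraction_for(*feature2, 3))        # fraction of a 100 bp feature
print(fraction_for(*feature3, 3))        # fraction of a 350 bp feature
```

The two fractions differ, so a threshold in base pairs is easier to express by filtering on the overlap column that -wao reports.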

Alternatively, you could report the overlap size with -wao and filter on it, like this:

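import pybedtools

# Example BED files bundled with pybedtools
a = pybedtools.example_bedtool('a.bed')
b = pybedtools.example_bedtool('b.bed')

# Keep hits that overlap by more than SIZE bp (i.e. at least 3 bp)
SIZE = 2

# With -wao, the last column is the number of overlapping bases
print(a.intersect(b, wao=True).filter(lambda x: int(x[-1]) > SIZE).saveas())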
chr1    100     200     feature2        0       +       chr1    155     200     feature5        0       -       45
chr1    150     500     feature3        0       -       chr1    155     200     feature5        0       -       45
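If the threshold is reused in several places, the lambda can be wrapped in a small predicate factory. This is only a sketch: min_overlap is a hypothetical helper, not a pybedtools function, and it assumes the overlap count is the last field, as it is with -wao. It is demonstrated here on plain lists of fields so that it runs without bedtools installed. The first two rows are the output lines above, and the third is an illustrative row with a 1 bp overlap.

```python
def min_overlap(min_bp):
    """Return a predicate that keeps rows whose last field is >= min_bp.

    Hypothetical helper; assumes the last field holds the overlap in bp,
    as it does in `bedtools intersect -wao` output.
    """
    def keep(feature):
        return int(feature[-1]) >= min_bp
    return keep


# Stand-ins for -wao output rows, split into fields
rows = [
    "chr1 100 200 feature2 0 + chr1 155 200 feature5 0 - 45".split(),
    "chr1 150 500 feature3 0 - chr1 155 200 feature5 0 - 45".split(),
    "chr1 900 950 feature4 0 + chr1 800 901 feature6 0 + 1".split(),  # illustrative
]

keep = min_overlap(3)
for row in rows:
    if keep(row):
        print("\t".join(row))
```

With pybedtools the same predicate could be passed straight to filter, e.g. a.intersect(b, wao=True).filter(min_overlap(3)).saveas().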