daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools
Other
302 stars 102 forks source link

intersect with multiple -b arguments not working with -sorted #397

Closed eboileau closed 10 months ago

eboileau commented 10 months ago

Description

Calling intersect with multiple -b arguments does not work with the -sorted flag, although each -b argument is sorted. It looks like pybedtools somehow concats all -b arguments into a single file for processing.

To Reproduce

import pybedtools

a = pybedtools.example_bedtool('a.bed').sort()
a.fn
'/tmp/pybedtools.kwxbaves.tmp'
a.head()
chr1   1       100     feature1        0       +
chr1   100     200     feature2        0       +
chr1   150     500     feature3        0       -
chr1   900     950     feature4        0       +

b = pybedtools.example_bedtool('b.bed').sort()
b.fn
'/tmp/pybedtools.8pj4ztjw.tmp'
b.head()
chr1   155     200     feature5        0       -
chr1   800     901     feature6        0       +

records = [("chr1", 125, 126, "feature7", 0, "+"), ("chr1", 800, 801, "feature8", 0, "+")]
c = pybedtools.BedTool(records).sort()
c.fn
'/tmp/pybedtools.l_aqdd9f.tmp'
c.head()
chr1   125     126     feature7        0       +
chr1   800     801     feature8        0       +

d = a.intersect([b,c], s=True, sorted=True)

This results in

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../lib/python3.10/site-packages/pybedtools/bedtool.py", line 909, in decorated
    result = method(self, *args, **kwargs)
  File ".../lib/python3.10/site-packages/pybedtools/bedtool.py", line 388, in wrapped
    stream = call_bedtools(
  File ".../lib/python3.10/site-packages/pybedtools/helpers.py", line 460, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: 
Command was:

        bedtools intersect -a /tmp/pybedtools.kwxbaves.tmp -sorted -s -b /tmp/pybedtools.uc8mcuy5.tmp

Error message was:
Error: Sorted input specified, but the file /tmp/pybedtools.uc8mcuy5.tmp has the following out of order record
chr1    125     126     feature7        0       +

bedtools works fine

bedtools intersect -s -sorted -a /tmp/pybedtools.kwxbaves.tmp -b /tmp/pybedtools.8pj4ztjw.tmp /tmp/pybedtools.l_aqdd9f.tmp
chr1    100     200     feature2        0       +       2       chr1    125     126     feature7        0       +
chr1    150     500     feature3        0       -       1       chr1    155     200     feature5        0       -
chr1    900     950     feature4        0       +       1       chr1    800     901     feature6        0       +

while indeed the one file (created by pybedtool) is not sorted

cat /tmp/pybedtools.uc8mcuy5.tmp
chr1    155     200     feature5        0       -
chr1    800     901     feature6        0       +
chr1    125     126     feature7        0       +
chr1    800     801     feature8        0       +

Environment

bedtools v2.29.1 pybedtools.version '0.9.1'

eboileau commented 10 months ago

Ah... you have to pass -b arguments as follows

d = a.intersect(b=[b.fn,c.fn], wa=True, wb=True, s=True, sorted=True)

then it works!

I think this is missing in the docs, but indeed follows the same logic as multi_intersect x.multi_intersect(i=[a.fn, b.fn]).