daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

window_maker seems to not obey the -s (step size) option. #81

Closed: arq5x closed this issue 6 years ago

arq5x commented 11 years ago

At least with the version I have installed (0.6.2dev), it looks like the window_maker() wrapper does not wrap the -s option. Are you able to reproduce this?

>>> import pybedtools as pbt
>>> pbt.__version__
'0.6.2dev'
>>> pbt.BedTool().window_maker(genome='hg19', w=1000000, s=500000).head()
chr1    0   1000000
chr1    1000000 2000000
chr1    2000000 3000000
chr1    3000000 4000000
chr1    4000000 5000000
chr1    5000000 6000000
chr1    6000000 7000000
chr1    7000000 8000000
chr1    8000000 9000000
chr1    9000000 10000000

>>> pbt.BedTool().window_maker(genome='hg19', w=1000000, s=0).head()
chr1    0   1000000
chr1    1000000 2000000
chr1    2000000 3000000
chr1    3000000 4000000
chr1    4000000 5000000
chr1    5000000 6000000
chr1    6000000 7000000
chr1    7000000 8000000
chr1    8000000 9000000
chr1    9000000 10000000

daler commented 11 years ago

Yep, I can reproduce . . . but it looks to be caused by BEDTools requiring -w to come first:


$ bedtools --version
bedtools v2.17.0-90-gf4633e9

$ bedtools makewindows -s 500000 -w 1000000 -g hg19 | head
chr1    0   1000000
chr1    1000000 2000000
chr1    2000000 3000000
chr1    3000000 4000000
chr1    4000000 5000000
chr1    5000000 6000000
chr1    6000000 7000000
chr1    7000000 8000000
chr1    8000000 9000000
chr1    9000000 10000000

$ bedtools makewindows -s 0 -w 1000000 -g hg19 | head
chr1    0   1000000
chr1    1000000 2000000
chr1    2000000 3000000
chr1    3000000 4000000
chr1    4000000 5000000
chr1    5000000 6000000
chr1    6000000 7000000
chr1    7000000 8000000
chr1    8000000 9000000
chr1    9000000 10000000

# When -w comes before -s it works
$ bedtools makewindows -w 1000000 -s 500000 -g hg19 | head
chr1    0   1000000
chr1    500000  1500000
chr1    1000000 2000000
chr1    1500000 2500000
chr1    2000000 3000000
chr1    2500000 3500000
chr1    3000000 4000000
chr1    3500000 4500000
chr1    4000000 5000000
chr1    4500000 5500000

Tag! You're it! :)
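
In the meantime, one possible workaround is to call bedtools directly and force -w ahead of -s. The following is only a minimal sketch: it assumes `bedtools` is on the PATH, and the helper names `makewindows_cmd` and `run_makewindows` are hypothetical, not part of pybedtools. Requiring exactly one of g or b is a simplification made for this sketch.

```python
import subprocess


def makewindows_cmd(w, s=None, g=None, b=None):
    """Build a `bedtools makewindows` argument list with -w placed before -s.

    Hypothetical helper (not part of pybedtools). For simplicity it insists
    on exactly one of g (genome file) or b (BED file).
    """
    if (g is None) == (b is None):
        raise ValueError("supply exactly one of g (genome) or b (BED file)")
    # -w always goes first, which is the ordering that works in this thread.
    cmd = ["bedtools", "makewindows", "-w", str(w)]
    if s is not None:
        cmd += ["-s", str(s)]
    cmd += ["-g", str(g)] if g is not None else ["-b", str(b)]
    return cmd


def run_makewindows(**kwargs):
    """Run the command and return stdout as text (needs bedtools installed)."""
    result = subprocess.run(
        makewindows_cmd(**kwargs), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `makewindows_cmd(w=1000000, s=500000, g="hg19")` builds the same argument order as the working command line above.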

arq5x commented 11 years ago

Doh. Thanks Ryan, will fix her up!

ghost commented 10 years ago

I think I have the latest versions of pybedtools (0.6.6) and bedtools (2.20) installed, but I still see this issue. Is there any workaround, other than running bedtools directly?

cheers

Andrew

JocelynSP commented 6 years ago

I am getting the same error:

>>> pybedtools.__version__
'0.6.6'
>>> bwins = pybedtools.BedTool().window_maker(w=20, s=5, i="srcwinnum", b=b)
>>> print bwins
chr1    155 175 feature5_1
chr1    175 195 feature5_2
chr1    195 200 feature5_3
chr1    800 820 feature6_1
chr1    820 840 feature6_2

but I get sliding windows with bedtools at the command line, provided I put -w before -s:

% bedtools --version
bedtools v2.19.1
% bedtools makewindows -w 100 -s 50 -b flankseqs/pre500_778.fa.bed | head
chr1    170849219   170849319
chr1    170849269   170849369
chr1    170849319   170849419
chr1    170849369   170849469

(It still ignores -s if it is before -w)

Regards, Jocelyn

JocelynSP commented 6 years ago

I think it might be connected to Issue #101. Defaulting to the BedTool as b is also not working for me, e.g.:

>>> bwinsImplicit = b.window_maker(w=20, s=5, i="srcwinnum")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/bioinfsoftware/python/current/lib/python2.7/site-packages/pybedtools-0.6.6-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 664, in decorated
    result = method(self, *args, **kwargs)
  File "/usr/local/bioinfsoftware/python/current/lib/python2.7/site-packages/pybedtools-0.6.6-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 227, in wrapped
    kwargs = self.check_genome(**kwargs)
  File "/usr/local/bioinfsoftware/python/current/lib/python2.7/site-packages/pybedtools-0.6.6-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 1246, in check_genome
    raise ValueError('No genome specified. Use the "g" or '
ValueError: No genome specified. Use the "g" or "genome" kwargs, or use the .set_chromsizes() method

Adding chromsizes to b prevented the error, but the windows are based on the chromsizes, not the intervals, and still ignore s:

>>> hg19 = pybedtools.chromsizes('hg19')
>>> b = b.set_chromsizes(hg19)
>>> bwinsImplicit = b.window_maker(w=200000, s=50000, i="srcwinnum")
>>> bwinsImplicit.head()
chr1    0   200000  chr1_1
chr1    200000  400000  chr1_2
chr1    400000  600000  chr1_3
chr1    600000  800000  chr1_4

Note that the windows are for the hg19 genome, not b.bed.

Adding the chromsizes to b had no effect when b was supplied:

>>> bwins_hg19size = b.window_maker(w=20, s=5, i="srcwinnum", b=b)
>>> bwins_hg19size.head()
chr1    155 175 feature5_1
chr1    175 195 feature5_2
chr1    195 200 feature5_3
chr1    800 820 feature6_1
chr1    820 840 feature6_2
chr1    840 860 feature6_3

daler commented 6 years ago

@JocelynSP are you tied to that version of bedtools? There have been substantial changes in BEDTools in the three years since v2.19.1 was released, including the fix to this issue. I'm not sure there's much I can do on my end in pybedtools.
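
A quick way to check which BEDTools a script is running against is to parse the `bedtools --version` string. This is a rough sketch, not pybedtools functionality: the function names are hypothetical, and the threshold is an assumption taken from this thread. Only v2.26 is known here to honor -s through pybedtools; the exact release containing the fix is not stated, so versions in between are conservatively treated as untrusted.

```python
import re

# Assumption from this thread: v2.26 is known to work; the exact release
# that introduced the fix is not stated here.
KNOWN_GOOD = (2, 26, 0)


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from `bedtools --version` output."""
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    # A missing patch component (e.g. "v2.26") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def step_option_trusted(text):
    """True if the reported version is at least the known-good one."""
    return parse_bedtools_version(text) >= KNOWN_GOOD
```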

The following works on BEDTools v2.26, for example:

import pybedtools
b = pybedtools.example_bedtool('b.bed')
print(pybedtools.BedTool().window_maker(w=20, s=5, b=b, i='srcwinnum'))
chr1    155     175     feature5_1
chr1    160     180     feature5_2
chr1    165     185     feature5_3
chr1    170     190     feature5_4
chr1    175     195     feature5_5
chr1    180     200     feature5_6
chr1    185     200     feature5_7
chr1    190     200     feature5_8
chr1    195     200     feature5_9
chr1    800     820     feature6_1
chr1    805     825     feature6_2
chr1    810     830     feature6_3
chr1    815     835     feature6_4
chr1    820     840     feature6_5
chr1    825     845     feature6_6
chr1    830     850     feature6_7
chr1    835     855     feature6_8
chr1    840     860     feature6_9
chr1    845     865     feature6_10
chr1    850     870     feature6_11
chr1    855     875     feature6_12
chr1    860     880     feature6_13
chr1    865     885     feature6_14
chr1    870     890     feature6_15
chr1    875     895     feature6_16
chr1    880     900     feature6_17
chr1    885     901     feature6_18
chr1    890     901     feature6_19
chr1    895     901     feature6_20
chr1    900     901     feature6_21
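
The arithmetic behind that output can be sketched in plain Python. This is an illustration only, not the BEDTools implementation: the function name `sliding_windows` is hypothetical, and the feature coordinates used below (155-200 and 800-901) are inferred from the output above. Each window starts s bases after the previous one and is clipped at the end of the feature.

```python
def sliding_windows(start, end, w, s=None):
    """Yield (win_start, win_end) pairs of width w, advancing by s.

    Illustrative only (hypothetical helper). Windows are clipped to `end`;
    when s is omitted the step equals w, giving non-overlapping windows.
    """
    step = w if s is None else s
    if w <= 0 or step <= 0:
        raise ValueError("w and s must be positive")
    for win_start in range(start, end, step):
        yield win_start, min(win_start + w, end)
```

With `w=20, s=5` this yields 9 windows for the first feature and 21 for the second, matching the feature5 and feature6 numbering above. Without `s` it yields the non-overlapping 155-175, 175-195, 195-200 pattern reported earlier in the thread.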

JocelynSP commented 6 years ago

I had version 2.26.0 available to load, and yes, it fixes the problem. Thanks!

Leaving out "b=" still gives the ValueError (No genome specified. Use the "g" or "genome" kwargs, or use the .set_chromsizes() method), but that is completely unimportant.

daler commented 6 years ago

OK, good to hear. The semantics of bedtools makewindows are such that either -g or -b is required. In pybedtools I can only choose one to be the default, and that is the genome, which is why leaving out b= raises the "No genome specified" error.

By the way you might want to update pybedtools as well ;)