daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

IndexError with groupby #378

Closed wbvguo closed 1 year ago

wbvguo commented 1 year ago

Hi,

I tried to run the pybedtools.bedtool.BedTool.groupby method:

from pybedtools import BedTool

a = BedTool("xxx/pybedtools.bed")
b = a.groupby(g=1, c=5, o=['sum'])
print(b)

but encountered the following error:

~/apps/anaconda3/lib/python3.8/site-packages/pybedtools/bedtool.py in __str__(self)
   1228         """
   1229         items = []
-> 1230         for i in iter(self):
   1231             i = str(i)
   1232             if isinstance(i, bytes):

~/apps/anaconda3/lib/python3.8/site-packages/pybedtools/cbedtools.pyx in pybedtools.cbedtools.IntervalIterator.__next__()

~/apps/anaconda3/lib/python3.8/site-packages/pybedtools/cbedtools.pyx in pybedtools.cbedtools.create_interval_from_list()

IndexError: list index out of range

Here is the test data I used. I hope this issue can be fixed soon.

Best,

daler commented 1 year ago

The expected output of the command

bedtools groupby -i pybedtools.bed -g 1 -c 5 -o sum

would be:

chr1     11

This is not in any genomics data format (e.g., a minimal BED needs at least chrom, start, stop), so pybedtools cannot parse the lines as intervals when print(b) iterates over the result.
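To make the shape of that output concrete, here is a rough pure-Python sketch of what grouping on column 1 and summing column 5 amounts to. The rows are made-up stand-ins (the test data is not shown in this thread), group_sum is a hypothetical helper rather than anything in pybedtools, and it assumes integer scores and that lines of the same group are adjacent.

```python
from itertools import groupby

# Hypothetical stand-in rows (not the reporter's data):
# chrom, start, end, name, score
rows = [
    "chr1\t1\t100\tfeature1\t5",
    "chr1\t200\t300\tfeature2\t6",
]


def group_sum(lines, group_col=1, value_col=5):
    """Sum value_col over consecutive lines sharing group_col (1-based columns)."""
    parsed = [line.split("\t") for line in lines]
    result = []
    # itertools.groupby only merges adjacent items with the same key.
    for name, members in groupby(parsed, key=lambda f: f[group_col - 1]):
        total = sum(int(f[value_col - 1]) for f in members)
        result.append((name, total))
    return result


summary = group_sum(rows)
for name, total in summary:
    print(f"{name}\t{total}")
```

Each result line has only two fields, a group key and a number, so there is no start or stop for an interval parser to read.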

In this case you can save the output to a temp file with saveas(); its path is then available as b.fn, and you can work with it as you would with any other TSV file. E.g.,

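import pandas
import pybedtools

a = pybedtools.BedTool("xxx/pybedtools.bed")
# saveas() writes the groupby result to a temp file, available as b.fn
b = a.groupby(g=1, c=5, o=['sum']).saveas()
# read the two-column output as a plain TSV rather than as intervals
df = pandas.read_table(b.fn, names=['chrom', 'sum'])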
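If pandas is not available, the same two-column file can be read with the standard library. This is only a sketch: the temp file written below is a stand-in for the path that would be in b.fn, read_group_sums is a hypothetical helper, and it assumes the sums are integers.

```python
import csv
import os
import tempfile

# Stand-in for the file pybedtools would write; in the thread this
# path would be b.fn.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as handle:
    handle.write("chr1\t11\n")
    path = handle.name


def read_group_sums(filename):
    """Read a two-column TSV (group key, sum) into a dict."""
    with open(filename, newline="") as tsv:
        reader = csv.reader(tsv, delimiter="\t")
        return {chrom: int(total) for chrom, total in reader}


sums = read_group_sums(path)
os.remove(path)  # clean up the stand-in file
print(sums)
```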