arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

Parsing BED file #104

Open guillermomarco opened 9 years ago

guillermomarco commented 9 years ago

Hi, I would like to know whether pybedtools can be used as a parser for BED files. I've been looking through the documentation and I didn't see any reader or writer method, so I guess I'll have to parse tab-delimited BED files manually. Am I right?

Thanks!

daler commented 9 years ago

For reading, simply iterating over a pybedtools.BedTool object parses each line into an Interval object.

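import pybedtools

def my_func(f):
    # Prefix the chromosome name, e.g. "chr1" -> "chromosome_chr1"
    f.chrom = "chromosome_" + f.chrom
    return f

for interval in pybedtools.BedTool('a.bed'):
    # interval is a pybedtools Interval; do something with it
    my_func(interval)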
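Since the original question was about parsing BED files by hand, here is, for comparison, roughly what that manual parsing involves, as a standard-library-only sketch. The `BedRecord` class and `parse_bed` function are hypothetical names made up for illustration (they are not part of pybedtools), and the sketch only handles the first six BED columns; pybedtools' `Interval` does considerably more.

```python
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class BedRecord:
    """Hypothetical minimal record for the first six BED columns."""
    chrom: str
    start: int                    # 0-based start (inclusive), per the BED convention
    end: int                      # 0-based end (exclusive)
    name: Optional[str] = None
    score: Optional[str] = None   # kept as text; some files use "." here
    strand: Optional[str] = None


def parse_bed(handle: TextIO) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line; columns beyond the sixth are ignored."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines, comments, and UCSC "track"/"browser" header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Pad the optional name/score/strand columns with None if they are absent.
        optional = fields[3:6] + [None] * (3 - len(fields[3:6]))
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), *optional)


# Example usage on an in-memory BED snippet (a 6-column and a 3-column line).
example = io.StringIO("chr1\t1\t100\tfeature1\t0\t+\nchr2\t5\t10\n")
for rec in parse_bed(example):
    print(rec)
```

This works, but it is exactly the bookkeeping (coordinate types, optional columns, header lines) that iterating over a `BedTool` already does for you.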

Here's a more idiomatic approach that uses .each() and saves the results, i.e., it reads, transforms, and writes all in one line:

pybedtools.BedTool('a.bed').each(my_func).saveas('b.bed')
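In that one-liner, .each() applies the function to every interval and .saveas() writes the result to a file. As a rough, standard-library-only analogue of that read/transform/write pipeline (a sketch of the idea, not how pybedtools implements it), something like the following; `each_record`, `save_records`, and `rename_chrom` are hypothetical names, and records here are plain lists of column strings rather than `Interval` objects.

```python
import io
from typing import Callable, Iterable, Iterator, List, TextIO

Fields = List[str]


def each_record(lines: Iterable[str], func: Callable[[Fields], Fields]) -> Iterator[Fields]:
    """Apply func to the tab-split fields of every BED data line, one at a time."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, comments, and "track"/"browser" header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        yield func(line.split("\t"))


def save_records(records: Iterable[Fields], out: TextIO) -> None:
    """Write each record back out as one tab-delimited BED line."""
    for fields in records:
        out.write("\t".join(fields) + "\n")


def rename_chrom(fields: Fields) -> Fields:
    """Same transformation as my_func: prefix the chromosome name."""
    return ["chromosome_" + fields[0]] + fields[1:]


# Read, transform, and write in one pass (in-memory files stand in for a.bed / b.bed).
src = io.StringIO("chr1\t1\t100\tfeature1\t0\t+\n")
dst = io.StringIO()
save_records(each_record(src, rename_chrom), dst)
print(dst.getvalue(), end="")
```

Because `each_record` is a generator, lines are transformed and written one at a time instead of loading the whole file into memory.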

If your goal is a pandas.DataFrame:

x = pybedtools.BedTool('a.bed')
df = x.to_dataframe()
df.head()

#   chrom  start  end      name  score strand
# 0  chr1      1  100  feature1      0      +
# 1  chr1    100  200  feature2      0      +
# 2  chr1    150  500  feature3      0      -
# 3  chr1    900  950  feature4      0      +
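If a DataFrame is the end goal and nothing else from pybedtools is needed, pandas can also read a plain BED file directly. A minimal sketch, assuming a headerless six-column (BED6) file with no "track"/"browser" lines; `read_bed6` is a hypothetical helper, and the sample rows are the same four features shown in the to_dataframe() output above.

```python
import io

import pandas as pd

BED6_COLUMNS = ["chrom", "start", "end", "name", "score", "strand"]


def read_bed6(handle) -> pd.DataFrame:
    """Read a headerless, tab-delimited BED6 file into a DataFrame."""
    return pd.read_csv(
        handle,
        sep="\t",
        header=None,
        names=BED6_COLUMNS,
        dtype={"chrom": str},  # keep chromosome names like "1" or "X" as strings
        comment="#",           # skip '#' comment lines
    )


# The same four features shown in the to_dataframe() output above.
bed_text = (
    "chr1\t1\t100\tfeature1\t0\t+\n"
    "chr1\t100\t200\tfeature2\t0\t+\n"
    "chr1\t150\t500\tfeature3\t0\t-\n"
    "chr1\t900\t950\tfeature4\t0\t+\n"
)
df = read_bed6(io.StringIO(bed_text))
print(df.head())
```

For files with extra columns or track lines, to_dataframe() is the less fiddly option.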

Now that I look, there's nowhere obvious in the docs that specifically mentions reading and writing BED files; instead, the docs are geared toward running BEDTools operations. I'll fix that.

For writing, you have several options; see http://pythonhosted.org/pybedtools/save-results.html.

daler commented 9 years ago

By the way, in the future you can post pybedtools questions directly at https://github.com/daler/pybedtools/issues. I've just created an issue for this one: https://github.com/daler/pybedtools/issues/127

guillermomarco commented 9 years ago

Thanks, daler. I was browsing both repos at the same time. My bad, sorry.