Ensembl / WiggleTools

Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency

Output specific regions #75

Closed by gevro 2 years ago

gevro commented 2 years ago

Hi, is it possible to output results for, or run commands on, specific regions given on the command line (e.g. a specific contig) or listed in a BED file?

dzerbino commented 2 years ago

Hello @gevro,

Indeed, there are two ways to do so, depending on exactly what you want to do.

If you simply want to restrict the output of an iterator to a given set of regions, you can multiply it by the output of a BED iterator, which returns 1 where a block is present and 0 elsewhere. This would look like:

mult <signal> regions.bed
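
To make the idea of multiplying by a 0/1 mask concrete, here is a minimal Python sketch of the concept. It is not WiggleTools code: the function name `mask_signal` and the tuple layouts are hypothetical, it assumes a single chromosome with half-open coordinates and non-overlapping regions, and it simply omits positions where the product would be 0.

```python
def mask_signal(signal, regions):
    """Restrict a signal to a set of regions (conceptual sketch only).

    signal:  list of (start, end, value) half-open intervals on one chromosome
    regions: list of (start, end) half-open, non-overlapping intervals
    Both layouts are assumptions made for this illustration.
    """
    masked = []
    for s_start, s_end, value in signal:
        for r_start, r_end in regions:
            # Intersection of the signal block with the region.
            lo = max(s_start, r_start)
            hi = min(s_end, r_end)
            if lo < hi:
                # Inside a region the mask is 1, so value * 1 is kept.
                masked.append((lo, hi, value * 1))
            # Outside every region the mask is 0, so nothing is emitted.
    return masked


signal = [(0, 10, 2.0), (10, 20, 5.0)]
regions = [(5, 12)]
print(mask_signal(signal, regions))
# → [(5, 10, 2.0), (10, 12, 5.0)]
```

The nested loop is quadratic and only meant to show the semantics; a streaming tool would walk both sorted inputs in a single pass.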

However, I'm guessing that what you want to do is perform an operation across each region, producing a statistic for that region. For this you should look at the apply and apply_paste functions in the README. If, for example, you wish to compute the mean coverage of a set of genes, it would look like:

apply meanI genes.bed <signal>

In particular, the apply_paste function allows you to print out the results alongside the original BED file.
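
The per-region statistic, and the idea of printing it next to each original BED line, can be sketched conceptually in Python as follows. This is an illustration, not the WiggleTools implementation: `region_mean` and `paste_means` are hypothetical names, coordinates are assumed half-open, and averaging only over bases that carry signal is an assumption of this sketch (how meanI treats uncovered bases is not stated above).

```python
def region_mean(signal, chrom, start, end):
    """Length-weighted mean of the signal over one region.

    signal: list of (chrom, start, end, value) half-open intervals.
    Assumption of this sketch: only bases covered by signal are averaged.
    """
    total = 0.0
    covered = 0
    for s_chrom, s_start, s_end, value in signal:
        if s_chrom != chrom:
            continue
        overlap = min(s_end, end) - max(s_start, start)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    return total / covered if covered else float("nan")


def paste_means(bed_lines, signal):
    """Append the region mean as an extra tab-separated column to each BED line."""
    rows = []
    for line in bed_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        mean = region_mean(signal, chrom, start, end)
        rows.append(line + "\t" + format(mean, "g"))
    return rows


signal = [("chr1", 0, 10, 2.0), ("chr1", 10, 20, 5.0)]
genes = ["chr1\t5\t15\tgeneA"]
for row in paste_means(genes, signal):
    print(row)
```

Here the hypothetical region geneA spans five bases at 2.0 and five bases at 5.0, so the appended column holds 3.5.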

The functions you can apply to regions are listed under Statistics in the README.

There are ways to input regions via the command line (using for example the seek function or by streaming data in BedGraph with the - operator), but in practice it is often simpler to provide a BED file.

Hope this helps,

Daniel