Ensembl / WiggleTools

Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency
Apache License 2.0

Output data from specific region #86

Closed: gevro closed this issue 1 year ago

gevro commented 1 year ago

Hi, how do I output data from only a specific region (chrom:start-end)?

Here is my base command:

wiggletools write_bg - compress gte 0.2 test.bw | less

Thanks

gevro commented 1 year ago

Is this the way to do it?

wiggletools seek chr1 1 1000000 write_bg - compress gte 0.2 test.bw

dzerbino commented 1 year ago

Hello @gevro, absolutely, that's correct!

gevro commented 1 year ago

What if I want all positions on chr1 without having to specify the end position? This does not work:

wiggletools seek chr1 write_bg - compress gte 0.2 test.bw

And is there a way to provide a file with a list of regions? If so, what format should this file be in (BED, 1-based, etc.)?

dzerbino commented 1 year ago

Hello @gevro,

Sorry, pulling out a whole chromosome is not an implemented feature.

You may, however, want to look at the overlaps function: you provide it with a set of regions, and it uses them to filter another iterator.

Hope this helps,

Daniel
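Since seek needs an explicit end, one workaround for the whole-chromosome case is to pass the chromosome's length as the end coordinate. A minimal Bash sketch, assuming a two-column chrom.sizes file (chromosome name, then length, as UCSC distributes them); the helper name chrom_length and the file name are hypothetical.

```shell
# Hypothetical helper: print the length of one chromosome from a
# two-column chrom.sizes file (name<TAB>length).
# Returns a non-zero status if the chromosome is not listed.
chrom_length() {
  local chrom="$1" sizes="$2"
  awk -v c="$chrom" '
    $1 == c { print $2; found = 1; exit }   # first matching line wins
    END     { if (!found) exit 1 }          # signal "not found" to the caller
  ' "$sizes"
}
```

Usage, with hg38.chrom.sizes as a placeholder path and start 1 as in the command confirmed above:

end=$(chrom_length chr1 hg38.chrom.sizes)
wiggletools seek chr1 1 "$end" write_bg - compress gte 0.2 test.bw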

gevro commented 1 year ago

Thanks. I don't see any documentation for it. How do I use it, and what format should the regions file be in: 0-based or 1-based, BED or UCSC?
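This thread does not settle which convention WiggleTools expects, but the two formats named here differ by one at the start: a UCSC browser position such as chr1:1-1000000 is 1-based and inclusive, while the same interval as a BED line is chr1 0 1000000 (0-based, half-open). A small Bash sketch for turning a region written as chrom:start-end into a BED line; the helper name ucsc_to_bed is hypothetical.

```shell
# Hypothetical helper: convert a UCSC-style position "chrom:start-end"
# (1-based, inclusive) into a BED line (0-based, half-open).
ucsc_to_bed() {
  local pos="${1//,/}"                               # drop thousands separators
  [[ "$pos" =~ ^[^:]+:[0-9]+-[0-9]+$ ]] || return 1  # expect chrom:start-end
  local chrom="${pos%%:*}" range="${pos#*:}"
  local start="${range%-*}" end="${range#*-}"
  # Only the start shifts by one; the end value is identical in both conventions.
  # 10# forces base 10 so leading zeros are not read as octal.
  printf '%s\t%s\t%s\n' "$chrom" "$((10#$start - 1))" "$((10#$end))"
}
```

For example, ucsc_to_bed chr1:1-1000000 prints chr1, 0 and 1000000 separated by tabs.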

gevro commented 1 year ago

Also, wouldn't trim work instead of overlaps if I give it a BED file covering the whole chromosome?

dzerbino commented 1 year ago

Yes, trim would work too, just with different rules with respect to boundaries.
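For the trim route, a BED record covering a whole chromosome runs from 0 to the chromosome length, because BED coordinates are 0-based and half-open. A minimal Bash sketch that builds that record from a chrom.sizes file; the helper name and file names are hypothetical, and the argument order in the usage line (regions file first, then the iterator to trim) is an assumption based on Daniel's description of overlaps, so check it against the WiggleTools README.

```shell
# Hypothetical helper: print a one-line BED record spanning an entire
# chromosome, i.e. the half-open interval [0, length), using the length
# found in a two-column chrom.sizes file. Non-zero status if not listed.
whole_chrom_bed() {
  local chrom="$1" sizes="$2"
  awk -v c="$chrom" '
    BEGIN   { OFS = "\t" }                              # BED is tab-separated
    $1 == c { print $1, 0, $2; found = 1; exit }
    END     { if (!found) exit 1 }
  ' "$sizes"
}
```

Usage (assumed trim syntax, placeholder paths):

whole_chrom_bed chr1 hg38.chrom.sizes > chr1.bed
wiggletools write_bg - trim chr1.bed compress gte 0.2 test.bw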

gevro commented 1 year ago

Does trim seek directly to the coordinates given in the BED file?

It seems not. For example, if I give trim a BED file with chr1 coordinates, the output appears quickly. But if the BED file starts at chr2, it takes a long time, which suggests that it reads through the bigWig file sequentially until it reaches chr2.

Isn't there a way to seek directly to the desired coordinates?

gevro commented 1 year ago

It looks like overlaps doesn't do it either, only seek does.

But the problem is that seek doesn't take a BED file.

Is there any way to get fast seeking with a BED file as input, or could this be added?

dzerbino commented 1 year ago

Hello @gevro,

That's an interesting idea, but alas I don't think it will be implemented soon, as WiggleTools is in maintenance-only mode.

Best regards,

Daniel

gevro commented 1 year ago

Thanks. My solution is to run seek on each region in turn and append each output to a single file with '>>'.
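That workaround can be sketched as a small Bash function that loops over a BED file, runs seek once per region and appends each result to one output file. Assumptions: the helper name seek_bed_regions and the SEEK_START_OFFSET variable are hypothetical; BED starts are 0-based, and since this thread never confirms which convention seek uses, starts are passed unchanged by default and SEEK_START_OFFSET=1 shifts them to 1-based.

```shell
# Hypothetical helper: run "wiggletools seek" once per BED region and
# append every result to a single output file.
# Usage: seek_bed_regions REGIONS.bed OUTPUT ITERATOR...
# ASSUMPTION: seek's coordinate convention is unconfirmed, so BED starts are
# passed as-is unless SEEK_START_OFFSET (hypothetical, default 0) is set.
seek_bed_regions() {
  local bed="$1" out="$2"
  shift 2                              # remaining arguments form the iterator
  local offset="${SEEK_START_OFFSET:-0}"
  local chrom start end _rest
  : > "$out"                           # start empty so reruns don't duplicate output
  # "|| [ -n "$chrom" ]" also handles a last line with no trailing newline
  while read -r chrom start end _rest || [ -n "$chrom" ]; do
    # skip blank lines, comments and UCSC track/browser header lines
    case "$chrom" in ''|'#'*|track|browser) continue ;; esac
    # </dev/null stops wiggletools from reading the BED file via stdin
    wiggletools seek "$chrom" "$((start + offset))" "$end" "$@" </dev/null >> "$out"
  done < "$bed"
}
```

Usage, with placeholder file names:

seek_bed_regions regions.bed out.bedGraph write_bg - compress gte 0.2 test.bw

Truncating the output file first is a design choice: plain '>>' in a loop keeps appending to whatever a previous run left behind.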