Ensembl / WiggleTools

Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency
Apache License 2.0

bin not yielding all windows #57

Closed by gevro 3 years ago

gevro commented 3 years ago

Hi, there's an issue with bin. It does not produce bins for regions that have no data in the wiggle file, even after fillIn. Instead, it emits one large bin with score = 0. I want 100 bp bins regardless of whether there is data, and fillIn should make that work.

$ head -n 20 blah.windows.bg
chr1    0   100 0
chr1    100 200 0
chr1    200 300 0
chr1    300 400 0
chr1    400 500 0
chr1    500 600 0
chr1    600 700 0
chr1    700 800 0
chr1    800 900 0
chr1    900 1000    0
chr1    1000    1100    0
chr1    1100    1200    0
chr1    1200    1300    0
chr1    1300    1400    0
chr1    1400    1500    0
chr1    1500    1600    0
chr1    1600    1700    0
chr1    1700    1800    0
chr1    1800    1900    0
chr1    1900    2000    0
$ wiggletools write_bg - bin 100 scale 0.01 fillIn blah.windows.bg k24.umap.sorted.bw | head
chr1    0   100 0.137970
chr1    100 200 0.702080
chr1    200 300 0.210000
chr1    300 1800    0.000000
chr1    1800    1900    0.079950
chr1    1900    2300    0.000000
chr1    2300    2400    0.149580
chr1    2400    2500    0.180420
chr1    2500    2600    0.220080
chr1    2600    2800    0.000000

As you can see, there is an entry "chr1 300 1800 0.000000". But the expected behavior should be as below, or at least an option to do that.

chr1    300 400 0.0000
chr1    400 500 0.0000
chr1    500 600 0.0000
...
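For anyone hitting the same output, one possible workaround is to post-process the bedGraph and cut each merged row back into fixed-width windows. The sketch below is not part of WiggleTools; `split_into_windows` and the sample rows are hypothetical names and data for illustration, and it assumes every row starts on the 100 bp grid (which holds for the `bin 100` output shown above).

```python
def split_into_windows(rows, width=100):
    """Yield (chrom, start, end, value) rows cut into windows of at most `width` bp.

    Hypothetical helper, not a WiggleTools function. Assumes each row's start
    already lies on the window grid, so only the length needs to be cut.
    """
    for chrom, start, end, value in rows:
        pos = start
        while pos < end:
            nxt = min(pos + width, end)  # never run past the end of the row
            yield chrom, pos, nxt, value
            pos = nxt


# Rows shaped like the merged output above (values copied from the issue).
merged = [
    ("chr1", 200, 300, 0.21),
    ("chr1", 300, 1800, 0.0),   # the collapsed zero-valued run
    ("chr1", 1800, 1900, 0.07995),
]

windows = list(split_into_windows(merged))

for chrom, start, end, value in windows:
    print(f"{chrom}\t{start}\t{end}\t{value:.6f}")
```

The merged `chr1 300 1800` row becomes fifteen consecutive 100 bp rows, each carrying the value 0.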

dzerbino commented 3 years ago

Hello @gevro ,

This is because the bedgraph writer was trying to be too clever, and shared a compression module with the wiggle writer.

I have now deactivated this behaviour.

Cheers,

Daniel
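To illustrate the kind of compression described above, here is a minimal sketch of merging adjacent rows that carry the same value. It is an assumption about the general technique, not the actual WiggleTools code; `merge_equal_runs` and the sample bins are hypothetical. Applied to 100 bp bins, it reproduces the `chr1 300 1800` row from the report.

```python
def merge_equal_runs(rows):
    """Merge adjacent (chrom, start, end, value) rows that touch and share a value.

    Hypothetical illustration of run merging, not WiggleTools' implementation.
    """
    merged = []
    for chrom, start, end, value in rows:
        if merged:
            p_chrom, p_start, p_end, p_value = merged[-1]
            # Extend the previous row when this one continues it with the same value.
            if p_chrom == chrom and p_end == start and p_value == value:
                merged[-1] = (chrom, p_start, end, value)
                continue
        merged.append((chrom, start, end, value))
    return merged


# Fixed 100 bp bins, including a zero-valued stretch from 300 to 1800.
bins = (
    [("chr1", 200, 300, 0.21)]
    + [("chr1", s, s + 100, 0.0) for s in range(300, 1800, 100)]
    + [("chr1", 1800, 1900, 0.07995)]
)

compressed = merge_equal_runs(bins)

for row in compressed:
    print(*row, sep="\t")
```

Merging like this is lossless for the signal itself, but it discards the window boundaries, which is why it is unhelpful when the goal is one row per bin.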