Closed gevro closed 3 years ago
I think there is actually a bug.
In the below blah.bw, from position 35 to position 49, the value is 0.25:
$ wiggletools write_bg - blah.bw | head -n 10
chr1 0 35 0.000000
chr1 35 141 0.250000
chr1 141 170 0.500000
chr1 170 183 0.250000
chr1 183 342 0.000000
chr1 342 343 0.145834
chr1 343 344 0.129032
chr1 344 345 0.135135
chr1 345 346 0.142857
chr1 346 347 0.145834
Here it is with fillIn:
$ wiggletools write_bg - fillIn genome.bed blah.bw | head -n 10
chr1 0 35 0.000000
chr1 35 141 0.250000
chr1 141 170 0.500000
chr1 170 183 0.250000
chr1 183 342 0.000000
chr1 342 343 0.145834
chr1 343 344 0.129032
chr1 344 345 0.135135
chr1 345 346 0.142857
chr1 346 347 0.145834
But when I do bin 50, it starts from chr1 position 49, and wrongly excludes position 35 - 49:
$ wiggletools write_bg - bin 50 blah.bw | head -n 10
chr1 49 99 12.500000
chr1 99 149 14.500000
chr1 149 199 13.750000
chr1 199 249 0.000000
chr1 249 299 0.000000
chr1 299 349 1.013238
chr1 349 399 10.488659
chr1 399 449 12.070403
chr1 449 499 10.081707
chr1 499 549 9.438597
And regardless of where I put fillIn, it doesn't help:
$ wiggletools write_bg - bin 50 fillIn genome.bed blah.bw | head -n 10
chr1 49 99 12.500000
chr1 99 149 14.500000
chr1 149 199 13.750000
chr1 199 249 0.000000
chr1 249 299 0.000000
chr1 299 349 1.013238
chr1 349 399 10.488659
chr1 399 449 12.070403
chr1 449 499 10.081707
chr1 499 549 9.438597
$ wiggletools write_bg - fillIn genome.bed bin 50 blah.bw | head -n 10
chr1 0 49 0.000000
chr1 49 99 12.500000
chr1 99 149 14.500000
chr1 149 199 13.750000
chr1 199 249 0.000000
chr1 249 299 0.000000
chr1 299 349 1.013238
chr1 349 399 10.488659
chr1 399 449 12.070403
chr1 449 499 10.081707
$ wiggletools write_bg - fillIn genome.bed bin 50 fillIn genome.bed blah.bw | head -n 10
chr1 0 49 0.000000
chr1 49 99 12.500000
chr1 99 149 14.500000
chr1 149 199 13.750000
chr1 199 249 0.000000
chr1 249 299 0.000000
chr1 299 349 1.013238
chr1 349 399 10.488659
chr1 399 449 12.070403
chr1 449 499 10.081707
Perhaps it is just an edge case bug for the first bin.
Another strange example from a different bigwig file:
$ wiggletools write_bg - blah2.bw | head -n 100
chr1 0 1 0.000000
chr1 1 2 0.000000
chr1 2 3 0.000000
chr1 3 4 0.000000
chr1 4 5 0.000000
chr1 5 6 0.000000
chr1 6 7 0.000000
chr1 7 8 0.000000
chr1 8 9 0.000000
chr1 9 10 0.000000
chr1 10 11 0.000000
chr1 11 12 0.000000
chr1 12 13 0.000000
chr1 13 14 0.000000
chr1 14 15 0.000000
chr1 15 16 0.000000
chr1 16 17 0.000000
chr1 17 18 0.000000
chr1 18 19 0.000000
chr1 19 20 0.000000
chr1 20 21 0.000000
chr1 21 22 0.000000
chr1 22 23 0.000000
chr1 23 24 0.000000
chr1 24 25 0.000000
chr1 25 26 0.000000
chr1 26 27 0.000000
chr1 27 28 0.000000
chr1 28 29 0.000000
chr1 29 30 0.000000
chr1 30 31 0.000000
chr1 31 32 0.000000
chr1 32 33 0.000000
chr1 33 34 0.000000
chr1 34 35 0.000000
chr1 35 36 0.500000
chr1 36 37 0.500000
chr1 37 38 0.500000
chr1 38 39 0.500000
chr1 39 40 0.500000
chr1 40 41 0.500000
chr1 41 42 0.500000
chr1 42 43 0.500000
chr1 43 44 0.500000
chr1 44 45 0.500000
chr1 45 46 0.500000
chr1 46 47 0.500000
chr1 47 48 0.500000
chr1 48 49 0.500000
chr1 49 50 0.500000
chr1 50 51 0.500000
chr1 51 52 0.500000
chr1 52 53 0.500000
chr1 53 54 0.500000
chr1 54 55 0.500000
chr1 55 56 0.500000
chr1 56 57 0.500000
chr1 57 58 0.500000
chr1 58 59 0.500000
chr1 59 60 0.500000
chr1 60 61 0.500000
chr1 61 62 0.500000
chr1 62 63 0.500000
chr1 63 64 0.500000
chr1 64 65 0.500000
chr1 65 66 0.500000
chr1 66 67 0.500000
chr1 67 68 0.500000
chr1 68 69 0.500000
chr1 69 70 0.500000
chr1 70 71 0.500000
chr1 71 72 0.500000
chr1 72 73 0.500000
chr1 73 74 0.500000
chr1 74 75 0.500000
chr1 75 76 0.500000
chr1 76 77 0.500000
chr1 77 78 0.500000
chr1 78 79 0.500000
chr1 79 80 0.500000
chr1 80 81 0.500000
chr1 81 82 0.500000
chr1 82 83 0.500000
chr1 83 84 0.500000
chr1 84 85 0.500000
chr1 85 86 0.500000
chr1 86 87 0.500000
chr1 87 88 0.500000
chr1 88 89 0.500000
chr1 89 90 0.500000
chr1 90 91 0.500000
chr1 91 92 0.500000
chr1 92 93 0.500000
chr1 93 94 0.500000
chr1 94 95 0.500000
chr1 95 96 0.500000
chr1 96 97 0.500000
chr1 97 98 0.500000
chr1 98 99 0.500000
chr1 99 100 0.500000
$ wiggletools bin 50 blah2.bw | head
chr1 49 99 -20.500000
chr1 99 149 29.000000
chr1 149 199 27.500000
chr1 199 299 0.000000
chr1 299 349 2.026474
chr1 349 399 20.977322
chr1 399 449 24.140812
chr1 449 499 20.163421
chr1 499 549 18.877193
chr1 549 599 20.048252
Negative value in the first bin?
One more strange example in case this helps find the bug:
$ wiggletools write_bg - blah3.bw | head
chr1 76 114 0.108700
chr1 183 184 0.108700
chr1 184 204 0.326100
chr1 204 218 0.217400
chr1 255 256 0.217400
chr1 256 298 0.326100
$ wiggletools bin 20 blah3.bw | head
chr1 19 59 0.000000
chr1 59 79 0.326100
chr1 79 99 2.174000
chr1 99 119 1.630500
chr1 119 179 0.000000
chr1 179 199 5.000200
chr1 199 219 4.674100
The first bin width isn't even the correct size, and it's not clear why the bins start where they do. Seems arbitrary.
$ wiggletools write_bg - bin 20 blah3.bw | head
chr1 19 39 0.000000
chr1 39 59 0.000000
chr1 59 79 0.326100
chr1 79 99 2.174000
chr1 99 119 1.630500
chr1 119 139 0.000000
chr1 139 159 0.000000
chr1 159 179 0.000000
chr1 179 199 5.000200
Based on this, it seems that there might be two things going on: a bug with the first bin, and also the bin iterator seems to be using the compression module, but it shouldn't be.
And I think there is a 3rd bug in bin. After chr1, it really gets weird:
$ wiggletools bin 50 blah3.bw | grep -v "^chr1"
chr2 248387999 248388049 -1433382744130566.750000
chr20 248388049 248388099 -1016193259775819.375000
chr21 248388099 248388149 -4707426071032891.000000
chr22 248388149 248388199 -4439755725746698.000000
chr3 248388199 248388249 -1204123472334506.250000
chr4 248388249 248388299 -1603404158277709.500000
chr5 248388299 248388349 -2109468357354108.250000
chr6 248388349 248388399 -880544406277438.125000
chr7 248388399 248388449 -3713302668858378.000000
chr8 248388449 248388499 -1212117984988667.750000
chr9 248388499 248388549 -4011027180602880.500000
chrM 248388549 248388599 -32047372042.806614
chrX 248388599 248388649 -657718690766102.875000
But there is clearly normal data in chr2, etc...
$ wiggletools write_bg - blah3.bw | grep -v "^chr1" | head
chr2 0 1 0.108700
chr2 1 2 0.217400
chr2 2 5 0.434800
chr2 5 6 0.543500
chr2 6 12 0.760900
chr2 12 38 0.978300
chr2 38 42 0.217400
chr2 42 47 0.108700
chr2 112 117 0.108700
chr2 117 152 0.217400
Hello @gevro ,
To answer your question, bin does actually fill in missing values. WiggleTools has an implicit notion of default value for iterators which is transmitted by composition (e.g. files have a default of 0, but say the offset operator would accordingly offset the default value).
The bugs you pointed out were largely due to some misguided alterations I made the past few days, which I reverted.
Please pull again from master.
Cheers,
Daniel
The main bug is fixed, but 'bin' is still not quite behaving properly. It looks like 'bin' output is still using a compression module. See below the first bin. Bin 0 to 160 is output as one bin, even though I specified bin 20. It should output 0 to 20, 20 to 40, etc. fillIn doesn't help.
$ wiggletools write_bg - blah3.bw | head
chr1 0 170 0.000000
chr1 170 183 0.250000
chr1 183 260 0.500000
chr1 260 266 1.000000
chr1 266 272 0.750000
chr1 272 304 0.833333
chr1 304 313 0.750000
chr1 313 315 0.687500
chr1 315 319 0.637500
chr1 319 323 0.687500
evrong01-i27-02 reapr$ wiggletools bin 20 blah3.bw | head
chr1 0 160 0.000000
chr1 160 180 2.500000
chr1 180 200 9.250000
chr1 200 260 10.000000
chr1 260 280 17.166664
chr1 280 300 16.666660
chr1 300 320 14.695832
chr1 320 340 14.314493
chr1 340 360 11.618706
chr1 360 380 9.329554
evrong01-i27-02 reapr$ wiggletools fillIn genome.bed bin 20 blah3.bw | head
chr1 0 160 0.000000
chr1 160 180 2.500000
chr1 180 200 9.250000
chr1 200 260 10.000000
chr1 260 280 17.166664
chr1 280 300 16.666660
chr1 300 320 14.695832
chr1 320 340 14.314493
chr1 340 360 11.618706
chr1 360 380 9.329554
evrong01-i27-02 reapr$ wiggletools bin 20 fillIn genome.bed blah3.bw | head
chr1 0 160 0.000000
chr1 160 180 2.500000
chr1 180 200 9.250000
chr1 200 260 10.000000
chr1 260 280 17.166664
chr1 280 300 16.666660
chr1 300 320 14.695832
chr1 320 340 14.314493
chr1 340 360 11.618706
chr1 360 380 9.329554
A work-around is to use write_bg, with or without fillIn, but regardless, bin should output equal sized-bins.
$ wiggletools write_bg - bin 20 fillIn genome.bed blah3.bw | head
chr1 0 20 0.000000
chr1 20 40 0.000000
chr1 40 60 0.000000
chr1 60 80 0.000000
chr1 80 100 0.000000
chr1 100 120 0.000000
chr1 120 140 0.000000
chr1 140 160 0.000000
chr1 160 180 2.500000
chr1 180 200 9.250000
$ wiggletools write_bg - bin 20 blah3.bw | head
chr1 0 20 0.000000
chr1 20 40 0.000000
chr1 40 60 0.000000
chr1 60 80 0.000000
chr1 80 100 0.000000
chr1 100 120 0.000000
chr1 120 140 0.000000
chr1 140 160 0.000000
chr1 160 180 2.500000
chr1 180 200 9.250000
Hello @gevro ,
I've been thinking about this, and I'm not too sure what the right behaviour should be.
What you're observing is that by default wiggletools prints out results in wiggle format, which confusingly can be composed of bedgraph blocks (as in here). Now the wiggle file format is not meant to be easy to read but to be compact, so by default the output is compressed.
As you point out, with the write_bg command, the output is in proper bedgraph, much more legible and uncompressed (by default).
So the question is: if the command is wiggletools bin ...
in what format should the output be?
For consistency, I would argue that it should stay as it is (compressed wig format), however, I appreciate that this is not super obvious to a casual user.
We could always encode an exception for a command that say ends by a bin command, but I do not know whether this would confuse people more than anything.
Let me know what you think,
Daniel
Hi Daniel, I was thinking the same thing. I agree, and probably should stay as it is.
One example though where I am not sure what to do is, for example, doing the histogram command after bin.
wiggletools histogram blah.txt 100 fillIn genome.bed bin 20 blah.bw -> Will histogram use compressed bins or uncompressed bins?
Hello @gevro ,
The compression only kicks in right before printing out wig output (onto stdout or a file), therefore in this case it would not kick in.
However, even if there were compression, histogram should not be affected by it, as the overall shape of the profile does not change.
You can test this by comparing:
wiggletools histogram blah.txt 100 fillIn genome.bed bin 20 blah.bw
and
wiggletools histogram blah.txt 100 compress fillIn genome.bed bin 20 blah.bw
Hope this helps,
Daniel
Great, thanks so much.
Hi, Does the bin command automatically fill in missing values with 0?
In other words, do I need to do: bin ## fillIn ### blah.bw in this order? Or just bin ### blah.bw ? Or fillIn ### bin ## fillIn ### blah.bw? Or fillIn ### bin ## blah.bw?