Ensembl / WiggleTools

Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency
Apache License 2.0
142 stars 24 forks source link

Bug in 'bin' #62

Closed gevro closed 3 years ago

gevro commented 3 years ago

Hi, I found a bug in bin. I have a bigwig whose max values are 1.0:

$wiggletools write_bg - fillIn genome.bed blah.bw | head -n 1000000 | cut -f 4 | sort -nr | head
1.000000
1.000000
1.000000
1.000000

However, when I do bin 20 scale 0.05, I am getting values > 1 (either with or without write_bg), which should be impossible:

$wiggletools bin 20 scale 0.05 fillIn genome.bed blah.bw
chr1    109840  109860  0.982538
chr1    109860  110020  1.000000
chr1    110020  110040  1.321969
chr1    110040  110060  0.959469
chr1    110060  110080  0.961057
$wiggletools write_bg - bin 20 scale 0.05 fillIn genome.bed 
chr1    109980  110000  1.000000
chr1    110000  110020  1.000000
chr1    110020  110040  1.321969
chr1    110040  110060  0.959469
chr1    110060  110080  0.961057

It is happening due to bin, because without scale, it is still happening (max value should be 20):

$ wiggletools bin 20 fillIn genome.bed blah.bw
chr1    109820  109840  19.169865
chr1    109840  109860  19.650755
chr1    109860  110020  20.000000
chr1    110020  110040  26.439389
chr1    110040  110060  19.189385
chr1    110060  110080  19.221145

This is the original area of that issue without binning:

$wiggletools fillIn genome.bed blah.bw
chr1    109845  109847  0.954545
chr1    109847  109848  0.950000
chr1    109848  110027  1.000000
chr1    110027  109848  0.000000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333

I'm not sure the source of the bug, but I suspect that binning is not weighting properly the proportion of the bin that is overlapped by the bigwig values.

dzerbino commented 3 years ago

Hello @gevro ,

Actually, there appears to be a bug in fillIn, judging by the repeated sequences (chr1:109848-110027), which would allow for double counting. I'll have a look.

Cheers,

Daniel

gevro commented 3 years ago

Ah yes, good catch!

dzerbino commented 3 years ago

Hello @gevro ,

I'm struggling to reproduce the bug, could you please share the genome.bed and blah.bw files with me? Or at least the relevant region on chr1.

Cheers,

Daniel

gevro commented 3 years ago

Actually, the bug might be upstream. See below:

$wiggletools blah.bw
fixedStep chrom=chr1 start=109810 step=1
0.925000
0.966666
0.964285
0.964285
0.964285
chr1    109814  109821  0.961538
chr1    109821  109845  0.958333
chr1    109845  109847  0.954545
fixedStep chrom=chr1 start=109848 step=1
0.950000
chr1    109848  110027  1.000000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333
chr1    110041  110046  0.954545

This is showing up twice: chr1 109848 110027 1.000000 chr1 109848 110027 1.000000

blah.bw was made from blah.wig using kent tools. Here is the area from blah.wig:

$cat blah.wig
fixedStep chrom=chr1 start=109810 step=1
0.925000
0.966666
0.964285
0.964285
0.964285
chr1    109814  109821  0.961538
chr1    109821  109845  0.958333
chr1    109845  109847  0.954545
fixedStep chrom=chr1 start=109848 step=1
0.950000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333
chr1    110041  110046  0.954545

Origin of the above blah.wig file is from: $wiggletools scale 0.5 sum blah1.bw blah2.bw > blah.wig

Without scale 0.5 it shows:

$wiggletools sum blah1.bw blah2.bw
fixedStep chrom=chr1 start=109810 step=1
1.850000
1.933333
1.928571
1.928571
1.928571
chr1    109814  109821  1.923077
chr1    109821  109845  1.916667
chr1    109845  109847  1.909091
fixedStep chrom=chr1 start=109848 step=1
1.900000
chr1    109848  110027  2.000000
chr1    110027  110032  1.909091
chr1    110032  110041  1.916667
chr1    110041  110046  1.909091

Here are the regions from the two original bw files before sum:

$blah1.bw
fixedStep chrom=chr1 start=109803 step=1
0.909091
chr1    109803  109810  0.916667
chr1    109810  110140  1.000000
chr1    110140  110146  0.875000
chr1    110146  110171  0.857143
chr1    110171  110175  0.833333
chr1    110175  110179  0.800000
chr1    110179  110197  0.833333
chr1    110197  110208  0.800000
chr1    110208  110226  0.750000
chr1    110226  110228  0.500000
$blah2.bw
fixedStep chrom=chr1 start=109794 step=1
0.866667
0.857143
0.857143
0.857143
0.857143
0.923077
0.923077
0.923077
0.916667
chr1    109802  109809  0.928571
chr1    109809  109811  0.933333
chr1    109811  109814  0.928571
chr1    109814  109821  0.923077
chr1    109821  109845  0.916667
chr1    109845  109847  0.909091
fixedStep chrom=chr1 start=109848 step=1
0.900000
chr1    109848  110027  1.000000
chr1    110027  110032  0.909091
chr1    110032  110041  0.916667
chr1    110041  110046  0.909091

blah1.bw and blah2.bw were made from bedgraph using kent tools:

$blah1.bedgraph
chr1    109845  109846  1
chr1    109846  109847  1
chr1    109847  109848  1
chr1    109848  109849  1
chr1    109849  109850  1
chr1    109850  109851  1
chr1    109851  109852  1
chr1    109852  109853  1
chr1    109853  109854  1
$blah2.bedgraph
chr1    109845  109846  0.909091
chr1    109846  109847  0.909091
chr1    109847  109848  0.9
chr1    109848  109849  1
chr1    109849  109850  1
chr1    109850  109851  1
chr1    109851  109852  1
chr1    109852  109853  1

Does this help find the bug? Is the issue with 'sum'?

dzerbino commented 3 years ago

This looks pretty gnarly, as it could be a bug in wigToBigWig tool, in the libBigWig library, or in Wiggletools.

Could you share the file with me? Happy to do this via email if you prefer. Naturally, I will destroy my copy of the file once the bug is resolved.

Cheers,

Daniel

On Tue, 29 Dec 2020, 17:15 gevro, notifications@github.com wrote:

Actually, the bug might be upstream. See below:

$wiggletools blah.bw fixedStep chrom=chr1 start=109810 step=1 0.925000 0.966666 0.964285 0.964285 0.964285 chr1 109814 109821 0.961538 chr1 109821 109845 0.958333 chr1 109845 109847 0.954545 fixedStep chrom=chr1 start=109848 step=1 0.950000 chr1 109848 110027 1.000000 chr1 109848 110027 1.000000 chr1 110027 110032 0.954545 chr1 110032 110041 0.958333 chr1 110041 110046 0.954545

$cat blah.wig fixedStep chrom=chr1 start=109810 step=1 0.925000 0.966666 0.964285 0.964285 0.964285 chr1 109814 109821 0.961538 chr1 109821 109845 0.958333 chr1 109845 109847 0.954545 fixedStep chrom=chr1 start=109848 step=1 0.950000 chr1 109848 110027 1.000000 chr1 110027 110032 0.954545 chr1 110032 110041 0.958333 chr1 110041 110046 0.954545

Origin of the above blah.wig file is from: $wiggletools scale 0.5 sum blah1.bw blah2.bw > blah.wig

I then converted blah.wig to bw using kent tools.

Without scale 0.5 it shows:

$wiggletools sum blah1.bw blah2.bw fixedStep chrom=chr1 start=109810 step=1 1.850000 1.933333 1.928571 1.928571 1.928571 chr1 109814 109821 1.923077 chr1 109821 109845 1.916667 chr1 109845 109847 1.909091 fixedStep chrom=chr1 start=109848 step=1 1.900000 chr1 109848 110027 2.000000 chr1 110027 110032 1.909091 chr1 110032 110041 1.916667 chr1 110041 110046 1.909091

Does this help find the bug? Is the issue with 'sum'?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Ensembl/WiggleTools/issues/62#issuecomment-752168107, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAIDV4JFITUV6UNJJ7A4QDSXIFEJANCNFSM4VNMJIEQ .

gevro commented 3 years ago

Sure! Thanks, I will e-mail you now.

dzerbino commented 3 years ago

Hello @gevro ,

As mentioned by email, I think I found and corrected the bug, please pull in my latest commits on master.

Let me know how it goes.

Thanks for spotting this,

Daniel

gevro commented 3 years ago

Installation via brew is not working:

1) First I uninstalled: brew uninstall wiggletools.

2)Then install from --HEAD

$ brew install wiggletools --HEAD
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and homebrew/cask).
==> New Formulae
pkger
==> Updated Formulae
Updated 104 formulae.
==> New Casks
operator                                                                      silicon
==> Updated Casks
Updated 49 casks.

==> Installing wiggletools from brewsci/bio
==> Cloning https://github.com/Ensembl/WiggleTools.git
Updating /Users/evrong01/Library/Caches/Homebrew/wiggletools--git
From https://github.com/Ensembl/WiggleTools
   5b4f2a4..94e11c4  master     -> origin/master
==> Checking out branch master
Already on 'master'
Your branch is behind 'origin/master' by 9 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
HEAD is now at 94e11c4 Implemented compression keyword
==> make
Last 15 lines from /Users/evrong01/Library/Logs/Homebrew/wiggletools/01.make:
                apply = newWiggleIterator(data, &LooseBufferedWiggleIteratorPop, &BufferedWiggleIteratorSeek, data->default_value);
                        ~~~~~~~~~~~~~~~~~                                                                                        ^
./wiggleIterator.h:37:1: note: 'newWiggleIterator' declared here
WiggleIterator * newWiggleIterator(void * data, void (*pop)(WiggleIterator *), void (*seek)(WiggleIterator *, const char *, int, int), double default_value, bool overlapping);
^
apply.c:169:77: error: too few arguments to function call, expected 5, have 4
        return newWiggleIterator(data, &FillInUnaryPop, NULL, source->default_value);
               ~~~~~~~~~~~~~~~~~                                                   ^
./wiggleIterator.h:37:1: note: 'newWiggleIterator' declared here
WiggleIterator * newWiggleIterator(void * data, void (*pop)(WiggleIterator *), void (*seek)(WiggleIterator *, const char *, int, int), double default_value, bool overlapping);
^
3 errors generated.
make[1]: *** [apply.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Wiggletools] Error 2

If reporting this issue please do so at (not Homebrew/brew or Homebrew/core):
  https://github.com/brewsci/homebrew-bio/issues

Please create pull requests instead of asking for help on Homebrew's GitHub,
Twitter or any other official channels.
dzerbino commented 3 years ago

D'oh! I had forgotten to commit the changes in that file!

Now fixed, please try again.

gevro commented 3 years ago

Works! I am truly appreciative for all the quick fixes and improvements!