apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.
https://datasketches.apache.org
Apache License 2.0

DirectDoublesSketch has error (maybe in propagate-carry?) #134

Closed: jmalkin closed this issue 7 years ago

jmalkin commented 7 years ago

Running the following code (yes, allocating way more memory than needed):

final int k = 16;
final int n = k * 2 + 7;
final ByteBuffer bb = ByteBuffer.allocateDirect(100 + n << 3);
final Memory mem = AllocMemory.wrap(bb);

final DoublesSketchBuilder dsb = DoublesSketch.builder();
dsb.initMemory(mem);
final DoublesSketch ds = dsb.build(k);

for (int i = 0; i < n; ++i) {
  ds.update(i);
}

System.out.println(ds.toString(true, true));
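
A note on the buffer size in the snippet above: in Java, binary + binds more tightly than <<, so 100 + n << 3 is evaluated as (100 + n) << 3. The sketch below only illustrates that arithmetic for n = k * 2 + 7. The class and method names are hypothetical, and it makes no claim about which grouping was intended or about how much memory the sketch actually needs.

```java
// Hypothetical helper class, not part of the library: it only shows how Java
// parses the size expression used in the reproduction code.
final class ShiftPrecedence {

  // Same expression as in the report: additive operators bind tighter than shifts,
  // so this is (100 + n) << 3.
  static int asWritten(final int n) {
    return 100 + n << 3;
  }

  // The other possible reading: n doubles at 8 bytes each, plus 100 extra bytes.
  static int shiftFirst(final int n) {
    return 100 + (n << 3);
  }

  public static void main(final String[] args) {
    final int k = 16;
    final int n = k * 2 + 7; // 39, as in the report
    System.out.println(asWritten(n));  // prints 1112
    System.out.println(shiftFirst(n)); // prints 412
  }
}
```

Either grouping yields a buffer of several hundred bytes or more for n = 39, which is consistent with the remark that the allocation is larger than needed.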

The base buffer seems OK, but the data in the first level contains unexpected 0.0 values. When I used n = k * 4 + m, I got 0.0 values in the middle of the level.

Quantiles DirectDoublesSketch DATA DETAIL:

 BaseBuffer   :       32.0      33.0      34.0      35.0      36.0      37.0      38.0
 Valid | Level
   T       0:        0.0       0.0       0.0       2.0       4.0       6.0       8.0      10.0      12.0      14.0      16.0      18.0      20.0      22.0      24.0      26.0
### END DATA DETAIL
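
One way to see that the level above cannot be right: the loop feeds the distinct values 0 through n - 1, so no value should appear more than once in any level. The following is a minimal check of that property against the level 0 values copied from the output above. The class and method names are hypothetical and the check does not use the sketch API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sanity check, not part of the library: when every input value is
// distinct, a retained level should not contain the same value twice.
final class LevelCheck {

  // Counts entries whose value already appeared earlier in the level.
  static int countDuplicates(final double[] level) {
    final Set<Double> seen = new HashSet<>();
    int duplicates = 0;
    for (final double v : level) {
      if (!seen.add(v)) { // add() returns false if v was already present
        duplicates++;
      }
    }
    return duplicates;
  }

  public static void main(final String[] args) {
    // Level 0 as printed in the DATA DETAIL output of the report.
    final double[] level0 = {
      0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0,
      12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0
    };
    System.out.println(countDuplicates(level0)); // prints 2
  }
}
```

The two duplicates are the extra 0.0 entries. Only one 0.0 can come from the input, since 0 is inserted exactly once.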

jmalkin commented 7 years ago

With larger numbers of samples, the issue seems to be in the top level.

jmalkin commented 7 years ago

This commit (and the subsequent unit test) seems to have resolved the issue: https://github.com/DataSketches/sketches-core/commit/c44c7f4ec3213e4961ad0eaf4ab080b7798fb35d