apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.
https://datasketches.apache.org
Apache License 2.0

Implemented Inclusion of min/max for Floats, Doubles and Items Sorted Views. #547

Closed: leerho closed this 2 months ago

leerho commented 2 months ago

I also fixed a problem I discovered in the deserialization of Boolean quantiles, although it would have shown up with other types too. The data was OK, but the max value would have been incorrect.

I also fixed problems in several of the tests.

Along the way, in the files I touched, I also replaced the hard-coded "\n" (backslash-n) newline with the platform-independent line separator "LS".
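A minimal sketch of that pattern, assuming a shared constant along the lines of the LS mentioned above. Here LS is defined locally from `System.lineSeparator()`. The class `LineSeparatorDemo` and its `summary` method are hypothetical and not part of the library.

```java
public final class LineSeparatorDemo {

  // Assumption: a shared constant like the "LS" mentioned in the PR,
  // defined here locally from the platform's line separator.
  static final String LS = System.lineSeparator();

  // Hypothetical summary builder: ends each line with LS instead of a hard-coded "\n".
  static String summary(String name, long n, double min, double max) {
    StringBuilder sb = new StringBuilder();
    sb.append("Summary: ").append(name).append(LS);
    sb.append("N: ").append(n).append(LS);
    sb.append("Min: ").append(min).append(LS);
    sb.append("Max: ").append(max).append(LS);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(summary("demo", 1000, 0.5, 99.0));
  }
}
```

On Linux and macOS the output is identical to using "\n". On Windows each line ends with "\r\n", which keeps the text consistent with other platform-native output.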

I also moved some functions, such as getCDF(...) and getPMF(...), into the parent interfaces as default methods. The three parent interfaces for the Doubles, Floats and Items sorted views (SV) are now code-parallel.
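The following simplified sketch illustrates what moving such functions into a parent interface as default methods can look like. The interface `SimpleSortedView`, its three abstract accessors, and the inclusive rank convention are assumptions for illustration only, not the library's actual sorted-view interfaces. The point is that an implementing class supplies only its raw arrays and inherits `getCDF` and `getPMF`.

```java
import java.util.Arrays;

public final class DefaultMethodDemo {

  /** Hypothetical simplified parent interface for a sorted view of doubles. */
  public interface SimpleSortedView {
    double[] getQuantiles();        // retained items, ascending
    long[] getCumulativeWeights();  // inclusive cumulative weights, last entry == N
    long getN();                    // total weight of the stream

    /** Fraction of the total weight at items <= x (inclusive convention, assumed here). */
    default double getRank(double x) {
      double[] q = getQuantiles();
      long[] cw = getCumulativeWeights();
      long w = 0;
      for (int i = 0; i < q.length && q[i] <= x; i++) {
        w = cw[i];
      }
      return (double) w / getN();
    }

    /** CDF evaluated at ascending split points, plus a final 1.0 entry. */
    default double[] getCDF(double[] splitPoints) {
      double[] cdf = new double[splitPoints.length + 1];
      for (int i = 0; i < splitPoints.length; i++) {
        cdf[i] = getRank(splitPoints[i]);
      }
      cdf[splitPoints.length] = 1.0;
      return cdf;
    }

    /** PMF obtained by differencing adjacent CDF entries. */
    default double[] getPMF(double[] splitPoints) {
      double[] cdf = getCDF(splitPoints);
      double[] pmf = new double[cdf.length];
      pmf[0] = cdf[0];
      for (int i = 1; i < cdf.length; i++) {
        pmf[i] = cdf[i] - cdf[i - 1];
      }
      return pmf;
    }
  }

  /** A concrete view supplies only its arrays; getCDF and getPMF are inherited. */
  public static final class ArrayView implements SimpleSortedView {
    private final double[] quantiles;
    private final long[] cumWeights;

    public ArrayView(double[] quantiles, long[] cumWeights) {
      this.quantiles = quantiles;
      this.cumWeights = cumWeights;
    }

    @Override public double[] getQuantiles() { return quantiles; }
    @Override public long[] getCumulativeWeights() { return cumWeights; }
    @Override public long getN() { return cumWeights[cumWeights.length - 1]; }
  }

  public static void main(String[] args) {
    // Items 1..4 with individual weights 1, 2, 3, 4 (cumulative 1, 3, 6, 10).
    SimpleSortedView sv = new ArrayView(new double[] {1, 2, 3, 4}, new long[] {1, 3, 6, 10});
    double[] splits = {2.0, 3.5};
    System.out.println("CDF: " + Arrays.toString(sv.getCDF(splits)));
    System.out.println("PMF: " + Arrays.toString(sv.getPMF(splits)));
  }
}
```

Because `getPMF` is written in terms of `getCDF`, and `getCDF` in terms of the abstract accessors, parallel Doubles, Floats and Items variants would differ mainly in their item types. This mirrors the "code-parallel" goal described above.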

leerho commented 2 months ago

Changes by file:

/kll/
- KllDirectCompactItemsSketch
- KllDoublesSketch
- KllFloatsSketch
- KllHeapDoublesSketch
- KllItemsSketch
- KllMemoryValidate

/quantiles/
- DoublesSketch
- ItemsUtil

/quantilescommon/
- DoublesSketchSortedView
- FloatsSketchSortedView
- GenericSortedView
- IncludeMinMax
- ItemsSketchSortedView

/req/
- ReqSketch

/test/...
  /kll/
  - KllDirectCompactItemsSketch
  - KllMiscDoublesTest
  - KllMiscFloatsTest
  - KllMiscItemsTest
  /quantilescommon/
  - IncludeMinMaxTest

leerho commented 2 months ago

We changed the behavior of the sorted view to ensure that the min and max values are always represented. We believe this gives users of these distributions more practical information without distorting the resulting data. -- @jmalkin
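The sketch below shows one way to guarantee that the min and max appear in a sorted view. It is a hypothetical approach and not necessarily what the library's IncludeMinMax class does. If the retained items do not already start at the true min (or end at the true max), the extreme is inserted with weight 1, borrowing that unit from its neighbor. When the neighbor's weight is already 1, the neighbor is replaced outright. Either way the total weight N is unchanged. The names `MinMaxViewSketch`, `View` and `withMinMax` are invented for this example, and the code assumes ascending items with per-item (not cumulative) weights that are all at least 1.

```java
import java.util.Arrays;

public final class MinMaxViewSketch {

  /** Hypothetical view: ascending items with per-item (non-cumulative) weights. */
  public static final class View {
    public final double[] items;
    public final long[] weights;

    public View(double[] items, long[] weights) {
      this.items = items;
      this.weights = weights;
    }

    public long totalWeight() {
      long n = 0;
      for (long w : weights) {
        n += w;
      }
      return n;
    }
  }

  /**
   * Returns a copy whose first item is min and last item is max.
   * Assumes items are ascending, every weight is >= 1, and min/max are the
   * true stream extremes (min <= items[0], items[last] <= max).
   * The total weight is preserved. The input arrays are not modified.
   */
  public static View withMinMax(double[] items, long[] weights, double min, double max) {
    double[] it = items.clone();
    long[] wt = weights.clone();

    if (it[0] > min) {
      if (wt[0] > 1) {
        wt[0] -= 1;              // borrow one unit of weight from the first item
        it = prepend(it, min);
        wt = prepend(wt, 1L);
      } else {
        it[0] = min;             // a weight-1 item is replaced by the min
      }
    }

    int last = it.length - 1;
    if (it[last] < max) {
      if (wt[last] > 1) {
        wt[last] -= 1;           // borrow one unit of weight from the last item
        it = Arrays.copyOf(it, it.length + 1);
        it[last + 1] = max;
        wt = Arrays.copyOf(wt, wt.length + 1);
        wt[last + 1] = 1L;
      } else {
        it[last] = max;          // a weight-1 item is replaced by the max
      }
    }
    return new View(it, wt);
  }

  private static double[] prepend(double[] a, double v) {
    double[] out = new double[a.length + 1];
    out[0] = v;
    System.arraycopy(a, 0, out, 1, a.length);
    return out;
  }

  private static long[] prepend(long[] a, long v) {
    long[] out = new long[a.length + 1];
    out[0] = v;
    System.arraycopy(a, 0, out, 1, a.length);
    return out;
  }

  public static void main(String[] args) {
    // Retained items after compaction; the true min 0.5 and max 99.0 were dropped.
    double[] items = {2.0, 10.0, 50.0, 90.0};
    long[] weights = {4, 2, 2, 1};
    View v = withMinMax(items, weights, 0.5, 99.0);
    // Prints: [0.5, 2.0, 10.0, 50.0, 99.0] [1, 3, 2, 2, 1] N=9
    System.out.println(Arrays.toString(v.items) + " " + Arrays.toString(v.weights)
        + " N=" + v.totalWeight());
  }
}
```

Borrowing weight from the adjacent entry, rather than adding new weight, keeps the view's total consistent with the sketch's N. That is one reasonable reading of "without distorting the resulting data".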