boostorg / accumulators

An awesome library from Boost
http://boost.org/libs/accumulators

Fix heights update in weighted_extended_p_square #59

Open adimajo opened 5 months ago

adimajo commented 5 months ago

In weighted_extended_p_square.hpp, a weighted version of the extended p-square algorithm is implemented. Here, "weighted" means that every incoming sample is given a weight, "extended" means that several quantiles can be estimated at once, and the p-square algorithm itself is an online quantile estimator, in the sense that it does not need to store all samples.

This algorithm works by updating estimates of these quantiles together with additional "markers": the minimum and maximum values, plus all mid-points, i.e. the quantiles lying halfway between two requested quantiles.

Unfortunately, the update rule for the heights (i.e. the quantile estimates) does not properly take the weights into account: it is the same as in the unweighted case.

This rule is correct in the unweighted case, but it makes the approach perform poorly when the weights are, on average, far from 1. When all weights are set to 1 it obviously matches the unweighted case, and one can extrapolate similar behaviour to weights within about an order of magnitude of 1.

This is counter-intuitive at best, and unsatisfactory: it is reasonable to expect the weighted counterpart of an unweighted algorithm to yield similar results when it is presented with similar data and the same weight for every sample.

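The provided programs MWE1.{cpp,py} implement this idea: they feed the same data to the estimator, giving every sample the same weight, and repeat the run for a range of weights.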

With the current implementation, they produce the following plot (image: MWE1_current).

As can be seen, the results depend heavily on the chosen weight (increasing from left to right) and are unsatisfactory for very small and very large weights, breaking the desirable "weight-invariance" property.

Applying the proposed modifications to the heights update rule and rerunning the same consistency test yields a satisfactory plot (image: MWE1_fixed).

Notes:

adimajo commented 2 months ago

@pdimov @ericniebler