boostorg / accumulators

An awesome library from Boost
http://boost.org/libs/accumulators

Quadratic quantile interpolator does not guarantee continuity #63

Open adimajo opened 5 months ago

adimajo commented 5 months ago

`[weighted_][extended_]p_square.hpp` implements the p-square algorithm, an online quantile estimator (online in the sense that it does not require storing all samples). Additionally:

The extended version also introduces the ability to use interpolation.

Unfortunately, the choice of the quadratic interpolator polynomial introduces "jumps" in the estimated quantile function.

In `extended_p_square_quantile.hpp` (currently line 154), the condition `if ( (dist == 1 || *iter_probs - this->probability <= this->probability - *(iter_probs - 1) ) && dist != this->probabilities.size() - 1 )` switches to a different polynomial around the mid-points of requested quantiles (excluding the first and last mid-points).

This creates situations where $\exists \; 0 < i < 1, \; \exists \; \eta > 0$ such that $|\hat{q}(i + \epsilon) - \hat{q}(i)| > \eta$ for $\epsilon$ arbitrarily close to 0 (here, for small $\epsilon < 0$). In other words, $\hat{q}(i + \epsilon)$ does not converge to $\hat{q}(i)$ as $\epsilon$ goes to 0, and there is a discontinuity or "jump" in the quantile function.
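The mechanism can be reproduced without Boost. The following is a minimal sketch, not the library's code: `parabola`, `piecewise_quadratic` and `jump_at` are hypothetical names, and the selection rule is only modelled on the condition quoted above (fit a parabola through three neighbouring markers, centred on whichever marker is nearer to the requested probability).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value at x of the parabola through (x0, y0), (x1, y1), (x2, y2), in Lagrange form.
double parabola(double x, double x0, double y0, double x1, double y1,
                double x2, double y2)
{
    double l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2));
    double l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2));
    double l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1));
    return l0 * y0 + l1 * y1 + l2 * y2;
}

// Hypothetical stand-in for the quadratic interpolator: probs must be sorted,
// heights[i] is the estimated quantile at probs[i].
double piecewise_quadratic(const std::vector<double>& probs,
                           const std::vector<double>& heights, double p)
{
    assert(probs.size() >= 3 && probs.size() == heights.size());

    // Find k such that probs[k - 1] <= p <= probs[k] (clamped to the ends).
    std::size_t k = 1;
    while (k + 1 < probs.size() && probs[k] < p) ++k;

    // Same shape as the quoted condition: centre the parabola on the upper
    // marker when p is at least as close to it as to the lower marker.
    bool closer_to_upper = probs[k] - p <= p - probs[k - 1];
    std::size_t c = ((k == 1 || closer_to_upper) && k != probs.size() - 1)
                        ? k
                        : k - 1;

    return parabola(p, probs[c - 1], heights[c - 1], probs[c], heights[c],
                    probs[c + 1], heights[c + 1]);
}

// Estimate just below p minus estimate at p; positive means the estimate
// drops as the probability increases.
double jump_at(const std::vector<double>& probs,
               const std::vector<double>& heights, double p, double eps)
{
    return piecewise_quadratic(probs, heights, p - eps) -
           piecewise_quadratic(probs, heights, p);
}
```

With the made-up markers `probs = {0.25, 0.375, 0.5, 0.625, 0.75}` and `heights = {1, 2, 4, 8, 16}`, the estimate at the mid-point `p = 0.5625` is 5.5, while just below it the other parabola gives about 5.75. The two parabolas agree at the markers they share but not in between, so the switch at the mid-point produces a jump.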

To illustrate this claim, the programs MWE(3_)4.{cpp,py} do the following:

(Figure: MWE4)

Note also that this discontinuity is "on the wrong side": similar to issue #62, we get that $\hat{q}(0.874999) > \hat{q}(0.875)$, so the estimated quantile function is not monotone at the jump.

Since this makes no mathematical sense, I would either issue a strong warning at instantiation or deprecate this interpolator (see also issue #62). If an interpolator other than the linear one is needed, I would suggest integrating splines, which are continuous by design (see e.g. https://github.com/ttk592/spline/).
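To make the spline suggestion concrete, here is a minimal sketch of one continuous alternative: a cubic Hermite spline with finite-difference tangents. It does not use the API of the library linked above, and `tangent` and `hermite_quantile` are hypothetical names. Each marker gets a single slope that is shared by the two segments meeting there, so both value and first derivative are continuous by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Finite-difference slope estimate at knot k (one-sided at the two ends).
double tangent(const std::vector<double>& x, const std::vector<double>& y,
               std::size_t k)
{
    std::size_t lo = (k == 0) ? 0 : k - 1;
    std::size_t hi = (k + 1 == x.size()) ? k : k + 1;
    return (y[hi] - y[lo]) / (x[hi] - x[lo]);
}

// Cubic Hermite interpolation of the markers (probs[i], heights[i]);
// probs must be sorted and strictly increasing.
double hermite_quantile(const std::vector<double>& probs,
                        const std::vector<double>& heights, double p)
{
    assert(probs.size() >= 2 && probs.size() == heights.size());

    // Find k such that probs[k] <= p <= probs[k + 1] (clamped to the ends).
    std::size_t k = 0;
    while (k + 2 < probs.size() && probs[k + 1] < p) ++k;

    double h = probs[k + 1] - probs[k];
    double t = (p - probs[k]) / h;  // position inside the segment, in [0, 1]
    double t2 = t * t;
    double t3 = t2 * t;

    // Standard cubic Hermite basis functions.
    double h00 = 2 * t3 - 3 * t2 + 1;
    double h10 = t3 - 2 * t2 + t;
    double h01 = -2 * t3 + 3 * t2;
    double h11 = t3 - t2;

    return h00 * heights[k] + h10 * h * tangent(probs, heights, k) +
           h01 * heights[k + 1] + h11 * h * tangent(probs, heights, k + 1);
}
```

This sketch passes through every marker and has no jump at the mid-points. It does not guarantee a monotone quantile function, though; a monotonicity-preserving choice of tangents would be needed for that.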

Notes: