Open paulharris opened 1 month ago
Confirmed. Mathematica outputs 2718 for command Quantile[PoissonDistribution[2719.13], 0.49]
and so on
In all failing cases we're rounding to floor in inv_discrete_quantile
: https://github.com/boostorg/math/blob/develop/include/boost/math/distributions/detail/inv_discrete_quantile.hpp#L427, since the default policy is:
#define BOOST_MATH_DISCRETE_QUANTILE_POLICY integer_round_outwards
if we change the policy in poisson for p < 0.5 to integer_round_upwards
the results match, but I'm not sure that's a correct fix. I believe @jzmaddock is on holiday right now, but I'm sure he will have a better answer when he's back.
Edit: branch "1203" has a prebuilt test set
I'm (somewhat) back from holiday, but I don't have a good answer because there really isn't one, and the correct behaviour depends a lot on your use case. To summarise, we are inverting a real-valued continuous function, but are expecting a discrete result, so the options are:
1) round up. 2) round down. 3) round to nearest. 4) don't round at all, and let the user decide. 5) The slightly odd (but hopefully makes sense when you think about it) "round outwards" which is our default.
The choices and rationale is explained here: https://beta.boost.org/doc/libs/1_82_0/libs/math/doc/html/math_toolkit/pol_tutorial/understand_dis_quant.html
The problem was observed when we plotted the results of uniformly sampling p across 0-1, computing the quantile, and then plotting on a chart. With the current behaviour, you get a kink in the line at the median. If you plot the effective histogram of the distribution of quantiles from this sample, you get this much more obvious dip at the median.
I understand there is a choice, but it is inconsistent with other math packages, and can have other knock-on affects when users expect consistent results across the number space.
My thought model for the quantile of a non-continuous distribution is: if we draw many, many observations, what would be the empirical quantile? I would use that number as the population quantile.
The result turns out to agree with the quantile function defined on Wikipedia:
It is often standard to choose the lowest value, which can equivalently be written as $Q(p) = \inf \{ x \in \mathbb{R} : p \le F(x) \}$.
This might correspind to integer_round_up
of the rounding policy, though mathematically speaking I think there’s no “rounding” taking place here because the CDF is discontinuous. (It’s different from the situation where the CDF is continuous but its value is not representable exactly by a floating point number.)
(Btw, the second graph (CDF zoom-in) on this page is a little suspicious — the CDF jumps at n+0.5
in that graph, but it should jump at n
if I’m not mistaken.)
Hi,
We spotted an apparent bias in the inverse cumulative poisson distribution function, looking deeper, we found boost is generating different results when compared with scipy and statslib (https://github.com/kthohr/stats)
The value of mean is not important, but for p < 0.5, the result is offset by -1
ie:
In Scipy:
Code:
Output: