friguzzi / cplint

cplint is a suite of programs for reasoning with probabilistic logic programs
Other
67 stars 14 forks source link

Pascal distribution #49

Closed hakank closed 2 years ago

hakank commented 2 years ago

I was testing thebitcoin_attack.pl example and noticed that the mean of pascal(X,10.0.3) and poisson(X,10*0.3/0.7) are about the same: about 4.3 (10*0.3/0.7 ~ 4.2857). For Poisson this is correct, but I wonder about the result of the Pascal distribution.

According to Mathematica's PascalDistribution (https://reference.wolfram.com/language/ref/PascalDistribution.html) the mean should be

Mean@PascalDistribution[10,0.3] 
-> 33.33

(n/p = 10/0.3 = 33.33). Also see the Wolfram One notebook for this: https://www.wolframcloud.com/obj/hakank/Published/pascal_dist_10_0.3

Here's a test of cplint's pascal distribution:

:- use_module(library(mcintyre)).

:- mc.

:- begin_lpad.

d(X) : pascal(X,10,0.3).

:- end_lpad.

/** <examples> 
  ?- mc_expectation(d(X),1000,X,E).
*/

Also at https://cplint.eu/p/Pascal%20distribution%20test.pl

The result is about 4.2.

Have I missed something regarding the Pascal distribution?

damianoazzolini commented 2 years ago

Hi, sorry for the late reply. Thanks for reporting it, there was indeed a bug in the code, now it should be fixed. However, we follow the implementation discussed in the python library scipy, see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nbinom.html With the library we get:

>>> from scipy.stats import nbinom
>>> n, p = 10, 0.3
>>> mean, *_ = nbinom.stats(n, p, moments='mvsk')
>>> mean
array(23.33333333)

which is now very close to what we get using mc_expectation/4:

?- mc_expectation(d(X),1000,X,E).
E = 22.812.

?- mc_expectation(d(X),1000,X,E).
E = 23.5.

?- mc_expectation(d(X),1000,X,E).
E = 23.398.

So there is maybe a different meaning of the parameters in the Mathematica's PascalDistribution. Here, in pascal(X,N,P), N is the number of successes and P is the probability of a single success. Let me know if you spot other problems, thanks.

hakank commented 2 years ago

Thanks for the fix.

Ah, it's the Negative Binomial distribution.

Wikipedia (https://en.wikipedia.org/wiki/Negative_binomial_distribution#Waiting_time_in_a_Bernoulli_process ) states the difference between the Pascal and Negative Binomial distributions: """ For the special case where r is an integer, the Negative Binomial distribution is known as the Pascal distribution. """

Since your implementation supports non integer values of R, perhaps it should be called nbinom instead, or make some comment on this in the documentation?

Also, note that Mathematica give different results for NegativeBinomialDistriibution and PascalDistribution with the same parameters:

Mean@PascalDistribution[10, 0.3]
-> 33.3333
Mean@NegativeBinomialDistribution[10, 0.3]
-> 23.3333

(Sorry for being picky about this.)

damianoazzolini commented 2 years ago

You are right, i've renamed it as negative_binomial. Let me know if something else is not clear.

hakank commented 2 years ago

That's great!