LeoEgidi / footBayes

An R package for many football models
38 stars 7 forks source link

Error in "bipois_lpmf"? #2

Open tommyod opened 1 year ago

tommyod commented 1 year ago

Hi!

Looking at bipois_lpmf, I cannot get it to match the Wikipedia equation for bivariate Poisson:

image

An alternative source is page 5 in this slide deck.

Comment 1. For instance, assume miny = 0. Now look at the term exp(-lambda1 - lambda2 - lambda3) in the equation, before the summation. In the code it says

ss = poisson_lpmf(r[1] | mu1) + poisson_lpmf(r[2] | mu2) - exp(mu3);

Should it not be -mu3 instead of - exp(mu3) since we're in log-space?

Comment 2. Also, in the sum I don't understand where the terms in

log_s = log_s + log(r[1] - k + 1) + mus+ log(r[2] - k + 1)- log(k);

come from. Seems wrong. When I take logs of the terms in the sum I get (see full code below):

lchoose(r[1], k) + lchoose(r[2], k) + lgamma(k + 1) + k * (log(mu3) - log(mu1) - log(mu2));

Any comments on the above?


Here is my attempt at an implementation:

real bipois_lpmf(int[] r ,real mu1, real mu2, real correlation_coeff) {
    // r = argument to evaluate on
    // https://en.wikipedia.org/wiki/Poisson_distribution#Bivariate_Poisson_distribution
    // http://www2.stat-athens.aueb.gr/~karlis/multivariate%20Poisson%20models.pdf

    real log_base_factor;
    real log_sum_factor;
    real log_factor;
    int  miny;

    // Logarithm of the base factor in the multivariate Poisson distribution
    // ie the term before the summation in the equation here
    // https://en.wikipedia.org/wiki/Poisson_distribution#Bivariate_Poisson_distribution
    log_base_factor = poisson_lpmf(r[1] | mu1) + poisson_lpmf(r[2] | mu2) -
    correlation_coeff;

    // Number of terms in the summation
    miny = min(r[1], r[2]);

    // This initial conditions works because the first term in the sum
    // when k = 0 evaluates to 1, so 
    // log_sum_exp(0, log(term)) = log_sum ( 1 + term)
    log_sum_factor = 0;
    if(miny > 0) {
        for(k in 1:miny) {

            // The term is: choose(r_1, k) * choose(r_2, k) * k! * (corr / (mu1 * mu2))^k
            // Here we compute the log of that term
            log_factor = lchoose(r[1], k) + lchoose(r[2], k) + 
                         lgamma(k + 1) + 
                         k * (log(correlation_coeff) - log(mu1) - log(mu2));

            // The log-sum-exp function is associative, since
            // log_sum_exp(log_sum_exp(a, b), c) = log_sum_exp(a, b, c)
            log_sum_factor = log_sum_exp(log_sum_factor, log_factor);
        }
    }
    return log_base_factor + log_sum_factor;
}

Any comments on this? Am I onto something, or am I really dense or missing something obvious?

Thanks in advance for your inputs.