JuliaGaussianProcesses / GPLikelihoods.jl

Provides likelihood functions for Gaussian Processes.
https://juliagaussianprocesses.github.io/GPLikelihoods.jl/
MIT License

`NegativeBinomial` likelihood enforces single parametrization #69

Closed: st-- closed this issue 2 years ago

st-- commented 2 years ago

#63 added an implementation of a negative binomial likelihood. It prescribes the parametrisation in terms of (number of successes r, failure probability p): the number of successes is a scalar likelihood parameter, equal for all observations, and the failure probability is modelled with a latent GP f, squashed through the logistic link function by default, so p = logistic(f). This parametrisation appears to be common e.g. in intermittent count time-series modelling.

However, in other areas different parametrisations are more common, e.g. using the latent GP directly to parametrise the mean of the observations, together with a scalar parameter that measures overdispersion. Moreover, there are multiple variants of the latter parametrisation (Type I and Type II being the most common). This is somewhat problematic because, unlike with the "bare-bones" Distributions object, a user cannot re-use the NegativeBinomialLikelihood object for one of the other parametrisations.

I'm not quite sure what we should do here; I'm just noting that it seems potentially rather confusing to give a specific name to one meaning of something that actually has multiple meanings in different areas (with no easy way of converting)... :thinking:
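As a rough, package-free sketch (all names invented here, not the actual GPLikelihoods API), the current parametrisation maps a latent value f to a NegativeBinomial(r, p) observation with p = logistic(f); below we only expose the implied conditional moments, following the Distributions.jl convention in which p is the success probability (the issue text calls p the failure probability; the two conventions differ only by a sign flip of f):

```julia
# Illustrative sketch only; the real NegativeBinomialLikelihood returns a
# Distributions.NegativeBinomial. Names here are hypothetical stand-ins.
logistic(x) = 1 / (1 + exp(-x))

struct NBSuccessesSketch{T}
    successes::T                  # r: scalar, shared by all observations
end

function moments(lik::NBSuccessesSketch, f::Real)
    r = lik.successes
    p = logistic(f)
    m = r * (1 - p) / p           # conditional mean:     r * exp(-f)
    v = r * (1 - p) / p^2         # conditional variance: r * exp(-f) * (1 + exp(-f))
    return (mean = m, var = v)
end

moments(NBSuccessesSketch(3.0), 0.5).mean ≈ 3 * exp(-0.5)   # true
```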

Parametrisations:

simsurace commented 2 years ago

The current implementation of NegativeBinomialLikelihood sets p = logistic(f) = 1/(1 + exp(-f)) and keeps r as a free parameter. If my calculations are correct, this means the conditional mean of the observation is m = r exp(-f) and the conditional variance is v = r exp(-f)(1 + exp(-f)). The Type II parametrisation sets m_II = exp(f') and v_II = m_II + k m_II^2. This is equivalent to the current implementation for f' = ln(r) - f and k = 1/r, provided we parametrise the mean of the GP such that the shift can be absorbed by the latent function (the sign flip of f is not identifiable anyway).
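The claimed equivalence can be checked numerically in a few lines of plain Julia (the test values below are arbitrary):

```julia
logistic(x) = 1 / (1 + exp(-x))

r, f = 3.5, 0.7                      # arbitrary test values

# Current implementation: p = logistic(f), r free.
m = r * exp(-f)                      # conditional mean
v = r * exp(-f) * (1 + exp(-f))      # conditional variance

# Type II reparametrisation: f′ = log(r) - f, overdispersion k = 1/r.
f′ = log(r) - f
k = 1 / r
m₂ = exp(f′)                         # Type II mean
v₂ = m₂ + k * m₂^2                   # Type II variance

m ≈ m₂ && v ≈ v₂                     # true: the two parametrisations agree
```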

In contrast, Type I seems to be genuinely different from the current implementation.

Thus, I would propose keeping NegativeBinomialLikelihood, adding a wrapper around it called NegativeBinomialLikelihoodTypeII (using the reparametrisation above), and implementing NegativeBinomialLikelihoodTypeI independently of the other two.
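To make the Type II half of the proposal concrete, here is a hypothetical sketch (all names invented, not an actual GPLikelihoods API): the latent GP parametrises the mean m = exp(f) directly, a scalar k measures overdispersion, and the pair converts back to the classic (successes r, success probability p) parameters via r = 1/k and p = 1/(1 + k m):

```julia
# Hypothetical sketch of a Type II negative binomial likelihood:
# mean m = exp(f), variance v = m + k * m^2.
struct NBTypeIISketch{T}
    overdispersion::T                 # k: scalar, shared by all observations
end

function moments(lik::NBTypeIISketch, f::Real)
    m = exp(f)                        # latent GP parametrises the mean directly
    v = m + lik.overdispersion * m^2
    return (mean = m, var = v)
end

# Conversion to the (successes r, success probability p) parametrisation,
# which reproduces the same mean r*(1-p)/p and variance r*(1-p)/p^2.
rp(lik::NBTypeIISketch, f::Real) =
    (r = 1 / lik.overdispersion, p = 1 / (1 + lik.overdispersion * exp(f)))
```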