Open emfeltham opened 3 years ago
Hi Eric-
I suspect one reason this model has not been added is that the logpdf
for the zero inflated Poisson has not been implemented. Here my attempt at a numerically stable implementation:
using Distributions, Turing, StatsFuns
import Distributions: logpdf
struct ZIPoisson{T1,T2} <: DiscreteUnivariateDistribution
logλ::T1
w::T2
end
function logpdf(d::ZIPoisson, y::Int)
LL = 0.0
if y == 0
LLs = zeros(typeof(d.logλ), 2)
LLs[1] = log(d.w)
LLs[2] = log(1 - d.w) - exp(d.logλ)
LL = logsumexp(LLs)
else
LL = log(1 - d.w) + logpdf(LogPoisson(d.logλ), y)
end
return LL
end
This can be added to your script and it should work. Note that ZIPoisson
uses log of lambda rather than lambda. That should get you started.
Rob, do you think ZIPoisson
should be included in Distributions or Turing?
Hi Chris, thanks for stepping in. Let me play around a bit and compare the results between your model and below Stan language model. I am clearly not the right person to answer your question but definitely like the idea to add this example to the collection of models (both Stan and Turing) as your example shows how "relatively easy" it is to extend the set of available distributions. Does that make sense? ( @torkar @trappmartin )
Hi Eric,
It has never really been my intention to have all models in the ([Stan|Turin|DynamicHMC]Models repositories. But it certainly helps if folks indicate which models are difficult. Thank you.
data{
int y[365];
}
parameters{
real ap;
real al;
}
model{
real p;
real lambda;
al ~ normal( 1 , 0.5 );
ap ~ normal( -1.5 , 1 );
lambda = al;
lambda = exp(lambda);
p = ap;
p = inv_logit(p);
for ( i in 1:365 ) {
if ( y[i]==0 )
target += log_mix( p , 0 , poisson_lpmf(0|lambda) );
if ( y[i] > 0 )
target += log1m( p ) + poisson_lpmf(y[i] | lambda );
}
}
Nice to see that people want to add more models to the rethinking repository and as you suspected Eric, the reason for not having these models was that I didn't get that far due to other obligations. My ultimate goal when starting the translations was to have all models from McElreath's Rethinking translated, and now the 2nd edition has been released :)
Rob, I think that ZI models are quite common (I've used them in my own research so we have some anecdotal evidence at least ;)
From the top of my head, Richard assumes, in the 2nd edition, the following data generative processes,
polr()
in R's MASS
package)On top of that, it would be interesting to also have adjacent-category and sequential likelihoods (https://journals.sagepub.com/doi/pdf/10.1177/2515245918823199), other flavors of ZI (Beta, Neg-Bin, Bin), 0/1-Beta, and shifted-lognormal, skew-normal, student, weibull.
I have to confess that I haven't checked out the support for the above distributions in Distributions.jl
lately... In short, I would like to pick one of the above distributions and simply start using it, just like when I use rethinking
, brms
, rstanarm
, et al.
I think many people would not see it as "relatively easy" to implement things from scratch, after all, if everyone did that we would have variants with bugs sooner or later, so perhaps better offer robust alternatives? Even in the relatively simple example that Chris provides, given that I'm not fluent in Turing, I would have to rely on @trappmartin to check the code and optimize if needed.
Would there be the possibility to identify those distributions that are commonly used in probabilistic modelling settings but not available at the moment? I and others from the Turing team are not really doing much modelling and therefore we don't really know what is missing from a user perspective. But I think it would be great if we could extend the set of distributions supported by default to a more reasonable collection.
As far as I can tell, most of the distributions mentioned so far appear to be implemented in Distributions or Turing. Although Turing has an ordered-logistic distribution, I'm not sure whether it covers all of the adjacent category models @torkar referenced. ZI-Poisson is the missing distribution among the ones mentioned so far. Aside from that, the only other distribution I can think of at the moment is the Wiener process, which is used in many fields.
Here is a working example for future reference:
using Distributions, Turing, StatsFuns, Random
import Distributions: logpdf, rand
struct ZIPoisson{T1,T2} <: DiscreteUnivariateDistribution
logλ::T1
w::T2
end
function logpdf(d::ZIPoisson, y::Int)
LL = 0.0
if y == 0
LLs = zeros(typeof(d.logλ), 2)
LLs[1] = log(d.w)
LLs[2] = log(1 - d.w) - exp(d.logλ)
LL = logsumexp(LLs)
else
LL = log(1 - d.w) + logpdf(LogPoisson(d.logλ), y)
end
return LL
end
function rand(d::ZIPoisson)
return rand() <= d.w ? 0 : rand(Poisson(exp(d.logλ)))
end
rand(d::ZIPoisson, N::Int) = map(_->rand(d), 1:N)
inv_logit(x) = 1/(1 + exp(-x))
@model model(data) = begin
a1 ~ Normal(1, .5)
logλ ~ Normal(-1.5, 1)
w = inv_logit(a1)
data .~ ZIPoisson(logλ, w)
end
Random.seed!(74591)
data = rand(ZIPoisson(-1.5, .70), 1000)
n_samples = 3000
n_adapt = 1500
config = NUTS(n_adapt, .65)
n_chains = 4
chain = sample(model(data), config, MCMCThreads(), n_samples, n_chains, progress=true)
Thanks Chris, I'll add it for now as m12.3bt.jl to StatisticalRethinkingTuring.
Given Richard (T)'s very useful list I think the answer to your original question could be a separate repo in JuliaStats, e.g. AdditionalDistributions.jl, which can hold new distributions and/or flavors until shown robust and of general interest. Maybe some of the distributions in distributions.jl in Turing could also go there to test those in more settings.
I haven't tried it yet , but wonder if the ZIPoisson() definition as you provided also is sufficient for Turing's MAP(), Prior() and predict()?
Hello: I was revisiting this recently, and noticed that the model does not seem to replicate the book result. I think that the priors were switched around. However, making a minor change to Chris' code corrects this (RCall with the appropriate seed (seed 365, on page 390) to generate the same data).
# this replicates the book result
@model ppl12_3bt(x) = begin
α_λ ~ Normal(1, 0.5) # Normal(-1.5, 1)
α_p ~ Normal(-1.5, 1)
logλ = α_λ
logitp = α_p
p = inv_logit(logitp)
for i ∈ 1:length(y)
y[i] ~ ZIPoisson(logλ, p)
end
end
Relatedly, I've been writing up the functions to make ZIPoisson more complete -- e.g. adding a cdf, quantile, function. ZIPoisson should also work with Turing's predict() function now. Do you think that this should be submitted to Distributions.jl, with appropriate credit given to Chris?
Eric
It would be great to have distributions like ZIPoisson available, even if they might initially not be as robust and complete as the other distributions in Distributions.jl. Maybe my preference would still be to add it to Distributions.jl, but if that turns out to be too complicated/time consuming, create a new repo AdditionalDistributions.jl
in either Turing or StatisticalRethinkingJulia as a holding place until implementations of such new distributions have been tested and are considered mature enough.
Hello,
I realize that this package is under development. I was looking through, and note that the section 12 models file includes the zero-inflated models from the book, e.g.
## R code 12.9 m12.3 <- ulam( alist( y ~ dzipois( p , lambda ), logit(p) <- ap, log(lambda) <- al, ap ~ dnorm( -1.5 , 1 ), al ~ dnorm( 1 , 0.5 ) ) , data=list(y=y) , chains=4 )
However, these do not seem to appear in the Julia / Turing.jl files. I was wondering whether there were resources for running zero-inflated models in Turing, or whether you had managed to implement them.
Best, Eric