SamuelBrand1 opened 1 month ago
Hi @SamuelBrand1, thanks for the issue!
Could you provide a MWE that shows what you're trying to do with Turing that causes this crash? My initial impression is that this seems like an upstream issue with Distributions.
Hey @penelopeysm
It's actually quite hard to construct an MWE here. For example, this clean script:
```julia
using Turing, Distributions

@model function rw_model(rw₀, n)
    ϵ_t ~ filldist(Normal(), n)
    rw = rw₀ .+ cumsum(ϵ_t)
    return rw
end

@model function data_model(ys, rw)
    for i in eachindex(rw)
        ys[i] ~ Poisson(exp(rw[i]))
    end
    return ys
end

@model function pois_rw(ys, rw₀)
    n = length(ys)
    @submodel rw = rw_model(rw₀, n)
    @submodel gen_ys = data_model(ys, rw)
    return rw, gen_ys
end

generative_mdl = pois_rw(fill(missing, 10), 3.0)
ys = generative_mdl()[2] .|> Int
inference_mdl = pois_rw(ys, 48.0)  # deliberately bad choice for rw₀
chn = sample(inference_mdl, NUTS(), 1000)
```
works ok with package versions:

```
[31c24e10] Distributions v0.25.110
[fce5fe82] Turing v0.33.3
```
And this is roughly analogous to what @seabbs and myself are doing atm.
This seems to suggest that I should take this downstream to Distributions, as well as investigate why the large-mean Poisson problem is hitting our modelling but not this minimal script.
Agreed, this does seem to be a Distributions issue, so flagging it there makes sense. F2F @SamuelBrand1 and I have discussed that the SciML ecosystem has had to develop their own faster Poisson (presumably due to issues changing it in Distributions). As this seems like an issue that would be common for Turing users, weighing in on the Distributions issue might be useful (the `rand` call being buried in the inference call is quite confusing/hard to resolve, so reducing the chance of edge-case issues with `rand` seems like a good thing).
> F2F @SamuelBrand1 and I have discussed that the SciML ecosystem has had to develop their own faster Poisson (presumably due to issues changing it in Distributions).
That's basically a myth, and I'm not really sure where it comes from. The samplers in SciML and Distributions are actually basically identical, and already quite a few years ago I noticed that SciML is not faster: https://github.com/SciML/PoissonRandom.jl/issues/6
Issue in Distributions for the problem here: https://github.com/JuliaStats/Distributions.jl/issues/821
We should hopefully be able to fix it when overhauling `rand` etc. more generally. But my gut feeling is that it's somewhat suspicious if such large values show up in your model anyway; I guess it's an indication that you might want to rescale your quantities?
Hi everyone,
Base Julia can't convert large floats into `Int` using `floor` or `round` etc. This creates a problem for sampling from Poissons with a large mean, because that conversion is used in the fast poly-algorithm for Poisson sampling.
Now this would not be problematic, because the `logpdf` calls are not affected, but for some reason a `rand` call comes into use in Turing at inference time (not sure why...). Is there any chance of a safer version of the Poisson/Negative Binomial sampler that can detect if `round(BigInt, ...` should be used?
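For what it's worth, the check could be sketched along these lines (a hypothetical helper, not anything that exists in Distributions; `saferound` is a made-up name):

```julia
# Sketch: round to Int when the value fits, otherwise fall back to BigInt.
# Note the return type is unstable (Int or BigInt), which is presumably
# why Distributions would hesitate to do this on the sampler's hot path.
function saferound(x::AbstractFloat)
    abs(x) < typemax(Int) ? round(Int, x) : round(BigInt, x)
end

saferound(1e6)   # 1000000 (an Int)
saferound(1e20)  # 100000000000000000000 (a BigInt)
```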