Open sethaxen opened 2 years ago

cc @devmotion

TL;DR: It would be very useful if models that allocate an array of parameters and then sample them elementwise could be made compatible with AD engines that don't support mutation. Barring that, such models should at least work with these ADs when one conditions the model on that array.

Suppose we want a simple linear regression model from which we can compute pointwise log-likelihoods. One option is to write the model with the data as an input. But if we instead want to write the model as a joint distribution of $y$ and the parameters $\theta = (\alpha, \beta, \sigma)$ and condition on data later, then we would write a model that allocates `y` internally. We can then sample from this joint distribution, and we can condition on $y$ to draw samples from the posterior of $\theta \mid y$, so long as we either don't need gradients or use an AD that supports mutation. With an AD like Zygote, both cases fail, which is surprising because the conditioned model doesn't need to allocate `y` at all. The usually recommended solution of using `arraydist` or `filldist` is not possible here if one eventually wants pointwise log-likelihoods. A minimal failing example is sketched below.
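A hedged sketch of the two variants and the failing call (illustrative only: it follows the `regress2` setup used in the comments below, and the backend-selection call assumes the Turing API of the time):

```julia
using Turing, Zygote

# Variant 1: the data `y` is passed in as an argument and observed.
@model function regress(x, y)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    for i in eachindex(μ, y)
        y[i] ~ Normal(μ[i], σ)
    end
end

# Variant 2: a joint model that allocates `y` internally so it can also be sampled.
@model function regress2(x)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    y = similar(vec(μ), Base.promote_eltype(μ, σ))  # the problematic allocation
    for i in eachindex(μ, y)
        y[i] ~ Normal(μ[i], σ)  # writes into `y` via `setindex!`
    end
    return y
end

x = 0:0.1:π
y = sin.(x) .+ 2 .+ randn.() .* 0.1
model = regress2(Matrix(reshape(x, :, 1)))

# Sampling works without gradients, but fails under Zygote even though the
# conditioned model never needs to mutate `y`:
Turing.setadbackend(:zygote)
sample(model | (y=y,), NUTS(), 100)  # errors: Zygote does not support mutating arrays
```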
> the conditioned model doesn't need to allocate `y` at all.

The model explicitly states, though, that `y` should be allocated: `y = similar(...)`. Turing does not change your model implementation; if you tell it to allocate an array, it will allocate one every time you run the model.
Avoiding allocation when unnecessary is one thing, and it can be achieved as follows:
```julia
julia> using Turing

julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)
           # Only allocate `y` if it is not conditioned on, i.e. if it is an assumption.
           if DynamicPPL.contextual_isassumption(__context__, @varname(y))
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))
           else
               y = DynamicPPL.getvalue_nested(__context__, @varname(y))
           end
           for i in eachindex(μ, y)
               y[i] ~ Normal(μ[i], σ)
           end
           return y
       end
regress2 (generic function with 2 methods)

julia> x = 0:0.1:π;

julia> y = sin.(x) .+ 2 .+ randn.() .* 0.1;

julia> model = regress2(Matrix(reshape(x, :, 1)));

julia> model() == y
[ Info: allocating
false

julia> (model | (y=y,))() == y
true
```
We can of course make this easier for the user by introducing some macro(s). But regarding Zygote compatibility, this won't help, since we'll still end up calling `setindex!` on `y` to make sure that the variable present in the body of the evaluation method actually holds the correct value (at the `~` statement we don't know whether this is the case).
Regarding computation of pointwise log-likelihoods, one alternative is:
```julia
julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)
           if __context__ isa DynamicPPL.PointwiseLikelihoodContext
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))
               for i in eachindex(μ, y)
                   y[i] ~ Normal(μ[i], σ)
               end
           else
               # Compatible with Zygote.jl
               y ~ arraydist(Normal.(μ, σ))
           end
           return y
       end
regress2 (generic function with 2 methods)

julia> model = regress2(Matrix(reshape(x, :, 1)));  # re-instantiate with the new definition

julia> vi = DynamicPPL.VarInfo(model | (y=y,));

julia> pointwise_loglikelihoods(model | (y=y,), vi);
[ Info: allocating

julia> (model | (y=y,))();
```
Again, somewhat hacky, but it can be made easier for the user.
It's nice to see what already can be done, but I strongly feel like nobody should have to code models in that way. Users should not use any of this internal functionality, which can break at any time.

FYI, the issue here is closely related to https://github.com/TuringLang/DynamicPPL.jl/issues/395. The problem/question of manually specifying or checking whether something is an assumption also reminds me of https://github.com/TuringLang/DynamicPPL.jl/issues/390, but that is a bit more far-fetched.
> It's nice to see what already can be done, but I strongly feel like nobody should have to code models in that way. Users should not use any of this internal functionality, which can break at any time.
100% agree with this, of course :) But the problem is that I don't see how we can actually support this otherwise, given our overall aim of not making too many changes to the body of the model.

I do, however, think we can make this sufficiently convenient for the user that it actually becomes usable. For example, once you start caring heavily about performance, you often need to add these sorts of conditional statements: when you sample, you want to use `@addlogprob!` to make things super-performant (usually related to usage of Zygote), but when you're not running inference requiring AD, you instead want a for-loop, as in the sketch below. I've unfortunately had to use this sort of pattern in several research projects over the past 6 months to achieve both the desired performance and the post-inference analysis we want.
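A hedged sketch of that kind of conditional `@addlogprob!` pattern (the model and names here are illustrative, not taken from the projects mentioned):

```julia
using Turing

@model function regress_fast(x, y)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    if __context__ isa DynamicPPL.PointwiseLikelihoodContext
        # Per-observation ~ statements: needed for pointwise log-likelihoods.
        for i in eachindex(μ, y)
            y[i] ~ Normal(μ[i], σ)
        end
    else
        # Single vectorized log-density accumulation: fast and AD-friendly.
        @addlogprob! sum(logpdf.(Normal.(μ, σ), y))
    end
end
```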
Again, I agree that ideally this would all be done under the hood, but I don't see how we can achieve this with the approach we've taken in DPPL.
Maybe we should add some macros such as:
```julia
macro is_pointwise_loglikelihood()
    return :($(esc(:__context__)) isa DynamicPPL.PointwiseLikelihoodContext)
end

macro isassumption(variable)
    return esc(DynamicPPL.isassumption(variable))
end
```
Then we can do:
```julia
julia> @model demo() = @isassumption(x) ? x ~ Normal() : y ~ Normal()
demo (generic function with 2 methods)

julia> keys(VarInfo(demo()))
1-element Vector{AbstractPPL.VarName{:x, Setfield.IdentityLens}}:
 x

julia> keys(VarInfo(condition(demo(), x=1)))
1-element Vector{AbstractPPL.VarName{:y, Setfield.IdentityLens}}:
 y
```
I swear we've spoken about this before @devmotion; do you remember why we didn't want to do this?
One immediate aspect is that it's going to be a bit annoying to have all these macros, but, as I said before, I don't see a way around this.
EDIT: And the above example becomes:
```julia
julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)
           if @is_pointwise_loglikelihood
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))
               for i in eachindex(μ, y)
                   y[i] ~ Normal(μ[i], σ)
               end
           else
               # Compatible with Zygote.jl
               y ~ arraydist(Normal.(μ, σ))
           end
           return y
       end
regress2 (generic function with 2 methods)

julia> model = regress2(Matrix(reshape(x, :, 1)));  # re-instantiate with the new definition

julia> vi = DynamicPPL.VarInfo(model | (y=y,));

julia> pointwise_loglikelihoods(model | (y=y,), vi);
[ Info: allocating

julia> (model | (y=y,))();
```
EDIT 2: Taking it a step further (though it's a question of how many of these we want to add):
```julia
macro isobservation(variable)
    tmp = gensym(:isobservation)
    return esc(:($tmp = $(DynamicPPL.isassumption(variable)); !$tmp))
end

macro value(variable)
    vn = DynamicPPL.varname(variable)
    return esc(:($(DynamicPPL.getvalue_nested)(__context__, $vn)))
end
```
```julia
julia> @model demo() = @isobservation(x) ? (y = y ~ Normal(@value(x), 1),) : (x = x ~ Normal(),)
demo (generic function with 2 methods)

julia> demo()()
(x = -1.4439334859195563,)

julia> condition(demo(), x=100)()
(y = 100.05190897117562,)
```
This would be awesome to have! I'm saving all your snippets for future use :) Thank you!