TuringLang / DynamicPPL.jl

Implementation of domain-specific language (DSL) for dynamic probabilistic programming
https://turinglang.org/DynamicPPL.jl/
MIT License

Supporting mutating ADs in models that fill arrays of parameters #412


sethaxen commented 2 years ago

TL;DR: It would be very useful if models that allocate an array of parameters and then sample into it could be made compatible with AD engines that don't support mutation. Barring that, such models should at least work with these ADs when one conditions the model on that array of parameters.

Suppose we want a simple linear regression model from which we can compute pointwise log-likelihoods. This is how we can write the model including data as input:

using Turing

@model function regress(x, y)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    for i in eachindex(μ, y)
        y[i] ~ Normal(μ[i], σ)
    end
end
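For context, this formulation supports per-observation log-likelihoods after sampling, e.g. (a sketch, assuming x and y as defined in the examples further below):

chain = sample(regress(x, y), NUTS(), 500)
# Dict mapping each observation's varname (e.g. "y[1]") to its log-likelihood per draw
logliks = pointwise_loglikelihoods(regress(x, y), chain)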

But if we want to instead write the model as a joint distribution of $y$ and parameters $\theta=(\alpha, \beta, \sigma)$ and condition on data later, then we would write a model that allocates y internally:

@model function regress2(x)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    y = similar(vec(μ), Base.promote_eltype(μ, σ))
    for i in eachindex(μ, y)
        y[i] ~ Normal(μ[i], σ)
    end
end

Now we can sample from this joint distribution, and we can condition on $y$ to draw samples from the posterior of $\theta \mid y$, so long as we either don't need gradients or use an AD that supports mutation. With an AD like Zygote, both cases fail, which is surprising because the conditioned model doesn't need to allocate y at all. The usually recommended workaround of using arraydist or filldist is not an option here if one eventually wants pointwise log-likelihoods.
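For reference, that usually recommended mutation-free formulation would look something like the sketch below (regress2_arraydist is a hypothetical name); it avoids setindex! entirely, but pointwise_loglikelihoods then treats y as a single multivariate observation rather than one observation per element:

@model function regress2_arraydist(x)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    # One multivariate ~ statement: no per-element setindex!, so Zygote-compatible
    y ~ arraydist(Normal.(vec(μ), σ))
end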

Here's a minimal failing example:

julia> using Turing, Zygote

julia> Turing.setadbackend(:zygote);

julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)
           y = similar(vec(μ), Base.promote_eltype(μ, σ))
           for i in eachindex(μ, y)
               y[i] ~ Normal(μ[i], σ)
           end
       end;

julia> x = 0:0.1:π;

julia> y = sin.(x) .+ 2 .+ randn.() .* 0.1;

julia> model = regress2(Matrix(reshape(x, :, 1)));

julia> model_cond = model | (; y=y);

julia> sample(model, NUTS(), 500);  # unsurprising failure; model allocates y
ERROR: Mutating arrays is not supported -- called setindex!(::Vector{Float64}, _...)
...

julia> sample(model_cond, NUTS(), 500);  # surprising failure; y is allocated but unused since given
ERROR: Mutating arrays is not supported -- called setindex!(::Vector{Float64}, _...)
...

sethaxen commented 2 years ago

cc @devmotion

devmotion commented 2 years ago

the conditioned model doesn't need to allocate y at all.

The model explicitly states that y should be allocated: y = similar(...). Turing does not change your model implementation; if you tell it to allocate an array, it will allocate one every time you run the model.

torfjelde commented 2 years ago

Avoiding allocation when unnecessary is one thing, and it can be achieved as follows:

julia> using Turing

julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)

           if DynamicPPL.contextual_isassumption(__context__, @varname(y))
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))
           else
               y = DynamicPPL.getvalue_nested(__context__, @varname(y))
           end

           for i in eachindex(μ, y)
               y[i] ~ Normal(μ[i], σ)
           end

           return y
       end
regress2 (generic function with 2 methods)

julia> x = 0:0.1:π;

julia> y = sin.(x) .+ 2 .+ randn.() .* 0.1;

julia> model = regress2(Matrix(reshape(x, :, 1)));

julia> model() == y
[ Info: allocating
false

julia> (model | (y=y, ))() == y
true

We can of course make this easier for the user by introducing some macro(s).

But regarding Zygote compatibility, this won't help, since we'll still end up calling setindex! on y to make sure that the variable in the body of the evaluation method actually holds the correct value (at the ~ statement we don't know whether it does).
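Schematically, and only as an illustration (this is simplified pseudocode, not DynamicPPL's actual generated code), each indexed tilde statement lowers to something like:

# Illustrative expansion of `y[i] ~ Normal(μ[i], σ)` inside the generated evaluator;
# names and signatures are simplified placeholders, not real internals.
value, __varinfo__ = tilde!!(__context__, Normal(μ[i], σ), @varname(y[i]), __varinfo__)
y[i] = value  # write-back via setindex! -- this is what Zygote rejects, even when
              # y[i] is conditioned and `value` is just the given data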

Regarding computation of pointwise log-likelihoods, one alternative is:

julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)

           if __context__ isa DynamicPPL.PointwiseLikelihoodContext
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))

               for i in eachindex(μ, y)
                   y[i] ~ Normal(μ[i], σ)
               end
           else
               # Compatible with Zygote.jl
               y ~ arraydist(Normal.(μ, σ))
           end

           return y
       end
regress2 (generic function with 2 methods)

julia> vi = DynamicPPL.VarInfo(model | (y=y, ));

julia> pointwise_loglikelihoods(model | (y=y, ), vi);
[ Info: allocating

julia> (model | (y=y, ))();

Again, somewhat hacky but can be made easier for the user.

devmotion commented 2 years ago

It's nice to see what can already be done, but I strongly feel that nobody should have to code models this way. Users should not use any of this internal functionality, which can break at any time.

FYI the issue here is closely related to https://github.com/TuringLang/DynamicPPL.jl/issues/395. The problem/question of manually specifying or checking whether something is an assumption also reminds me of https://github.com/TuringLang/DynamicPPL.jl/issues/390, but that is a bit more far-fetched.

torfjelde commented 2 years ago

It's nice to see what can already be done, but I strongly feel that nobody should have to code models this way. Users should not use any of this internal functionality, which can break at any time.

100% agree with this, of course :) But the problem is that I don't see how we can actually support this otherwise, given our overall aim of not making too many changes to the body of the model.

I do however think we can make this sufficiently convenient for the user that it actually becomes usable. For example, once you start caring heavily about performance, you often need to add these conditional statements: when sampling you want to use @addlogprob! to make things performant (usually related to the use of Zygote), but when you're not running inference that requires AD you instead want a for-loop, as sketched below. I've unfortunately had to use this sort of pattern in several research projects over the past 6 months to achieve both the desired performance and the post-inference analysis we want.
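For concreteness, that pattern looks roughly like the following sketch (regress2_fast is a hypothetical name; here the data y is passed in as an argument):

@model function regress2_fast(x, y)
    α ~ Normal()
    β ~ filldist(Normal(), size(x, 2))
    σ ~ truncated(Normal(); lower=0)
    μ = muladd(x, β, α)
    if __context__ isa DynamicPPL.PointwiseLikelihoodContext
        # Per-observation ~ statements for post-inference analysis
        for i in eachindex(μ, y)
            y[i] ~ Normal(μ[i], σ)
        end
    else
        # One vectorized log-density increment: performant and Zygote-friendly
        Turing.@addlogprob! sum(logpdf.(Normal.(vec(μ), σ), y))
    end
end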

Again, I agree that ideally this would all be done under the hood, but I don't see how we can achieve this with the approach we've taken in DPPL.

torfjelde commented 2 years ago

Maybe we should add some macros such as:

macro is_pointwise_loglikelihood()
    # True when the model is being evaluated under PointwiseLikelihoodContext
    return :($(esc(:__context__)) isa DynamicPPL.PointwiseLikelihoodContext)
end

macro isassumption(variable)
    # Expands to DynamicPPL's runtime check of whether `variable` is an assumption
    return esc(DynamicPPL.isassumption(variable))
end

Then we can do:

julia> @model demo() = @isassumption(x) ? x ~ Normal() : y ~ Normal()
demo (generic function with 2 methods)

julia> keys(VarInfo(demo()))
1-element Vector{AbstractPPL.VarName{:x, Setfield.IdentityLens}}:
 x

julia> keys(VarInfo(condition(demo(), x=1)))
1-element Vector{AbstractPPL.VarName{:y, Setfield.IdentityLens}}:
 y

I swear we've spoken about this before @devmotion; do you remember why we didn't want to do this?

One immediate aspect is that it's going to be a bit annoying to have all these macros, but, as I said before, I don't see a way around this.

EDIT: And the above example becomes:

julia> @model function regress2(x)
           α ~ Normal()
           β ~ filldist(Normal(), size(x, 2))
           σ ~ truncated(Normal(); lower=0)
           μ = muladd(x, β, α)

           if @is_pointwise_loglikelihood
               @info "allocating"
               y = similar(vec(μ), Base.promote_eltype(μ, σ))

               for i in eachindex(μ, y)
                   y[i] ~ Normal(μ[i], σ)
               end
           else
               # Compatible with Zygote.jl
               y ~ arraydist(Normal.(μ, σ))
           end

           return y
       end
regress2 (generic function with 2 methods)

julia> vi = DynamicPPL.VarInfo(model | (y=y, ));

julia> pointwise_loglikelihoods(model | (y=y, ), vi);
[ Info: allocating

julia> (model | (y=y, ))();

EDIT 2: Taking it a step further (though it's a question of how many of these we want to add):

macro isobservation(variable)
    # Observation = not an assumption; negate DynamicPPL's assumption check
    tmp = gensym(:isobservation)
    return esc(:($tmp = $(DynamicPPL.isassumption(variable)); !$tmp))
end

macro value(variable)
    # Look up the value of `variable` supplied via conditioning/context
    vn = DynamicPPL.varname(variable)
    return esc(:($(DynamicPPL.getvalue_nested)(__context__, $vn)))
end

julia> @model demo() = @isobservation(x) ? (y = y ~ Normal(@value(x), 1), ) : (x = x ~ Normal(), )
demo (generic function with 2 methods)

julia> demo()()
(x = -1.4439334859195563,)

julia> condition(demo(), x=100)()
(y = 100.05190897117562,)

svilupp commented 1 year ago

This would be awesome to have! I'm saving all your snippets for future use :) Thank you!