TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Custom distributions required `rand` to be implemented when "unnecessary" #907

Open torfjelde opened 4 years ago

torfjelde commented 4 years ago

One might want to use MH, HMC, etc. to sample from a distribution with an unknown/intractable normalization constant, rather than from a posterior. In theory nothing prohibits this kind of usage, but it's currently not supported in Turing. Or rather, it is, but it requires `Distributions._rand!` (for a `MultivariateDistribution`) to be implemented, either to serve as the proposal or just to obtain the initial value. To me this seems quite hacky, and at the very least non-intuitive.

As of right now this is not possible for custom distributions because most (if not all) samplers initialize the `VarInfo` using the `SampleFromUniform` sampler. This means they call `init(dist)`, which defaults to `rand(dist)` when the distribution is not "transformable" (the case for all custom distributions).

using Turing, LinearAlgebra
struct CustomDist <: MultivariateDistribution{Continuous} end

# Distributions.jl interface except `rand`
# (`_logpdf` must return the *log* of the unnormalized density)
Distributions._logpdf(d::CustomDist, x::AbstractVector{<:Real}) = -norm(x) / 2
Base.length(d::CustomDist) = 2

@model demo() = begin
    x ~ CustomDist()
end

m = demo()

# fails unless we also implement `_rand!` to get initial value
sample(m, HMC(1000, 0.1, 10))
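To make the hack explicit: defining `Distributions._rand!` alongside the interface above is what currently unblocks sampling, since it is only used to seed the chain. A minimal self-contained sketch (the choice of i.i.d. standard normals for seeding is an assumption for illustration, not part of the report):

```julia
using Distributions, LinearAlgebra, Random

# Same shape of custom distribution as above, repeated here so the
# sketch is self-contained.
struct CustomDistWorkaround <: ContinuousMultivariateDistribution end

Base.length(d::CustomDistWorkaround) = 2
# Unnormalized log-density.
Distributions._logpdf(d::CustomDistWorkaround, x::AbstractVector{<:Real}) = -norm(x) / 2

# The workaround: `_rand!` exists only so the sampler can draw an
# initial value. Filling `x` with standard normals is arbitrary.
function Distributions._rand!(rng::Random.AbstractRNG,
                              d::CustomDistWorkaround,
                              x::AbstractVector{<:Real})
    randn!(rng, x)
    return x
end
```

With that in place the `sample` call above goes through, even though `_rand!` plays no role in the target density itself, which is exactly why needing it feels unnecessary.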

I believe this would be solved if we had a simple way of setting the initial value for the chain, so it's very related to #746 and possibly to the new interface PR #793 (though I haven't gotten around to having a proper look at it yet, so I'm not sure).

cpfiffer commented 4 years ago

There is actually a way to set the initial parameterization in #793, but it's not really very robust:

https://github.com/TuringLang/Turing.jl/blob/bf582647adc8059cfe28fa6b9a2c3c92ee1cd1e8/src/inference/Inference.jl#L179

I think @xukai92 and I talked briefly about making this a lot easier to use in a later PR. Currently you have to pass in a vector/iterable that's the same length as vi[spl]. It's called as part of the initialization process for all the samplers if you have a keyword argument called init_theta, so in theory you could do:

sample(m, HMC(1000, 0.1, 10), init_theta=[4.0, 2.0])

torfjelde commented 4 years ago

There is actually a way to set the initial parameterization in #793, but it's not really very robust:

So in this PR we're no longer using the more robust init(dist)?

It's called as part of the initialization process for all the samplers if you have a keyword argument called init_theta, so in theory you could do:

That works:) But yeah, I guess ideally we'd have a NamedTuple to specify this?

cpfiffer commented 4 years ago

So in this PR we're no longer using the more robust init(dist)?

No, not quite -- it only does anything if you've provided a keyword argument. If you haven't then everything is initialized as before.

That works:) But yeah, I guess ideally we'd have a NamedTuple to specify this?

Yeah, I think that's probably the best way to go. init_theta=(m=1.5, s=3.2) is really intuitive.
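For concreteness, a hypothetical sketch of that NamedTuple form (the model is illustrative, and the `init_theta` keyword follows this PR discussion rather than any released API, so the spelling may differ):

```julia
using Turing

# Illustrative model with named parameters `s` and `m`.
@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

# Proposed: initial values keyed by parameter name instead of a flat
# vector that must match the ordering and length of `vi[spl]`.
chain = sample(gdemo([1.5, 2.0]), HMC(1000, 0.1, 10), init_theta=(m=1.5, s=3.2))
```

The NamedTuple form would also sidestep the ordering question entirely, since each value is matched to a variable by name.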

torfjelde commented 4 years ago

No, not quite -- it only does anything if you've provided a keyword argument. If you haven't then everything is initialized as before.

Ah, sorry; misread != as ==. Two very different meanings:)

Yeah, I think that's probably the best way to go. init_theta=(m=1.5, s=3.2) is really intuitive.

That looks really good :+1: You planning on including that in #793?

cpfiffer commented 4 years ago

Probably not. It's a little bloated as it is. Either I or someone else will probably get to it in another PR.

Red-Portal commented 1 year ago

@torfjelde Is this issue still relevant?