nsgrantham opened this issue 2 years ago (Open)
The problem is that the Turing model has an error; it is using `MvNormal` to mean a sequence of independent, normally-distributed values. However, `MvNormal` is supposed to be used when each observation is a vector of correlated values. For example, if I want to predict each person's height and weight (which are correlated), I could say `[height, weight] ~ MvNormal([6 ft, 150 lbs], Σ)`. As written, your model says "I have a single observation consisting of one really long vector of uncorrelated features." When this single observation is removed, the model breaks, because it's being trained on a sample of size 0 while trying to predict the whole sample.
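The confusion is understandable because, for a diagonal covariance, the two densities are numerically identical; only the implied observation count differs. A minimal sketch, assuming Distributions.jl is available:

```julia
using Distributions, LinearAlgebra

# With a diagonal covariance, the MvNormal log-density of one long
# vector equals the sum of independent Normal log-densities. The
# numbers agree, but MvNormal counts as ONE observation while the
# elementwise form counts as length(h) observations; that difference
# is exactly what matters for leave-one-out cross-validation.
μ = [1.0, 2.0, 3.0]
σ = 0.5
h = [1.1, 1.9, 3.2]

lp_mv  = logpdf(MvNormal(μ, σ^2 * I), h)   # one "observation"
lp_ind = sum(logpdf.(Normal.(μ, σ), h))    # three observations

@assert lp_mv ≈ lp_ind
```

Since PSIS-LOO holds out one observation at a time, the `MvNormal` version has nothing left to condition on once its single observation is held out.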
Rewriting the Turing model like this should fix the problem:
```julia
@model function adult_happiness(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    μ = α .+ β .* a
    @. h ~ Normal(μ, σ)
end
```
(Don't worry, this is a very common mistake -- common enough that I should probably add an explanation when the model has a sample size of 1 :smile:)
Thanks for the quick response and the fix, @ParadaCarleton. I understand the source of the problem now. This is my first time encountering the `@.` macro; very useful.
I'm not sure I would say that model has an error, however. Using the `MvNormal` distribution with a mean vector and a scalar variance as a "shorthand" for a vector of independent `Normal` distributions is a convention that Turing's own documentation uses (see the Linear Regression tutorial). I also recall seeing somewhere (perhaps in an issue that I can't seem to find again) that using `MvNormal` in this way, rather than `for i in 1:N; h[i] ~ Normal(...); end`, is better for the sampler from a computational perspective.

In any case, I can see how using `MvNormal` in this way would confuse `psis_loo()`, because it isn't clear from the code alone whether this is a single observation or multiple independent observations.

And yes, changing `h ~ MvNormal(μ, σ)` to `@. h ~ Normal(μ, σ)` resolves the issue.
> I'm not sure I would say that model has an error, however. Using the `MvNormal` distribution with a mean vector and a scalar variance as a "shorthand" for a vector of independent `Normal` distributions is a convention that Turing's own documentation uses (see the Linear Regression tutorial).
I believe @torfjelde commented on this by saying:

> IMO it does go against the semantic meaning of `MvNormal`: it represents a single multivariate random variable, not a collection of such. Hence if you want a collection of independent observations, then you should use `.~`. Note that both the LHS and RHS of `.~` can be arrays, e.g. `[1.0, 2.0] .~ [Normal(), Normal()]` also works.
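As a concrete illustration of the `.~` form mentioned in that quote, the model above could also be written like this (a sketch assuming Turing.jl is loaded; the model name is mine):

```julia
using Turing

# Same model as the `@.` version above, written with `.~`: the LHS is
# an array and the RHS is an array of Normal distributions, so each
# element of h is treated as an independent observation.
@model function adult_happiness_dot(h, a)
    α ~ Normal(0, 1)
    β ~ Normal(0, 2)
    σ ~ Exponential(1)
    h .~ Normal.(α .+ β .* a, σ)
end
```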
> I also recall seeing somewhere (perhaps in an issue that I can't seem to find again) that using `MvNormal` in this way, rather than `for i in 1:N; h[i] ~ Normal(...); end`, is better for the sampler from a computational perspective.
Broadcasting (`.~`) is definitely faster than using a loop, but I think that broadcasting should be just as fast as using `MvNormal`. Don't quote me on that though.
That being said, I'm working on a PR that will fix this by creating syntax that lets users unambiguously specify what they mean by an independent observation.
I am encountering an issue where, if I pass a Turing model of type `DynamicPPL.Model` to `psis_loo()`, it returns values of `-Inf` or `NaN`. However, when I pass it a log-likelihood function and the data as an array of tuples, it behaves "correctly" (in the sense that it returns finite values, but I have not independently verified that they are accurate).
The following is a reproducible example.
Now, reading through the code in `src/TuringHelpers.jl` in this repo, it seems as though there's a method that supports Turing models directly. However, if I run it, then I get the `-Inf`/`NaN` values described above.
Instead, I can define the appropriate log-likelihood function and provide the data in an array of tuples:
And this yields something that seems correct.
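For reference, the log-likelihood-function approach described above might look roughly like the following. This is a hypothetical sketch (the function and variable names are mine), and the exact call signature that `psis_loo()` expects should be checked against ParetoSmooth.jl's documentation:

```julia
using Distributions

# Hypothetical sketch: a pointwise log-likelihood function plus the
# data packaged as an array of tuples, one tuple per observation, so
# the number of data points is unambiguous.
ll_fun(α, β, σ, h_i, a_i) = logpdf(Normal(α + β * a_i, σ), h_i)

h = [1.1, 1.9, 3.2]
a = [0.5, 1.0, 1.5]
data = [(h[i], a[i]) for i in eachindex(h)]  # one tuple per observation

# psis_loo(ll_fun, chain, data)  # exact signature per ParetoSmooth.jl docs
```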
I would expect these two approaches to yield the same result.
Am I simply using `psis_loo()` incorrectly when passing it a Turing model of type `DynamicPPL.Model`? The language `Results of PSIS-LOO-CV with 10000 Monte Carlo samples and 1 data points` seems to suggest that it does not recognize that `model` contains multiple data points.

For now I will continue to use the `ll_fun` approach, but wanted to call this out here in case there is something that needs a closer look in `TuringHelpers.jl`.