TuringLang / Bijectors.jl

Implementation of normalising flows and constrained random variable transformations
https://turinglang.org/Bijectors.jl/
MIT License

OrderedBijector gives surprising results #209

Closed · rikhuijzer closed this issue 2 years ago

rikhuijzer commented 2 years ago

When I use the OrderedBijector, the prior plots from Turing are surprising. Maybe I'm misunderstanding, though. What I'm trying to do is get an ordered MvNormal prior mu with means around [22, 22]. The source for the ordered function is https://github.com/TuringLang/Turing.jl/discussions/1535.

using Turing, Bijectors

ordered(d::Distribution) = transformed(d, OrderedBijector())

@model function ordered_mu()
    means = [22, 22]
    mu ~ ordered(MvNormal(means, 2))
end

chns = sample(ordered_mu(), Prior(), 100)
chns[["mu[1]", "mu[2]"]] |> mean

gives 22 and 1.89e10 for mu[1] and mu[2] respectively.

Other values for means give the following outcomes:

means        outcome (mean of mu[1], mu[2])
[0, -10]     -0.07, -0.07
[-10, 10]    -10.2, 1.16e5
[-20, -20]   -19.9, 4.8e9

Dependencies: Bijectors 0.9.11, Turing 0.19.0

torfjelde commented 2 years ago

I'm a bit uncertain what your expectation is. Do you expect ordered(MvNormal(ones(2), I)) to result in a distribution with mean ones(2) but that is ordered? If that's the case, this is not what OrderedBijector is for. It also doesn't make sense to ask for a distribution that is supposed to be ordered while simultaneously having a mean whose components are all equal, e.g. [22, 22].
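
A short illustration of what the bijector does may help here; this sketch is an addition for clarity, with the forward map inferred from the inverse outputs shown later in this thread. OrderedBijector maps unconstrained x to an ordered y via y₁ = x₁ and yₖ = yₖ₋₁ + exp(xₖ):

using Bijectors

b = Bijectors.OrderedBijector()

# Forward map: y1 = x1, y_k = y_{k-1} + exp(x_k).
# A draw near [22, 22] in unconstrained space therefore lands near
# [22, 22 + exp(22)] ≈ [22, 3.6e9], which explains the huge prior means above.
b([22.0, 22.0])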

rikhuijzer commented 2 years ago

I just figured it out, I think. At https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html they also use an ordered prior with the same means:

data {
  int<lower = 0> N;
  vector[N] y;
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
  real<lower=0, upper=1> theta;
}

model {
  sigma ~ normal(0, 2);
  mu ~ normal(0, 2);
  theta ~ beta(5, 5);
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
}

However, the data here is centered around zero. I just did that with my data and it works much better (maybe even perfectly).

Sorry for bothering you

torfjelde commented 2 years ago

> However, the data here is centered around zero. I just did that with my data and it works much better (maybe even perfectly).

I'm a bit uncertain why the data is relevant here, but I'm pretty certain the above results in the same behavior: https://mc-stan.org/docs/2_27/reference-manual/ordered-vector.html
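
For reference, the ordered-vector transform in the linked Stan manual is the same construction, y₁ = x₁ and yₖ = yₖ₋₁ + exp(xₖ) for k = 2, …, K, so a normal prior on an ordered vector shifts its means in exactly the same way in both frameworks.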

torfjelde commented 2 years ago

I might also add that if you want to ensure that the resulting ordered distribution has a particular mean, you can do the following:

julia> μ = inv(OrderedBijector())([0.0, 5.0]) # I want [0.0, 5.0] as mean of ordered
2-element Vector{Float64}:
 0.0
 1.6094379124341003

julia> mean(rand(ordered(MvNormal(μ, I)), 10000); dims=2)
2×1 Matrix{Float64}:
 -0.0023004526195303846
  8.218786606485972

Notice that it's still not quite correct because the correlation is not I, but it at least gets you closer to what you want.

EDIT: I think the best approach if you want a particular mean is to just try values, e.g.

julia> μ = inv(OrderedBijector())([0.0, 3.0]) # trying 3.0 to land on [0.0, 5.0] as mean of ordered
2-element Vector{Float64}:
 0.0
 1.0986122886681098

julia> mean(rand(ordered(MvNormal(μ, I)), 10000); dims=2)
2×1 Matrix{Float64}:
 -1.597408661059063e-5
  4.9872350833635535
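
As a back-of-the-envelope check (an addition for clarity, not part of the original comment): the second component of the ordered vector is y₂ = y₁ + exp(x₂), and E[exp(X)] = exp(m + s²/2) for X ~ Normal(m, s²). With unit variance this predicts both sample means above, and also suggests how to hit a target mean exactly:

# Predicted means of the second component (s = 1):
exp(log(5.0) + 0.5)  # ≈ 8.24, matching the first example
exp(log(3.0) + 0.5)  # ≈ 4.95, matching the EDIT example

# To target a mean t exactly (unit variance, first component centred at 0),
# subtract s^2/2 from the plain inverse:
log(5.0) - 0.5  # ≈ 1.109; using this as μ₂ gives a mean of exactly 5.0
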
rikhuijzer commented 2 years ago

Thank you for the helpful example! :smiley:

I couldn't figure it out in the end. The Stan model uses a log transformation, so maybe that's why ordering works there and didn't work for me.

Anyway, I did solve my problem with the label switching in mixture models by setting non-overlapping priors. I've described it in detail at https://huijzer.xyz/posts/latent/ including a run with VI. Let me know if you spot any mistakes :smile:

rikhuijzer commented 2 years ago

After trying 40 different combinations of MixtureModel, Bijectors.ordered, MvNormals, and whatnot, I think I got a working ordered MixtureModel implementation. Writing it down in this issue before it gets lost:

For a mixture with k components, with the prior for each component's location around zero:

inv_means(k::Int) = inv(Bijectors.OrderedBijector())(range(0, step=0.1, length=k))

σ = 0.2  # prior std of each component location (assumed; not defined in the original snippet)

@model function mixture_model(k::Int, X, y)
    w ~ Dirichlet(k, 1)

    # Ordered prior over the component locations.
    μ ~ ordered(arraydist([Normal(m, σ) for m in inv_means(k)]))

    n = length(y)
    y ~ filldist(MixtureModel(Normal, μ, w), n)
end

Here, inv_means uses range because taking only zeros, that is fill(0, k), doesn't work: the inverse bijector takes the log of successive differences, and log(0) = -Inf.
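
A quick demonstration of the failure mode (my addition, following the inverse map implied by the outputs elsewhere in this thread):

# The inverse maps x₁ = y₁ and xₖ = log(yₖ - yₖ₋₁), so equal target values
# produce log(0) = -Inf in unconstrained space:
inv(Bijectors.OrderedBijector())(fill(0.0, 2))  # returns [0.0, -Inf]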

EDIT: Updated text after k
EDIT2: Added ordered behind μ.

torfjelde commented 2 years ago

I'm a bit confused here. What do you mean "centered around zero"?

rikhuijzer commented 2 years ago

> I'm a bit confused here. What do you mean "centered around zero"?

Good question. I should have said: "with the prior for each component's location around zero".

torfjelde commented 2 years ago

So that's what I thought you meant, but that's not the case here, right?

julia> inv_means(k::Int) = inv(Bijectors.OrderedBijector())(range(0, step=0.1, length=k))
inv_means (generic function with 1 method)

julia> inv_means(2)
2-element Vector{Float64}:
  0.0
 -2.3025850929940455

rikhuijzer commented 2 years ago

I've updated the post above again, μ ~ arraydist([Normal(m, σ) for m in inv_means(k)]) should have been μ ~ ordered(arraydist([Normal(m, σ) for m in inv_means(k)])).

julia> ordered_normals = Bijectors.ordered(arraydist([Normal(m, 0.2) for m in inv_means(2)]));

julia> rand(ordered_normals, 10)
2×10 Matrix{Float64}:
 0.411303  -0.169351   -0.0362549  -0.00977221  …  0.304684  -0.14895   0.177662
 0.520854  -0.0954574   0.043032    0.0747526      0.397341  -0.033381  0.263273

julia> mean(rand(ordered_normals, 10_000); dims=2)
2×1 Matrix{Float64}:
 -0.0008690385777423463
  0.1010091207557524
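
(A quick check, added for clarity: this matches the E[exp(X)] formula from earlier in the thread.)

# With the second unconstrained mean at log(0.1) and std 0.2:
exp(log(0.1) + 0.2^2 / 2)  # ≈ 0.102, in line with the sampled 0.101 above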

I'm now working on updating the aforementioned blog post. I'll let you know if that works.

EDIT: It should work because something similar works in Stan too.

rikhuijzer commented 2 years ago

Actually, I think it did work!

Data

[image]

So, note that μ[1] and μ[2] should be around 22 and 28.

Model

inv_ordered(X::Vector) = inv(Bijectors.OrderedBijector())(X)

# Prior around 25; the second value is 25.01 because exactly equal values
# would map to -Inf under the inverse (log of a zero difference).
M = inv_ordered([25, 25.01])

@model function ordered_prior(k::Int, Y)
    w ~ Dirichlet(k, 1)

    μ ~ ordered(arraydist([Normal(m, 4) for m in M]))

    n = length(Y)
    Y ~ filldist(MixtureModel(Normal, μ, w), n)
end;

Sampling

# Assumes `ordered_model = ordered_prior(k, Y)` and `n_samples` were defined earlier.
ordered_hmc_posterior = let
    rng = StableRNG(1)
    sampler = HMC(0.01, 20)  # hand-tuned step size and number of leapfrog steps
    sample(rng, ordered_model, sampler, MCMCThreads(), n_samples, 3)  # 3 chains
end;

[image]

These mixture models are super tricky for the sampler: NUTS goes into an infinite loop (i.e., goes nuts), and VI gives estimates of 2 and 25 for the means, but after fiddling a bit with the sampler parameters, HMC is actually able to figure it out reliably (I've tested multiple seeds) :partying_face: :tada:

torfjelde commented 2 years ago

> I've updated the post above again [...]
>
> julia> mean(rand(ordered_normals, 10_000); dims=2)
> 2×1 Matrix{Float64}:
>  -0.0008690385777423463
>   0.1010091207557524

But this is not 0-mean; it's [0.0, 0.1] :sweat_smile: That's why I asked for the clarification earlier: even if you meant to do this (which I suspected you might be), it wouldn't lead to exactly 0-mean but rather something "close" (by increments of 0.1), so I was thinking "maybe he means 'approximately' 0 when he says 'around 0'" :)

But I'm with you that for practical purposes this can be good enough, as demonstrated in your example :+1: My point has just been that it doesn't make sense to ask for iid random variables with the same mean to also be ordered :)

Anyways, glad you managed to get things working!

rikhuijzer commented 2 years ago

> But this is not 0-mean; it's [0.0, 0.1] :sweat_smile:

Okay, fair enough!

> But I'm with you that for practical purposes this can be good enough, as demonstrated in your example :+1: My point has just been that it doesn't make sense to ask for iid random variables with the same mean to also be ordered :)

You're right :+1:

> Anyways, glad you managed to get things working!

Thanks! Couldn't have done it without your help! :heart: I wouldn't have figured out that I needed to use inv first. Thanks a lot!

Posterior now looks like

[image]

and I've updated it in the post.