TuringLang / docs

Documentation and tutorials for the Turing language
https://turinglang.org/docs/

gaussian_mixture_model takes a very long time to sample #294

Closed wetlandscapes closed 2 years ago

wetlandscapes commented 2 years ago

Hi, thank you very much for the tutorials!

I was super impressed with the speed of the coinflip model in the 00_introduction tutorial, which sampled in microseconds once it had been compiled (compilation took ~10 seconds). However, now that I've moved on to 01_gaussian-mixture-model, I'm running into some strange behavior. That is, even with only 100 samples, the gaussian_mixture_model takes about 25 minutes to sample on my machine. I'm not running the chains in parallel (so call it about 8 min/chain, since I'm using the tutorial's default of 3 chains), but that still seems really slow. Am I missing something? Or is that kind of timing expected? It doesn't seem to be a "time to first plot" problem, because I get the same timing running sample a second time, after compilation. Happy to share details about my environment or whatever, if that helps.
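
For reference, here's roughly how I'm timing it (a sketch; gaussian_mixture_model(x) and the sampler settings are meant to match the tutorial, but I may be misreading it):

using Turing

# Composite sampler as in the tutorial: PG for the discrete
# assignments k, HMC for the continuous means μ1 and μ2.
sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ1, :μ2))

# The first call compiles; timing the second call excludes compilation.
sample(gaussian_mixture_model(x), sampler, 100)
@time sample(gaussian_mixture_model(x), sampler, 100)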

Sidebar: It is so freaking cool that it is possible to combine PG and HMC (and presumably NUTS) into a single sampler! Some of my colleagues have grumbled a lot about Hamiltonian-based samplers not working with discrete distributions, so this seems like it could solve some problems.

rikhuijzer commented 2 years ago

Sidebar: It is so freaking cool that it is possible to combine PG and HMC (and presumably NUTS) into a single sampler! Some of my colleagues have grumbled a lot about Hamiltonian-based samplers not working with discrete distributions, so this seems like it could solve some problems.

Agreed. The flexibility of Turing is great.

That is, even with only 100 samples, the gaussian_mixture_model takes about 25 minutes to sample on my machine.

A while ago, I used filldist(MixtureModel(...)) instead of a loop over MvNormal. For example, I wrote

# Mixture weights over the k components.
w ~ Dirichlet(k, 1)
# Component means: a length-2 vector of iid Normal(25, 4) draws.
μ ~ filldist(Normal(25, 4), 2)
# Observations come directly from the mixture, which marginalizes out
# the discrete assignments.
n = length(Y)
Y ~ filldist(MixtureModel(Normal, μ, w), n)

instead of

# Tutorial model: one discrete assignment k[i] per observation, so a
# particle sampler (PG) is needed for k alongside HMC for the means.
D, N = size(x)
μ1 ~ Normal()
μ2 ~ Normal()

μ = [μ1, μ2]
w = [0.5, 0.5]

k = Vector{Int}(undef, N)
for i in 1:N
    k[i] ~ Categorical(w)
    x[:, i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.0)
end
return k

This should be quicker as mentioned in the Performance Tips. For more information about my model see https://huijzer.xyz/posts/latent/ or https://discourse.julialang.org/t/variational-inference-of-mixture-models/40031/2.
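
Assembled into a complete model, it looks roughly like this (an untested sketch: Y is univariate data, k = 2 components, σ is fixed at 1 by Normal's one-argument constructor, and the toy data are made up):

using Distributions, Turing

@model function gmm_marginalized(Y, k = 2)
    w ~ Dirichlet(k, 1)             # mixture weights
    μ ~ filldist(Normal(25, 4), k)  # component means
    n = length(Y)
    # Drawing Y from the mixture marginalizes out the discrete
    # assignments, so no particle sampler is needed.
    Y ~ filldist(MixtureModel(Normal, μ, w), n)
end

# With no discrete parameters left, NUTS alone works:
Y = vcat(rand(Normal(20, 1), 50), rand(Normal(30, 1), 50))
chain = sample(gmm_marginalized(Y), NUTS(), 1_000)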

Having said that, it shouldn't take 25 minutes if I remember correctly. What CPU and Julia version are you running?

julia> Sys.cpu_summary()
[...]

julia> VERSION
[...]
wetlandscapes commented 2 years ago

Hi, thanks for the quick response!

It looks like the reparameterization you provide is for univariate data, as opposed to the bivariate setup in the tutorial. Being relatively new to Julia and Turing (hence the tutorials -- two birds, one stone), I'm not yet sure how to adapt your proposal. I'm trying to read through the documentation (the Distributions.MixtureModel function seems particularly relevant), but coming from R, I'm still learning how to unpack the help information for the various method signatures (honestly, multiple dispatch is one of the reasons I'm here). So, please forgive my ineptness.
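
For what it's worth, here is my naive attempt at a bivariate version (quite possibly wrong; the components are meant to mirror the tutorial's means of (μⱼ, μⱼ)):

using Distributions, FillArrays, LinearAlgebra, Turing

@model function gmm_bivariate(x, K = 2)
    D, N = size(x)  # D = 2 dimensions, N observations
    w ~ Dirichlet(K, 1)
    μ ~ filldist(Normal(), K)
    # Isotropic bivariate normals with mean (μⱼ, μⱼ) per component;
    # drawing each column from the mixture marginalizes the assignments.
    mixture = MixtureModel([MvNormal(Fill(μⱼ, D), I) for μⱼ in μ], w)
    for i in 1:N
        x[:, i] ~ mixture
    end
end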

If this isn't something you want to deal with just let me know and I can close the issue, but I figured someone should at least be aware that there may be something funky going on with the tutorial I mentioned.

Here's my CPU and Julia info:

julia> Sys.cpu_summary()
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz: 
          speed         user         nice          sys         idle          irq
#1-12  3299 MHz  181362480            0    296653306    4358857948      3354744  ticks

julia> VERSION
v"1.7.0"
devmotion commented 2 years ago

Did you run the latest version of the tutorial? It's not published on the webpage yet, but you can find the script in the TuringTutorials repo and, e.g., markdown and HTML versions in https://github.com/TuringLang/TuringTutorialsOutput/blob/main/markdown/01-gaussian-mixture-model/01_gaussian-mixture-model.md.

wetlandscapes commented 2 years ago

Yes, I cloned the main tutorials repo just a couple of days ago (after the last commit, which mentions updating the Gaussian mixture model).

Also, I just tried running the code from the previous commit, to see whether I'd have better luck. The first option indicated in the model code (using w = [0.5, 0.5]) threw a MethodError with a very long list of types (something about phasepoint). The second option (sampling w from a Dirichlet) ran in only a couple of minutes, but the estimates of both μ parameters appeared to be invariant, which is not good. So, no idea what's going on.

FYI, so far I've gotten through tutorials 00-02, and only the mixture model seems to be having this issue.

wetlandscapes commented 2 years ago

Also, super cool that there is an output repo. When I first started playing with the tutorials I was thinking it would be nice to have some pre-built versions to peruse while I ran the code in a notebook. So, thanks for pointing that out!

wetlandscapes commented 2 years ago

I thought it might be worth investigating other composite samplers, since so far the "typical" samplers seem to be running as quickly as I'd expect. Playing around with the 04_hidden-markov-model tutorial, which combines HMC and PG samplers with a call to Gibbs, it appears that tutorial, too, runs much slower than anticipated: it takes about 30 minutes to sample the hidden Markov model.
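
For reference, the composition in that tutorial looks something like this (from memory, so the exact settings may be off):

# HMC handles the continuous parameters m and T; PG handles the
# discrete state sequence s.
sampler = Gibbs(HMC(0.001, 7, :m, :T), PG(20, :s))
chain = sample(model, sampler, 100)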

This is not definitive, obviously, but it is at least suggestive of where the issue might lie.

devmotion commented 2 years ago

Ah, I hadn't realized before that it uses PG -- that explains a lot. The recent rewrite of Libtask in pure Julia (previously it used a Julia-version-specific C library, which caused a lot of problems) introduced significant performance regressions. There's an issue in the Turing repo, and there's ongoing work to fix these problems. I assume everything involving PG and SMC will be much, much faster if you use an older, C-based version of Libtask, which, however, also requires an older version of Turing IIRC.
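
Downgrading should be possible along these lines (the version bound is a placeholder -- check the issue in the Turing repo for the last release that still used the C-based Libtask):

using Pkg
# Placeholder version; see the Turing issue tracker for the last
# release that pulled in the C implementation of Libtask.
Pkg.add(name = "Turing", version = "0.19")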

wetlandscapes commented 2 years ago

Ah, okay. So it sounds like it may just take a bit for PG's pure-Julia underpinnings (Libtask) to be optimized, meaning this may just be the growing pains of an actively developed and maintained library (which is a good sign to me).

It sounds like I can close the issue, since it is a known problem. Yeah?

rikhuijzer commented 2 years ago

It sounds like I can close the issue, since it is a known problem. Yeah?

The slowness of particle-based samplers is known, yes, so I guess you can close it indeed. The canonical issue is at https://github.com/TuringLang/Turing.jl/issues/1774.

wetlandscapes commented 2 years ago

Thanks!