TuringLang / docs

Documentation and tutorials for the Turing language
https://turinglang.org/docs/
MIT License

Bayes HMM #4

Closed: cpfiffer closed this issue 5 years ago

cpfiffer commented 6 years ago

We already have an example for this one elsewhere, but it needs a lot more commentary and definition. This one is at the moment slightly outside of my comfort zone to expand on, so it might need to fall to someone else for now or wait until I can read up on the subject a bit more.

Original: BayesHMM.ipynb

trappmartin commented 6 years ago

Maybe @hessammehr can help you improve this notebook so that it becomes a user-friendly tutorial. I'm happy to help too.

What questions regarding the model do you have?

cpfiffer commented 6 years ago

I'm not quite sure yet. I have some stuff to read on it and we'll see if I have any questions after that.

cpfiffer commented 5 years ago

@yebai, would it be alright if I took this one over?

yebai commented 5 years ago

Sure, please give it a try and I will be happy to help along the way.

cpfiffer commented 5 years ago

After sampling from the model, it doesn't appear that the emission parameter (which I believe is the m parameter in the model specification) changes in any sample. Here's the output from c[:m]:

Edit: It appears this is due to the HMC specs in the sample call. Changing the epsilon parameter from 0.2 to 0.01 seems to cause the values to change.

500-element Array{Array{Any,1},1}:
 [1.82431, 1.81718, 2.42158]
 [1.82431, 1.81718, 2.42158]
              ...
 [1.82431, 1.81718, 2.42158]
 [1.82431, 1.81718, 2.42158]

My understanding is that this parameter should update across samples -- is that not true? I would expect m to move closer to something like [1.0, 2.0, 3.0] given the specification on our training set.

Here's the model and the sample call for quick reference:

using Turing  # needed for @model, tzeros, the samplers, and sample

y = [ 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 1.0, 1.0 ];
N = length(y);  K = 3;

@model BayesHmm(y) = begin
    # Set up the hidden state array.
    s = tzeros(Int, N)

    # State emission parameters.
    m = Vector{Real}(undef, K)

    # Define the state transition matrix.
    T = Vector{Vector{Real}}(undef, K)

    # Define our priors.
    for i = 1:K
        # Our transition matrix prior is equally weighted.
        T[i] ~ Dirichlet(ones(K)/K)

        # Set the emission parameter prior.
        m[i] ~ Normal(i+1, 0.01)
    end

    s[1] ~ Categorical(ones(Float64, K)/K)
    y[1] ~ Normal(m[s[1]], 0.4)  # observe the first emission too, not just y[2:N]
    for i = 2:N
        s[i] ~ Categorical(vec(T[s[i-1]]))
        y[i] ~ Normal(m[s[i]], 0.4)
    end
    return(s, m)
end

g = Gibbs(500, HMC(1, 0.2, 3, :m, :T), PG(10, 1, :s))
c = sample(BayesHmm(y), g);
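
For reference, the step-size fix mentioned in the edit above would look something like this. It's only a sketch against the old-style sampler constructors used in this thread, and the summary at the end just pulls out the stored samples to eyeball whether m actually moves:

```julia
# Same Gibbs composition, but with the HMC step size (epsilon)
# reduced from 0.2 to 0.01, which seemed to let :m update.
g = Gibbs(500, HMC(1, 0.01, 3, :m, :T), PG(10, 1, :s))
c = sample(BayesHmm(y), g)

# Eyeball check: with a working sampler the sampled emission means
# should no longer be identical across all 500 iterations.
m_samples = c[:m]
```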

cpfiffer commented 5 years ago

@yebai or @trappmartin (or anyone, I guess) would one of you take a look at the second example I have in the BHMM model? I want to get the transition matrix closer to the true matrix but I seem to be missing something.

yebai commented 5 years ago

@cpfiffer This might be due to the infamous label-switching problem for mixture models. The following paper explains the issue:

Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling, A. Jasra, C. C. Holmes, and D. A. Stephens. https://projecteuclid.org/euclid.ss/1118065042

To solve this, you might need to apply a more informative prior to the emission parameters.
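
To see why an informative prior helps: a mixture likelihood is invariant under permuting the component labels, so without priors that distinguish the states the posterior has K! equivalent modes, and the chain can hop between them. A minimal sketch of that invariance (plain Distributions.jl, not the notebook's model; the data and means here are made up for illustration):

```julia
using Distributions

# A mixture's likelihood is unchanged if we permute the component
# labels: component "1" with mean 1.0 explains the data exactly as
# well as component "3" with mean 1.0 after relabeling.
y    = [1.0, 1.9, 3.1]
m    = [1.0, 2.0, 3.0]
perm = [3, 1, 2]

# Log-likelihood of an equal-weight K-component normal mixture.
loglik(means) = sum(log(mean(pdf.(Normal.(means, 0.4), yi))) for yi in y)

loglik(m) ≈ loglik(m[perm])  # holds: permuted labels fit equally well
```

Tight, state-specific priors on the emission means (e.g. centring m[i] near a distinct value per state) break this symmetry and make the states identifiable.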

cpfiffer commented 5 years ago

Pull request #15 is hopefully ready to go now, so please check it out before I publish it to the site.

yebai commented 5 years ago

@cpfiffer Looks good, except some plots for the FairCoin example are missing.

cpfiffer commented 5 years ago

I've revisited this while trying to get a plot I like, but I think I've found that I never actually solved the label-switching problem. I can't get the transition matrix to converge to anything reasonable. I've tried all kinds of priors on the emission and transition matrices, and I've even tried giving the model the true coin-flip probabilities. I don't know where to go from here.

Could someone take a look at my second example and see if there's an obvious place where I'm going wrong? Here is a link to the current notebook.
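
In the meantime, one quick way to spot label switching is to overlay the traces of the emission means and look for them crossing or swapping bands mid-chain. This is only a sketch: the chain-indexing and plotting calls are assumptions, since the exact chain API varies across Turing versions:

```julia
using Plots

# If labels switch mid-chain, the traces of the emission means will
# swap levels partway through instead of settling into distinct bands.
m_trace = reduce(hcat, c[:m])'   # iterations × K matrix of sampled means
plot(m_trace, label=["m[1]" "m[2]" "m[3]"],
     xlabel="iteration", ylabel="m")
```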

yebai commented 5 years ago

Closed by https://github.com/TuringLang/TuringTutorials/pull/15