verify Viterbi correctness

dattalab / pyhsmm-library-models

library models built on top of pyhsmm


verify Viterbi correctness #2

Closed by mattjj 11 years ago

mattjj commented 11 years ago

On real data, Viterbi is acting funny:

it totally kills the likelihood (maybe broken obs distns are contributing?)
the state sequences don't look good

Look into it!

mattjj commented 11 years ago

It definitely works on synthetic data (with left censoring and all) on 37e862a with this test file.
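For readers who want an independent sanity check of this kind, here is a minimal sketch that compares a Viterbi decoder against brute-force enumeration on a tiny synthetic problem. It is not the test file mentioned above and does not use pyhsmm: it assumes a plain HMM (no explicit durations, no left censoring), and the names `viterbi`, `brute_force_map`, and `joint_log_prob` are made up for illustration.

```python
import itertools
import math
import random


def joint_log_prob(path, log_pi, log_A, log_lik):
    """Log of p(state path, observations) for one candidate path."""
    total = log_pi[path[0]] + log_lik[0][path[0]]
    for t in range(1, len(path)):
        total += log_A[path[t - 1]][path[t]] + log_lik[t][path[t]]
    return total


def viterbi(log_pi, log_A, log_lik):
    """Most probable state path of a plain HMM, computed in log space."""
    T, K = len(log_lik), len(log_pi)
    score = [log_pi[k] + log_lik[0][k] for k in range(K)]
    backptrs = []
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):
            # Best predecessor state for landing in state j at time t.
            best_i = max(range(K), key=lambda i: score[i] + log_A[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_A[best_i][j] + log_lik[t][j])
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(K), key=lambda k: score[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(score)


def brute_force_map(log_pi, log_A, log_lik):
    """Best joint log probability over all K**T paths (tiny problems only)."""
    T, K = len(log_lik), len(log_pi)
    return max(
        joint_log_prob(p, log_pi, log_A, log_lik)
        for p in itertools.product(range(K), repeat=T)
    )


def random_log_dist(rng, n):
    """Log of a random probability vector of length n."""
    w = [rng.random() + 0.1 for _ in range(n)]
    s = sum(w)
    return [math.log(x / s) for x in w]


rng = random.Random(0)
K, T = 3, 6
log_pi = random_log_dist(rng, K)
log_A = [random_log_dist(rng, K) for _ in range(K)]
# Arbitrary per-timestep observation log likelihoods, one row per time step.
log_lik = [[math.log(rng.random() + 0.05) for _ in range(K)] for _ in range(T)]

path, best = viterbi(log_pi, log_A, log_lik)
assert math.isclose(best, brute_force_map(log_pi, log_A, log_lik))
assert math.isclose(best, joint_log_prob(path, log_pi, log_A, log_lik))
```

If the decoder were wrong, the first assertion would fail because some enumerated path would score higher than the one Viterbi returned. The same idea extends to an HSMM, but the enumeration would also have to cover durations.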

alexbw commented 11 years ago

Still getting funny state sequences with Viterbi. Can you remind me again how to predict state sequences without Viterbi?

These are the two methods I have, which I think use the same engine, and both are having difficulty producing quality predictions on large real datasets. Still testing, though...

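def predict_MAP_state_assignments(self, X, left_censoring=False):
    # Temporarily attach X, decode the most likely state sequence with Viterbi,
    # then pop the temporary states object so the model is left unchanged.
    self.hsmm_model.add_data(X, left_censoring=left_censoring)
    self.hsmm_model.states_list[-1].Viterbi()
    return self.hsmm_model.states_list.pop().stateseq

def predict_state_assignment_probabilities(self, X, left_censoring=False):
    # Same pattern, but run an E step and return the expectations it computes.
    self.hsmm_model.add_data(X, left_censoring=left_censoring)
    self.hsmm_model.states_list[-1].E_step()
    return self.hsmm_model.states_list.pop().expectations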
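Both methods share the same add, compute, pop pattern, so one option is to factor it into a context manager that removes the temporary states object even if the computation raises. This is only a sketch: `temporary_states`, `StubModel`, and `StubStates` are hypothetical names, and the stubs stand in for a real model so the example runs without pyhsmm. The only model features it assumes are the `add_data` method and the `states_list` attribute used above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_states(model, X, **add_data_kwargs):
    """Attach X to the model, yield its states object, and always detach it."""
    model.add_data(X, **add_data_kwargs)
    try:
        yield model.states_list[-1]
    finally:
        # Assumes nothing else was appended to states_list in the meantime.
        model.states_list.pop()


# Hypothetical stand-ins so the sketch runs without pyhsmm installed.
class StubStates:
    def __init__(self, data):
        self.data = data
        self.stateseq = None

    def Viterbi(self):
        # Placeholder decoding: a real model would run Viterbi here.
        self.stateseq = [0] * len(self.data)


class StubModel:
    def __init__(self):
        self.states_list = []

    def add_data(self, X, left_censoring=False):
        self.states_list.append(StubStates(X))


model = StubModel()
with temporary_states(model, [1.0, 2.0, 3.0], left_censoring=True) as states:
    states.Viterbi()
    decoded = states.stateseq

# The temporary states object is gone once the block exits.
assert model.states_list == []
```

The benefit over the inline version is that a failure inside `Viterbi()` or `E_step()` would not leave a stray entry in `states_list` to influence later resampling.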

mattjj commented 11 years ago

I think you can just sample state sequences in the usual way and use those. I haven't looked into Viterbi on real data (because samples have looked good), but I'll get to it soon.

alexbw commented 11 years ago

Clerical error on my part: Viterbi looks good on my data now. I had constructed some indexes incorrectly. I was failing off.

Just to verify, the specific way to sample state sequences is:

self.hsmm_model.add_data(X)
self.hsmm_model.states_list[-1].resample()
return self.hsmm_model.states_list.pop().stateseq

mattjj commented 11 years ago

That's good news about Viterbi!

Yes, that's probably the best way to sample a state sequence for a specific new data sequence. If the data already belongs to the model, you can skip the first step (add_data), and resampling might be able to reuse some cached data to be slightly faster.
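To illustrate what drawing a state sequence from the posterior means, as opposed to taking the single best path from Viterbi, here is a rough sketch of forward filtering, backward sampling. It assumes a plain HMM with known parameters and is not how pyhsmm's `resample()` is implemented (an HSMM also has to handle durations); `sample_state_sequence` and its arguments are illustrative names only.

```python
import random


def sample_state_sequence(pi, A, lik, rng):
    """Draw one state path from p(states | observations) for a plain HMM.

    pi[k]     : initial state probabilities
    A[i][j]   : transition probability from state i to state j
    lik[t][k] : likelihood of observation t under state k
    """
    T, K = len(lik), len(pi)

    # Forward filtering: alphas[t][k] is p(z_t = k | x_1..x_t), normalized.
    alphas = []
    prev = None
    for t in range(T):
        if t == 0:
            a = [pi[k] * lik[0][k] for k in range(K)]
        else:
            a = [
                lik[t][k] * sum(prev[i] * A[i][k] for i in range(K))
                for k in range(K)
            ]
        norm = sum(a)
        prev = [x / norm for x in a]
        alphas.append(prev)

    # Backward sampling: draw the last state, then each earlier state
    # conditioned on the state already drawn for the next time step.
    z = [0] * T
    z[T - 1] = rng.choices(range(K), weights=alphas[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [alphas[t][i] * A[i][z[t + 1]] for i in range(K)]
        z[t] = rng.choices(range(K), weights=w)[0]
    return z


# A model that starts in state 0 and alternates deterministically, with
# uninformative observations, leaves only one path with nonzero probability.
pi = [1.0, 0.0]
A = [[0.0, 1.0], [1.0, 0.0]]
lik = [[1.0, 1.0] for _ in range(5)]
sample = sample_state_sequence(pi, A, lik, random.Random(0))
assert sample == [0, 1, 0, 1, 0]
```

Unlike Viterbi, repeated calls on a less constrained model return different paths, so several samples give a sense of how uncertain the state assignments are.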

alexbw commented 11 years ago

Can I close this?

mattjj commented 11 years ago

If it's working on real data, then yes!