What is being learned in "catch-all" states?

alexbw commented 11 years ago

@mattjj, does this issue go in pymouse?

Some syllables use a large number of substates. In my limited experience with this model so far, there's often just one such syllable per fit.

These syllables are deployed often, and for long periods of time.

And if you look at how such a syllable is being used, the model seems to reach for it when the data is difficult or ambiguous.

I'm fine with catch-all states like this existing. I think it's catching noise and difficult data, and putting it all in one bin for us. What do you think?

mattjj commented 11 years ago

This issue should probably be in pymouse, since it has to do with fitting to mouse data (and not a general thing with library subHMMs), but the distinction probably isn't too important operationally.

In your last plot, what is white and what is black?

How many iterations did you run this for? How many independent runs? (I think the answer to the latter is probably 1, but it really shouldn't be; there's no such thing as a result without multiple independent runs!)

Generally speaking, the nature of the syllables fit is a function of the priors (including model specification and hard constraints); that is, it's how we instantiate our hypothesis.

Given the model specification and hard constraints, we can adjust our hypothesis by adjusting the hyperparameters. If we set up the model so that arbitrarily long durations are easy and instantiating subHMMs is really expensive (small concentration parameters for the super-transition matrix), then any fitting procedure would tend to assign more of the data to a single subHMM. Therefore we can push it in the other direction using the hyperparameters (remember that the ultimate model fit criterion, held-out likelihood, doesn't care about priors; adjusting them is just a way to get a potentially better fit). Specifically, we can adjust the concentration parameters and the hyperparameters on durations.
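To make the concentration-parameter point concrete, here is a rough numpy-only sketch. It is not the pyhsmm prior code: it assumes each row of the super-transition matrix is drawn from a symmetric Dirichlet, and `mean_effective_states` and its arguments are made-up names for illustration. It measures how many states a typical row actually spreads its mass over.

```python
import numpy as np

def mean_effective_states(total_concentration, num_states=10, num_draws=2000, seed=0):
    """Average effective number of states used by a transition row.

    Assumption: rows are drawn from a symmetric Dirichlet whose parameters
    sum to total_concentration. This is an illustration, not pyhsmm's prior.
    """
    rng = np.random.default_rng(seed)
    alpha = np.full(num_states, total_concentration / num_states)
    rows = rng.dirichlet(alpha, size=num_draws)
    # Inverse Simpson index: 1 if all mass sits on one state,
    # num_states if the mass is spread uniformly.
    effective = 1.0 / np.sum(rows ** 2, axis=1)
    return float(effective.mean())

small = mean_effective_states(1.0)    # small concentration: sparse rows
large = mean_effective_states(100.0)  # large concentration: near-uniform rows
print(f"effective states: small={small:.2f}, large={large:.2f}")
```

With a small total concentration, most of each row's mass lands on a few states, which is the regime where instantiating another subHMM is expensive and one state can end up absorbing a lot of data.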

The hypothesis is also instantiated in the model structure itself. Concretely, that means the tendency to create a "catch-all" state should be compared to the same tendency in the LHSMM. However, since this subHMM model can simulate an LHSMM, it seems improbable that we won't be able to set the soft component of our priors (i.e. the hyperparameters discussed in the previous paragraph) to eliminate "catch-all" states to the same degree we can avoid them in the LHSMM.

Finally, all of that is predicated on the idea that these effects aren't local optimum effects. It may be that, given our current priors, there are lower energy configurations that the sampler hasn't found yet. That's very probable and should be explored first. To get at that question, we should first run independent initializations and look at the robustness of these effects, and then (given that robustness) we might try some temperature schedules.
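One hedged way to look at robustness across independent runs, sketched with numpy only: state labels aren't comparable between runs (label switching), so this compares a label-invariant statistic, the fraction of frames taken by each run's most-used state. The function names and the toy state sequences below are made up for illustration; real input would be the integer state sequences from each independent fit.

```python
import numpy as np

def largest_state_share(stateseq):
    """Fraction of frames assigned to the single most-used state label."""
    counts = np.bincount(np.asarray(stateseq))
    return counts.max() / counts.sum()

def summarize_runs(stateseqs):
    """Per-run share of the most-used state, plus the spread across runs."""
    shares = np.array([largest_state_share(s) for s in stateseqs])
    return shares, float(shares.max() - shares.min())

# Toy stand-ins for state sequences from three independent initializations.
runs = [
    np.array([0, 0, 0, 0, 1, 2, 0, 0]),  # one state dominates
    np.array([3, 3, 1, 1, 2, 2, 0, 0]),  # usage spread evenly
    np.array([1, 1, 1, 1, 1, 0, 2, 1]),  # one state dominates, different label
]
shares, spread = summarize_runs(runs)
print(shares, spread)
```

If the share of the dominant state is consistently high across runs, the "catch-all" effect is more likely a property of the priors than of one sampler trajectory; a large spread points toward the local-optimum explanation.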

tl;dr same story as last few days, adjust priors accordingly, ALWAYS run multiple times and report iteration count for sampler fits

alexbw commented 11 years ago

Closing, moved to pymouse repo

dattalab / pyhsmm-library-models

What is being learned in "catch-all" states? #46