Closed alexbw closed 11 years ago
Here's a better picture where I also indicate individual mice, not just the strains:
Yes, it definitely can shuffle the order of the states list; I forgot because we haven't looked at this code in a while.
This line is the culprit: https://github.com/mattjj/pyhsmm/blob/master/models.py#L220
I'm going to make a change to the pyhsmm code, then pull it through to this repo. The new semantics will be that states_list always stays in the order of add_data calls. Soon.
I made two fixes.
First, I only do any of the random selection stuff if numtoresample is not set to its default, so there's no scrambling in the default case: https://github.com/mattjj/pyhsmm/blob/master/models.py#L221
Second, for the numtoresample != 'all' case, I just save the order of states_list at the start and then restore it: https://github.com/mattjj/pyhsmm/blob/master/models.py#L233
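To make the two fixes concrete, here's a rough sketch of the idea, not the actual pyhsmm code. The function name, the resample callback, and the rng argument are all made up for illustration; only states_list, numtoresample, and the 'all' default come from the discussion above.

```python
import random


def resample_subset_preserving_order(states_list, numtoresample="all",
                                     resample=lambda s: None, rng=random):
    """Hypothetical helper: resample some states without reordering the list."""
    if numtoresample == "all":
        # Default case: no random selection at all, so nothing gets scrambled.
        for s in states_list:
            resample(s)
        return states_list

    # Non-default case: remember the add_data order before shuffling.
    original_order = list(states_list)
    rng.shuffle(states_list)
    for s in states_list[:numtoresample]:
        resample(s)

    # Restore the saved order in place so callers see the same list object.
    states_list[:] = original_order
    return states_list
```

The in-place slice assignment is a choice made for this sketch: anything holding a reference to the list keeps seeing the add_data order.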
I tested it by running examples/hsmm-parallel.py and checking that the data array hashes (their memory addresses) are always in the same order in states_list.
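A check along those lines could look something like this. It's a sketch under assumptions: it assumes each entry in states_list exposes its array as a .data attribute, and it uses stand-in objects rather than a real model. The helper name data_order is made up.

```python
from types import SimpleNamespace


def data_order(states_list):
    """Return the identities (memory addresses in CPython) of each state's data."""
    # Assumption: each states object keeps its observations in `.data`.
    return [id(s.data) for s in states_list]


# Stand-in states objects; a real run would use the model's states_list.
states_list = [SimpleNamespace(data=[float(i)]) for i in range(4)]
before = data_order(states_list)

# ... resampling iterations would go here ...

after = data_order(states_list)
order_unchanged = before == after
```

Comparing identities rather than array contents avoids false matches when two sequences happen to hold equal values.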
So states_list will always stay in the add_data order now.
The proof's in the plot:
Looks good to me.
I have a method to get the labels from a model, stored in self.hsmm_model
You'll note that I commented out data_ids, because at some point, it became unnecessary to keep track of the data_ids when retrieving properties. Now it seems that's been reverted.
Here's why I think that: I'm showing you a representation of around 600K frames from the full OFA dataset. The x-axis is time. The y-axis of the various plots is explained below.
Top row is a representation of syllable usage. Dark horizontal streaks mean heavy usage of one syllable. Middle row is a plot of which mouse should be present in the dataset. Below that are simply ticks which divide the dataset into 8 even parts, meant to represent where the data was split before being handed off to the IPython clients. You'll see that those ticks demarcate obvious breaks in the syllable usage plot. Those breaks should (from experience) fall along the mouse boundaries, not the arbitrary data boundaries. I think as the data is getting split up for the clients, the labels are not being reassembled properly.
Has anything changed in this part of the code?
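For reference, this is the kind of order-preserving split and reassembly I'd expect, as a toy sketch only. It is not the code in this repo; split_with_ids and reassemble are hypothetical names, and plain lists stand in for the frame and label arrays.

```python
def split_with_ids(frames, n_parts):
    """Split frames into n_parts contiguous chunks, tagging each with its index."""
    size, extra = divmod(len(frames), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # Spread any remainder over the first `extra` chunks.
        stop = start + size + (1 if i < extra else 0)
        chunks.append((i, frames[start:stop]))
        start = stop
    return chunks


def reassemble(labelled_chunks):
    """Concatenate per-chunk labels in chunk-index order, not arrival order."""
    out = []
    for _, labels in sorted(labelled_chunks, key=lambda c: c[0]):
        out.extend(labels)
    return out
```

If the clients return results out of order and the labels are concatenated in arrival order instead, the breaks would land on the split boundaries rather than the mouse boundaries, which is what the plot looks like.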