dattalab / keypoint-moseq

https://keypoint-moseq.readthedocs.io

Reindex syllables by frequency #72

Closed calebweinreb closed 11 months ago

calebweinreb commented 1 year ago

In the previous release of keypoint-MoSeq, we included an "extract_results" step that saved the syllable sequences along with a "reindexed" version in which syllables were relabeled by frequency (so syllable "0" was the most frequent, and so on). But this approach had a fatal flaw: when a fitted model was applied to new data, the syllable frequencies could differ, which would produce a different relabeling, so that, e.g., syllable "0" would refer to one state in one subset of recordings and a different state in another.
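To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the keypoint-MoSeq code: `frequency_relabeling` is a hypothetical helper and the label sequences are made up for illustration.

```python
from collections import Counter


def frequency_relabeling(labels):
    """Map each original label to its frequency rank (0 = most frequent).

    Hypothetical helper for illustration only; not part of keypoint-MoSeq.
    """
    counts = Counter(labels)
    # Most frequent first; ties are broken by the original label so the
    # result is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {old: new for new, old in enumerate(ordered)}


# Two sets of recordings labeled by the same fitted model, so the
# original labels (7, 42, 3) refer to the same states in both.
training = [7, 7, 7, 42, 42, 3]
new_data = [42, 42, 42, 7, 7, 3]

map_train = frequency_relabeling(training)
map_new = frequency_relabeling(new_data)

print(map_train)  # {7: 0, 42: 1, 3: 2}
print(map_new)    # {42: 0, 7: 1, 3: 2}
```

Reindexed syllable "0" is original state 7 in the first set and original state 42 in the second, so the reindexed labels cannot be compared across the two sets.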

As a temporary fix for this, I removed all reindexing from the pipeline (see https://github.com/dattalab/keypoint-moseq/commit/304fcf41732cff95739f31ff1e86fb03c1e204b4 and https://github.com/dattalab/keypoint-moseq/commit/45de8a18738f12404309b6a2d85e68d5adb77dd5).

In the future, though, it would still be nice to have reindexing as an option so that the syllable labels aren't a random sparse subset of numbers between 0 and 100. To make the reindexing consistent, I propose that we reindex the model itself in addition to the outputs. Here's how this would work: