HeathRossie / PARAFAC-GMM-Pipeline

A pipeline for unsupervised characterization of time-series data

PARAFAC-GMM-Pipeline

Automatic clustering of behaviors is an increasingly important analytic pipeline. This repo demonstrates a combination of parallel factor analysis (PARAFAC) and a Gaussian mixture model (GMM).

The classification of trials may be useful to answer several research questions. For example,

A common analytic pipeline in computational ethology classifies moment-to-moment behavioral states by clustering frame-by-frame features. The current method uses a slightly different strategy: it characterizes whole trial sequences, each of which can consist of multiple time series, as behavioral patterns.

(0) Data

Imagine you have multiple time-series data like those in this image.

These are hypothetical data from several trials. The red and blue lines represent features obtained in an experiment. For example, these may be x and y positional data from tracking, distance metrics, orientations, or movement velocity.
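To make the data layout concrete, here is a minimal sketch that generates a toy dataset as a three-way array of shape trials x time x features. The generator `make_demo_trials`, its parameters, and the sine/cosine templates are assumptions made for illustration; they are not the demo data shipped with this repo.

```python
import numpy as np

def make_demo_trials(n_trials=60, n_time=100, n_classes=3, noise=0.1, seed=0):
    """Hypothetical generator: each class has its own pair of smooth curves."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_time)
    # Hidden "true" class of every trial (unknown in a real experiment).
    labels = rng.integers(0, n_classes, size=n_trials)
    # One template per class: feature 0 is a sine and feature 1 a cosine,
    # with a class-specific frequency. Shape: (n_classes, n_time, 2).
    templates = np.stack([
        np.stack([np.sin(2 * np.pi * (k + 1) * t),
                  np.cos(2 * np.pi * (k + 1) * t)], axis=1)
        for k in range(n_classes)
    ])
    # Every trial is its class template plus Gaussian noise.
    data = templates[labels] + noise * rng.standard_normal((n_trials, n_time, 2))
    return data, labels

data, labels = make_demo_trials()
print(data.shape)  # (60, 100, 2)
```

With real recordings, `data[i]` would hold the features of trial `i` (for example x and y position) sampled at a common set of time points.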

Problem setting: As you can see, trials 14, 18, and 58 show similar movement patterns. One may want to classify these behaviors into discrete clusters, which I call "behavioral patterns".

(1) Dimensionality reduction

The pipeline demonstrated in this repo projects the time-series features of a trial onto a single point in an abstract feature space, using parallel factor analysis. Briefly, parallel factor analysis is used for dimensionality reduction, since each trial is multi-dimensional data, namely time x features.

image

Colours represent the true classes generated in the demo data.
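As a rough illustration of this step, the sketch below fits a PARAFAC (CP) model to a trials x time x features array with a plain alternating least squares loop in NumPy. It is not the implementation used in this repo: the function `cp_als`, the rank of 3, the iteration count, and the toy data are all assumptions. The rows of the trial-mode factor matrix play the role of each trial's location in the abstract feature space.

```python
import numpy as np

def cp_als(X, rank, n_iter=200, seed=0):
    """Minimal PARAFAC (CP) fit of a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    n_trials, n_time, n_feat = X.shape
    A = rng.standard_normal((n_trials, rank))  # trial loadings
    B = rng.standard_normal((n_time, rank))    # time loadings
    C = rng.standard_normal((n_feat, rank))    # feature loadings
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor matrix
        # with the other two held fixed.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Toy data (assumption): three classes, each a time curve times feature weights.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
curves = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
weights = np.array([[1.0, 0.5], [0.5, -1.0], [-1.0, 1.0]])
templates = np.einsum("jk,kf->kjf", curves, weights)  # (class, time, feature)
labels = rng.integers(0, 3, size=60)
X = templates[labels] + 0.05 * rng.standard_normal((60, 100, 2))

A, B, C = cp_als(X, rank=3)
recon = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - recon) / np.linalg.norm(X)
trial_scores = A  # one low-dimensional point per trial
print(trial_scores.shape, rel_err)
```

A dedicated tensor library would normally be preferable to this hand-written loop; the sketch only shows where the per-trial coordinates come from.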

(2) Clustering Behavioral patterns

Obviously, the true classes are unknown in real research. Thus, we need to estimate the classes with a clustering method. Here, the classes are detected automatically using a Gaussian mixture model.

image
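The clustering step could look roughly like the following, assuming scikit-learn is available. The array `trial_scores` stands in for the per-trial coordinates from the dimensionality reduction step; here it is filled with made-up blobs so that the snippet runs on its own, and the choice of three components is likewise an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for the per-trial coordinates (assumption: 90 trials, 3 dimensions).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
true_labels = rng.integers(0, 3, size=90)
trial_scores = centers[true_labels] + 0.3 * rng.standard_normal((90, 3))

# Fit a GMM and assign every trial to its most probable component.
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      n_init=5, random_state=0)
pattern = gmm.fit_predict(trial_scores)

# Soft assignments: probability of each pattern for every trial.
posterior = gmm.predict_proba(trial_scores)
print(np.bincount(pattern))
```

The component indices are arbitrary, so pattern 0 in one run need not correspond to pattern 0 in another run or to any particular true class.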

The number of patterns can be inferred by BIC. Note, however, that BIC tends to keep increasing unnecessarily as a function of the number of patterns in a GMM. It may be advisable to use a threshold as a cut-off point.
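One possible way to turn this into code is sketched below, again assuming scikit-learn. Note that scikit-learn's `bic` is defined so that lower is better, while some packages report BIC with the opposite sign. The cut-off rule in `pick_k` (take the smallest number of patterns whose BIC is within a fraction `rel_tol` of the best value, relative to the range of the curve) is a hypothetical choice and not necessarily the threshold used in this repo.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bic_curve(scores, max_k=8, seed=0):
    """BIC of a GMM for 1..max_k components (scikit-learn: lower is better)."""
    ks = np.arange(1, max_k + 1)
    bics = np.array([
        GaussianMixture(n_components=k, n_init=3, random_state=seed)
        .fit(scores).bic(scores)
        for k in ks
    ])
    return ks, bics

def pick_k(ks, bics, rel_tol=0.05):
    """Hypothetical cut-off: smallest k whose BIC is close enough to the best."""
    span = bics.max() - bics.min()
    ok = bics <= bics.min() + rel_tol * span
    return int(ks[np.argmax(ok)])  # argmax of a boolean array = first True

# Made-up trial coordinates: three well-separated groups in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
scores = centers[rng.integers(0, 3, size=150)] + 0.3 * rng.standard_normal((150, 2))

ks, bics = bic_curve(scores)
chosen_k = pick_k(ks, bics)
print(chosen_k)
```

Preferring the smallest acceptable number of patterns guards against picking extra components that improve BIC only marginally.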

image

(3) Visualization of each behavioral pattern

Finally, it would be informative to visualize the behavioral sequences belonging to each pattern.

Because this demonstration does not use real data, the trajectories are not interpretable. However, visualizing your own real dataset should give you useful insight.

image
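Before plotting, it can help to summarize the trials of each pattern. The sketch below computes a mean and standard deviation trajectory per pattern; the helper `pattern_summaries`, the random placeholder data, and the placeholder pattern labels are assumptions made only so that the snippet runs on its own.

```python
import numpy as np

def pattern_summaries(data, pattern):
    """Mean and SD trajectory per pattern; data is (trials, time, features)."""
    summaries = {}
    for k in np.unique(pattern):
        members = data[pattern == k]  # all trials assigned to pattern k
        summaries[int(k)] = {
            "n_trials": int(members.shape[0]),
            "mean": members.mean(axis=0),  # shape (time, features)
            "sd": members.std(axis=0),     # shape (time, features)
        }
    return summaries

# Placeholder inputs: 40 trials, 50 time points, 2 features, 3 patterns.
rng = np.random.default_rng(0)
data = rng.standard_normal((40, 50, 2))
pattern = np.arange(40) % 3

summaries = pattern_summaries(data, pattern)
for k, s in summaries.items():
    print(k, s["n_trials"], s["mean"].shape)
```

Each `mean` array can then be drawn with any plotting library, one panel per pattern and one line per feature, optionally with the `sd` band around it.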