asturkmani opened 7 years ago
Try comparing the likelihood of a hidden Markov process instantiated randomly versus one instantiated from the clusters found by G-Means
Compare the Anderson-Darling scores of the G-Means clusters to the Anderson-Darling scores of random GMMs
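The Anderson-score comparison could be sketched roughly as below, assuming "Anderson scores" means the Anderson-Darling statistic that G-Means relies on. The data, cluster counts, and seeds are illustrative stand-ins, not the real app-usage samples:

```python
import numpy as np
from scipy.stats import anderson
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D stand-in: two well-separated Gaussian clusters.
X = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)]).reshape(-1, 1)

# Stand-in for the "G-Means found clusters": k-means with the right k.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
cluster_scores = [anderson(X[labels == k].ravel()).statistic
                  for k in np.unique(labels)]

# Random baseline: assignments from a randomly initialised, barely fit GMM.
gmm = GaussianMixture(n_components=2, init_params="random",
                      max_iter=1, random_state=0).fit(X)
rand_labels = gmm.predict(X)
random_scores = [anderson(X[rand_labels == k].ravel()).statistic
                 for k in np.unique(rand_labels)
                 if np.sum(rand_labels == k) >= 8]   # skip degenerate groups

print("found-cluster A-D statistics:", cluster_scores)
print("random-GMM A-D statistics:", random_scores)
# A lower statistic means closer to Gaussian; the found clusters
# should typically score lower than the random partition.
```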
Objective: Uncover the generative model behind these data samples
Test:
[x] PCA dimensionality reduction
[x] Cluster with k-means
Conclusion: The principal modes mainly reveal which apps are used most. While this is interesting, it tells us little about the generative latent space; Euclidean metrics don't capture the nuances of the data.
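The PCA + k-means pipeline above might look like the following sketch; the synthetic usage matrix, component count, and cluster count are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for per-user app-usage feature vectors: 200 samples, 50 apps.
X = rng.poisson(3.0, size=(200, 50)).astype(float)

# Reduce to a handful of principal modes, then cluster in the reduced space.
Z = PCA(n_components=5).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

print("reduced shape:", Z.shape)          # (200, 5)
print("cluster sizes:", np.bincount(labels))
```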
[x] Cluster/Model with G-Means. G-Means assumes the clusters are sampled from latent Gaussian distributions; consequently, each cluster represents samples from a distribution in a latent space, where each cluster indicates an underlying individual state, for example focusing on work, being bored, or actively checking social media
[x] Cluster/Model with Gaussian Mixture Models using G-Means algorithm
[x] Use Variational Autoencoders to learn latent representation
Conclusions: Some Gaussian structure exists in the data, since the G-Means clusters pass the Anderson-Darling tests; however, the captured structure is still somewhat vague, and more information may be gained from dimensionality reduction techniques that account for time dependency.
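The G-Means acceptance test mentioned in these conclusions could be sketched as below: fit two sub-centres inside a cluster, project the points onto the axis joining them, and run the Anderson-Darling test on that projection. The function name, significance index, and synthetic blobs are illustrative assumptions:

```python
import numpy as np
from scipy.stats import anderson
from sklearn.cluster import KMeans

def gmeans_keep_one_cluster(points, alpha_index=0):
    """Return True if the cluster looks Gaussian (i.e. should NOT be split)."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    c0, c1 = km.cluster_centers_
    v = c1 - c0                               # candidate splitting direction
    proj = points @ v / (v @ v)               # 1-D projection onto that axis
    proj = (proj - proj.mean()) / proj.std()  # standardise (optional; anderson
                                              # estimates parameters itself)
    res = anderson(proj)
    # Accept if the statistic is below the chosen critical value.
    return bool(res.statistic < res.critical_values[alpha_index])

rng = np.random.default_rng(0)
gaussian_blob = rng.normal(0, 1, size=(400, 2))
two_blobs = np.vstack([rng.normal(-4, 1, (200, 2)),
                       rng.normal(4, 1, (200, 2))])

# A single Gaussian should typically be kept; a bimodal cluster should split.
print("gaussian blob kept:", gmeans_keep_one_cluster(gaussian_blob))
print("two blobs kept:", gmeans_keep_one_cluster(two_blobs))
```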
[x] LSTM embedding matrix for dimensionality reduction
[ ] LSTM encoder-decoder for sequence-to-sequence modelling
[ ] LSTM encoder-decoder with attention mechanism
[x] Hidden Markov Model to infer generative Markovian process
[x] (Variational) Autoencoder with time-dependency
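The HMM likelihood comparison suggested at the top of this issue could be sketched with a small numpy/scipy forward algorithm: score one observation sequence under two Gaussian HMMs, one with randomly drawn emission means and one with means taken from found cluster centres. All parameters and the synthetic two-regime sequence below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def hmm_log_likelihood(obs, start, trans, means, stds):
    """Log-likelihood of a 1-D sequence under a Gaussian HMM (scaled forward algorithm)."""
    alpha = start * norm.pdf(obs[0], means, stds)
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ trans) * norm.pdf(x, means, stds)
        c = alpha.sum()          # rescale each step to avoid underflow
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

rng = np.random.default_rng(0)
# Synthetic sequence alternating between two regimes around -3 and +3.
states = (np.arange(300) // 30) % 2
obs = rng.normal(np.where(states == 0, -3.0, 3.0), 1.0)

start = np.array([0.5, 0.5])
trans = np.array([[0.95, 0.05],
                  [0.05, 0.95]])

# "Cluster-initialised" emission means vs randomly drawn ones.
ll_cluster = hmm_log_likelihood(obs, start, trans,
                                np.array([-3.0, 3.0]), np.array([1.0, 1.0]))
ll_random = hmm_log_likelihood(obs, start, trans,
                               rng.normal(0, 1, 2), np.array([1.0, 1.0]))

print("cluster-initialised log-likelihood:", ll_cluster)
print("randomly initialised log-likelihood:", ll_random)
```

If the cluster-derived means track the true regimes, the first model should assign the sequence a substantially higher log-likelihood.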