asturkmani / Thesis

Repository for my MSc Thesis at the University of Oxford

Unsupervised modeling of data #1

Open asturkmani opened 7 years ago

asturkmani commented 7 years ago

Objective: Uncover the generative model behind these data samples

  1. Assumption 1: The latent states can be uncovered from the principal modes of app usage behavior.

Test:

Conclusion: The principal modes only reveal which apps are used most. While this is interesting, it tells us little about the generative latent space, because Euclidean metrics don't capture the nuances of the data.
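
A minimal sketch of what "principal modes" means here, assuming the data is a matrix with one row per time window and one column per app. The `usage` numbers and the function name `first_principal_mode` are made up for illustration and are not from the thesis data. The sketch finds the leading eigenvector of the sample covariance by power iteration:

```python
import math

def first_principal_mode(rows, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical minutes of use per window for three apps; app 0 varies the most.
usage = [
    [50, 2, 1],
    [10, 3, 0],
    [70, 1, 2],
    [30, 2, 1],
    [90, 3, 0],
]
mode = first_principal_mode(usage)
print([round(x, 3) for x in mode])
```

In this toy example the first mode loads almost entirely on the app whose usage varies the most, which matches the observation above: the mode mostly points at the dominant app rather than at any latent state.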


  1. Assumption 2: The data points are sampled from a space that captures the individual's mental state. Each state in this space is Gaussian, i.e. distributed N(m,s) such that m = E(x_i | x_i in X_1) and s^2 = E(X^2) - E(X)^2

Conclusions: Some Gaussian structure exists in the data, since the G-Means clusters pass the Anderson-Darling test. However, the captured structure is still vague, and dimensionality reduction techniques that account for time dependency may reveal more.
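
For reference, a rough sketch of the kind of normality check G-Means relies on: the Anderson-Darling A^2 statistic for a 1-D sample, with mean and variance estimated from the sample. This is not the code used for the results above. The function name `anderson_darling` and the synthetic samples are assumptions for illustration, and the real G-Means first projects each cluster onto a single direction before testing:

```python
import math
import random
from statistics import NormalDist, mean, stdev

def anderson_darling(sample):
    """Corrected A^2 statistic for normality (mean and variance estimated)."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    std_normal = NormalDist()
    # Standardize, sort, and evaluate the standard normal CDF.
    cdf = [std_normal.cdf((v - m) / s) for v in sorted(sample)]
    # Clip to avoid log(0) at extreme points.
    eps = 1e-12
    cdf = [min(max(p, eps), 1.0 - eps) for p in cdf]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample correction for the estimated-parameter case.
    return a2 * (1.0 + 4.0 / n - 25.0 / n ** 2)

rng = random.Random(0)
# One Gaussian "cluster" versus a cluster that really hides two states.
gaussian_cluster = [rng.gauss(0.0, 1.0) for _ in range(500)]
bimodal_cluster = [rng.gauss(-3.0 if i % 2 == 0 else 3.0, 1.0) for i in range(500)]

stat_gaussian = anderson_darling(gaussian_cluster)
stat_bimodal = anderson_darling(bimodal_cluster)
print(stat_gaussian, stat_bimodal)
```

A small statistic means the sample is compatible with a single Gaussian; a large one is what makes G-Means split a cluster further.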


  1. Assumption 4: The Gaussian space from which data points are sampled is actually a generative latent space in lower dimensions. The latent space behaves as follows: y_i = generated(x_i | x_i in X, X = N(E(x_i),s)), where X in Z and Z is the latent space.
  1. Assumption 5: The latent space from which the data points are sampled possesses a time-dependency structure that static Gaussian Mixture Models & G-Means aren't able to capture. The space behaves as follows: p(y_(n+1)) = p(y_(n+1) | y_n, ..., y_0)
  1. Assumption 5: The data points are generated from a latent space with both a Gaussian and a time-dependency structure: y_i = Ax_i + Bx_(i-1)
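
To make the last assumption concrete, here is a toy simulation of y_i = Ax_i + Bx_(i-1). It assumes scalar A and B and i.i.d. standard normal x_i, neither of which is stated above; the names `simulate` and `lag1_autocorr` are hypothetical. The point is only that a nonzero B makes consecutive observations correlated, which a static mixture model ignores:

```python
import random

def simulate(a, b, n, seed=0):
    """Draw y_i = a * x_i + b * x_(i-1) with x_i ~ N(0, 1), i.i.d. (an assumption)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [a * x[i] + b * x[i - 1] for i in range(1, n + 1)]

def lag1_autocorr(y):
    """Sample correlation between consecutive observations."""
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y)
    cov = sum((y[i] - m) * (y[i - 1] - m) for i in range(1, n))
    return cov / var

with_memory = lag1_autocorr(simulate(1.0, 1.0, 20000))
without_memory = lag1_autocorr(simulate(1.0, 0.0, 20000))
print(with_memory, without_memory)
```

Under these assumptions the lag-1 correlation is A*B / (A^2 + B^2) in theory, so it is 0.5 for A = B = 1 and 0 when B = 0.
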
asturkmani commented 7 years ago

Try comparing the likelihood of the hidden Markov model when it is initialized randomly vs when it is initialized from the clusters found by G-Means
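
A possible starting point for this comparison, sketched with a 1-D Gaussian-emission HMM and the scaled forward algorithm. Everything here is an assumption for illustration: the synthetic observations, the fixed transition matrix, the unit standard deviations, and the sign-based split that stands in for the G-Means cluster centers. It only scores the two initializations as they are; a full comparison would also run EM from each and compare the final likelihoods.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def hmm_log_likelihood(obs, start, trans, means, stds):
    """Log p(obs) under a Gaussian-emission HMM, via the scaled forward pass."""
    k = len(start)
    alpha = [start[j] * gaussian_pdf(obs[0], means[j], stds[j]) for j in range(k)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(k))
                * gaussian_pdf(obs[t], means[j], stds[j])
                for j in range(k)
            ]
        # Rescale at every step so the forward variables never underflow.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Synthetic sequence: two states that alternate in blocks of 25 steps.
rng = random.Random(0)
obs = [rng.gauss(-3.0 if (t // 25) % 2 == 0 else 3.0, 1.0) for t in range(300)]

start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.05, 0.95]]
stds = [1.0, 1.0]

# Stand-in for G-Means centers: mean of negative and of positive observations.
neg = [v for v in obs if v < 0]
pos = [v for v in obs if v >= 0]
cluster_means = [sum(neg) / len(neg), sum(pos) / len(pos)]
# Random initialization: means drawn uniformly from the observed range.
random_means = [rng.uniform(min(obs), max(obs)) for _ in range(2)]

ll_cluster = hmm_log_likelihood(obs, start, trans, cluster_means, stds)
ll_random = hmm_log_likelihood(obs, start, trans, random_means, stds)
print(ll_cluster, ll_random)
```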

Compare the Anderson scores of G-Means clusters to Anderson scores of random GMMs