ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

Support for scikit-learn's LatentDirichletAllocation and NMF, extend support for GaussianMixtureModel and BayesianGaussianMixtureModel #174

Open jeremymanning opened 6 years ago

jeremymanning commented 6 years ago

We should add support for LDA and NMF: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition

These can be supported both via cluster and reduce:

Cluster: behaves similarly to GaussianMixtureModel (estimate how much of each factor is reflected by/in each observation).

Reduce: to reduce the data to n dimensions, fit LDA/NMF with n components.
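A rough sketch of what the two modes could look like, written directly against scikit-learn rather than hypertools. The helper names `reduce_with_factors` and `cluster_with_factors` are hypothetical, and turning the factor weights into hard cluster labels by taking the largest weight is an assumption, not something settled in this issue. Both models require non-negative input.

```python
import numpy as np
from sklearn.decomposition import NMF, LatentDirichletAllocation


def reduce_with_factors(data, n_components, model="NMF", random_state=0):
    """Hypothetical helper: fit NMF or LDA with n_components factors and
    return the per-observation factor weights (n_samples x n_components)."""
    if model == "NMF":
        est = NMF(n_components=n_components, init="nndsvda",
                  max_iter=500, random_state=random_state)
    elif model == "LDA":
        est = LatentDirichletAllocation(n_components=n_components,
                                        random_state=random_state)
    else:
        raise ValueError("model must be 'NMF' or 'LDA'")
    return est.fit_transform(data)


def cluster_with_factors(data, n_clusters, model="NMF", random_state=0):
    """Hypothetical helper: assign each observation to its dominant factor.
    Using argmax for the hard label is an assumption made for illustration."""
    weights = reduce_with_factors(data, n_clusters, model=model,
                                  random_state=random_state)
    return np.argmax(weights, axis=1)


# Toy non-negative data (e.g. word counts): 50 observations, 20 features
rng = np.random.RandomState(0)
counts = rng.poisson(lam=2.0, size=(50, 20))

reduced = reduce_with_factors(counts, 3, model="LDA")   # shape (50, 3)
labels = cluster_with_factors(counts, 3, model="NMF")   # shape (50,)
```

The same fitted weights serve both purposes: kept as-is they are the reduced coordinates, and collapsed to a single index per row they act as cluster labels.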

Some sample code may be found here: https://github.com/ContextLab/storytelling-with-data/blob/master/data-stories/twitter-finance/twitter-finance.ipynb

We could play a similar trick with Gaussian mixture models: to reduce the data to n dimensions, we could fit a GMM with n components and then use each observation's membership probabilities for the n components as its coordinates.
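A minimal sketch of the GMM idea, assuming scikit-learn's `GaussianMixture` and its `predict_proba` method supply the membership probabilities. The function name `gmm_reduce` is hypothetical and is not part of hypertools.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_reduce(data, ndims, random_state=0):
    """Hypothetical helper: fit a GMM with `ndims` components and return the
    posterior membership probabilities as coordinates (n_samples x ndims)."""
    gmm = GaussianMixture(n_components=ndims, random_state=random_state)
    gmm.fit(data)
    return gmm.predict_proba(data)


# Toy data: three well-separated blobs in 10 dimensions
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 10))
                  for c in (-3, 0, 3)])

coords = gmm_reduce(data, 3)  # shape (120, 3)
```

Each row of `coords` sums to 1, so the points lie on a simplex and carry only n - 1 degrees of freedom. `BayesianGaussianMixture` exposes the same `predict_proba` method, so it could be swapped in the same way.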