dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

Dynamic topic model #287

Closed manuelbickel closed 5 years ago

manuelbickel commented 5 years ago

Hi Dmitriy,

I am interested in modelling topics over time and was wondering whether you are planning to include this kind of functionality in text2vec. gensim has a wrapper (documented on radimrehurek.com, source on GitHub) that calls pre-compiled binaries (available on GitHub) of the DTM algorithm proposed by David Blei (see the GitHub repository and the paper).

I might try to write a similar wrapper in R. This would certainly take some months because of other work I have to do, and reimplementing or integrating the algorithm via Rcpp unfortunately exceeds my skills. I just wanted to check whether this would be of general interest for text2vec. The algorithm is not as efficient as WarpLDA, and interim files would have to be written to disk to interact with the binary. Still, it might be interesting functionality: we could use WarpLDA and coherence metrics to efficiently find a good number of topics, and only in a second step run the DTM algorithm with that number. It is just an idea I wanted to put on the list of feature requests, which we will hopefully manage to shorten.
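To make the "interim files" point concrete, here is a minimal sketch of what a wrapper would have to write before calling the binary. It is in Python, like the gensim wrapper, and an R wrapper would need the same step. It assumes the input format as I understand it from the DTM README: a `<prefix>-mult.dat` file with one document per line (`unique_word_count index:count ...`) and a `<prefix>-seq.dat` file holding the number of time slices followed by the number of documents in each slice. The function name `write_dtm_input` and its arguments are hypothetical and not part of gensim, text2vec, or DTM; please check the format against the DTM README before relying on it.

```python
from collections import Counter
from pathlib import Path


def write_dtm_input(docs, time_slices, vocab, prefix):
    """Write <prefix>-mult.dat and <prefix>-seq.dat (assumed DTM input format).

    docs:        list of token lists, already ordered by time slice
    time_slices: number of documents in each consecutive time slice
    vocab:       dict mapping term -> integer id, shared across the corpus
    prefix:      output path prefix (hypothetical naming helper)
    """
    if sum(time_slices) != len(docs):
        raise ValueError("time_slices must sum to the number of documents")

    mult_lines = []
    for tokens in docs:
        # Count term ids per document; tokens missing from vocab are skipped.
        counts = Counter(vocab[t] for t in tokens if t in vocab)
        pairs = " ".join(f"{idx}:{n}" for idx, n in sorted(counts.items()))
        mult_lines.append(f"{len(counts)} {pairs}".rstrip())

    # First line: number of time slices; then one document count per slice.
    seq_lines = [str(len(time_slices))] + [str(n) for n in time_slices]

    mult_path = Path(f"{prefix}-mult.dat")
    seq_path = Path(f"{prefix}-seq.dat")
    mult_path.write_text("\n".join(mult_lines) + "\n")
    seq_path.write_text("\n".join(seq_lines) + "\n")
    return mult_path, seq_path
```

A wrapper would then pass the prefix to the binary and read its output files back in; that part is omitted here because it depends on the binary's command-line options.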

dselivanov commented 5 years ago

Hi Manuel. This is definitely interesting functionality. However, if it will call a binary, I think it would be better to wrap it in a separate package (and make it work seamlessly with text2vec functionality).

manuelbickel commented 5 years ago

Alright, thanks for the quick reply. I will work towards a separate package that resembles the style of text2vec and can be plugged into the text2vec workflow. I will leave a note here if I am successful; until then, I think the thread can be closed. Best regards, Manuel