bmabey / pyLDAvis

Python library for interactive topic model visualization. Port of the R LDAvis package.
BSD 3-Clause "New" or "Revised" License

Inappropriate usage of gensim models in gensim_models.py #204

Open jonaschn opened 3 years ago

jonaschn commented 3 years ago

The code that supports gensim models looks quite old. I am not sure whether gensim, at the time this code was written, already offered better ways to achieve what the code is trying to do.

Example: https://github.com/bmabey/pyLDAvis/blob/8e534a6e1852ef4674ef9a45223e8c6a931db2e6/pyLDAvis/gensim_models.py#L24-L29

The LDA model does not expose a parameter named beta because gensim calls it eta. Furthermore, gensim's Dictionary provides the term frequency across the collection as model.id2word.cfs and the document frequency (the number of documents in which a term occurs) as model.id2word.dfs.
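To make the suggestion concrete, here is a minimal sketch of reading these values directly from the model and its dictionary. It is not the pyLDAvis implementation: the helper names `term_statistics` and `topic_word_prior` and the `floor` value are hypothetical, and the stand-in objects only mimic the `eta`, `id2word`, `token2id`, `cfs` and `dfs` attributes so that the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def term_statistics(dictionary, floor=0.01):
    """Collect per-term counts from a gensim-style Dictionary.

    Assumes `dictionary` exposes `token2id`, `cfs` and `dfs` mappings,
    as gensim.corpora.Dictionary does. `floor` is a hypothetical
    replacement for zero counts, not a value taken from gensim.
    """
    rows = []
    # Walk the vocabulary in token-id order so rows line up with model columns.
    for token, token_id in sorted(dictionary.token2id.items(), key=lambda kv: kv[1]):
        term_freq = dictionary.cfs.get(token_id, 0) or floor  # collection frequency
        doc_freq = dictionary.dfs.get(token_id, 0)            # document frequency
        rows.append((token, term_freq, doc_freq))
    return rows


def topic_word_prior(model):
    """Return the topic-word prior, which gensim's LdaModel stores as `eta`."""
    return model.eta


# Stand-in objects (assumption: they mirror the gensim attributes named above).
# With gensim, pass a trained LdaModel and its `id2word` Dictionary instead.
dictionary = SimpleNamespace(
    token2id={"topic": 0, "model": 1, "unused": 2},
    cfs={0: 5, 1: 3},  # total occurrences across the collection
    dfs={0: 2, 1: 3},  # number of documents containing the term
)
model = SimpleNamespace(id2word=dictionary, eta=[0.1, 0.1, 0.1])

stats = term_statistics(model.id2word)
prior = topic_word_prior(model)
```

Reading `cfs` and `dfs` avoids recomputing term counts from the corpus, and reading `eta` avoids hard-coding a smoothing constant. Whether that is safe for every gensim model type pyLDAvis accepts would still need checking.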

msusol commented 1 year ago

We need to take a look at this after the 3.4.0 release.

https://radimrehurek.com/gensim/models/ldamodel.html
https://stackoverflow.com/questions/66111075/tuning-lda-topic-models
https://www.analyticsvidhya.com/blog/2021/06/part-3-topic-modeling-and-latent-dirichlet-allocation-lda-using-gensim-and-sklearn/