andrewtavis / kwx

BERT, LDA, and TFIDF based keyword extraction in Python
BSD 3-Clause "New" or "Revised" License

Update gensim LDA to 4.X #21

Closed andrewtavis closed 3 years ago

andrewtavis commented 3 years ago

This issue is for discussing and eventually implementing an update of the gensim LDA implementations in kwx. The package was originally written against 3.X versions of gensim, and the 4.X versions apparently bring substantial improvements in modeling options, efficiency, and n-gram creation (relevant for kwx.utils.clean). Changes would need to be made in kwx.utils, kwx.model, and kwx.topic_model.

Documenting what the switch requires and then working towards implementing it would be very much appreciated :)

Thanks for your interest in contributing!

andrewtavis commented 3 years ago

Guide for how to migrate: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
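
As a rough illustration of the kind of change the guide describes: gensim 4.X renamed the `common_terms` argument of `Phrases` to `connector_words`. The sketch below assumes that n-gram creation in kwx.utils.clean goes through gensim's `Phrases` (not confirmed in this thread); the helper name `phrases_kwargs` is hypothetical and not part of kwx or gensim. It only builds the keyword arguments, so it runs without gensim installed.

```python
def phrases_kwargs(gensim_version, min_count=5, threshold=10.0, connector_words=()):
    """Build keyword arguments for gensim's Phrases for a given gensim version.

    Hypothetical helper for illustration only. It relies on one change from
    the migration guide: `common_terms` (3.X) became `connector_words` (4.X).
    """
    # Only the major version decides which argument name applies.
    major = int(gensim_version.split(".")[0])

    kwargs = {"min_count": min_count, "threshold": threshold}
    words = frozenset(connector_words)

    if major >= 4:
        kwargs["connector_words"] = words
    else:
        kwargs["common_terms"] = words
    return kwargs


# The same call site then works for either major version.
old_style = phrases_kwargs("3.8.3", connector_words=["of", "the"])
new_style = phrases_kwargs("4.1.2", connector_words=["of", "the"])
```

With gensim installed, the result could be passed along as `Phrases(sentences, **phrases_kwargs(gensim.__version__))`. Whether supporting both major versions is worth it, versus simply requiring 4.X, is a design choice for the maintainers.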