This notebook has code that parallels the article Using Metrics to Determine The Right LDA Topic Model Size. Users can run the notebook to re-create, step by step, the procedures described in the article.
To run the code presented here, follow this outline (details in the cells below):
Three CSV files are needed to run this notebook:
In the GitHub repository:
On Kaggle:
ModelRunMetrics contains the metrics from 90 runs of the LDA model and can be used to re-create and explore the data from the article.
NewsDF is a copy of the 30,000-article database that includes both the original text and pre-processed versions of the articles. You will need it both to run your own models and to explore the text that the models are built on.
It is recommended that you place all of these files in a location accessible to the Colab notebook and set the DATA_DIR variable to that location.
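
Before running the rest of the notebook, it can help to confirm that the files are actually visible from DATA_DIR. The snippet below is a minimal sketch of such a check, not the notebook's own code. The path assigned to DATA_DIR, the EXPECTED_FILES list, the `.csv` file names in it, and the helper `find_missing_files` are all assumptions made for illustration; replace them with the actual location and file names from the repository or Kaggle.

```python
from pathlib import Path

# Assumption: placeholder path; point this at the folder where the files were placed.
DATA_DIR = Path("data")

# Assumption: hypothetical file names based on the dataset names in the text.
# Substitute the real file names, including any file not named above.
EXPECTED_FILES = ["ModelRunMetrics.csv", "NewsDF.csv"]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


missing = find_missing_files(DATA_DIR)
if missing:
    print(f"Missing from {DATA_DIR}: {missing}")
else:
    print("All expected files found.")
```

If the check reports missing files, the usual causes are a DATA_DIR that points at the wrong folder or storage that has not yet been made available to the Colab session.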