hdanderson opened this issue 5 months ago (status: Open)
Something like this should work for chunk 11 of the tutorial:
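import numpy as np
import tomotopy as tp
from tomotopy.coherence import Coherence

# corpus, ks (list of topic counts) and reps (repetitions per k) are assumed
# to be defined earlier in the tutorial
coherence_cv = np.zeros((len(ks), reps))
for k in range(len(ks)):
    for r in range(reps):
        # perform topic modeling for a variety of topic numbers
        lda = tp.LDAModel(k=ks[k], rm_top=0)
        lda.burn_in = 50
        lda.add_corpus(corpus)
        # workers = number of threads; adjust to your machine
        lda.train(iter=150, workers=4)
        # store only the score; lda is rebound on the next iteration, so old models can be freed
        coherence_cv[k, r] = Coherence(lda, coherence='c_v').get_score()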
The tutorial's current code for computing coherence across different numbers of topics is very memory-intensive and can crash Python. This is because it keeps every trained LDA model in the `lda_models` array, even though `lda_models` is never used downstream. Removing this variable from the code should fix the issue.
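To illustrate why dropping `lda_models` helps, here is a minimal sketch that runs without tomotopy. `HeavyModel` and `run` are hypothetical names: `HeavyModel` stands in for a trained LDA model holding roughly 1 MB of state, and its `score()` is a placeholder for a coherence score. The standard-library `tracemalloc` module measures peak memory for the same loop with and without keeping every model.

```python
import tracemalloc

class HeavyModel:
    """Hypothetical stand-in for a trained LDA model (holds ~1 MB of state)."""
    def __init__(self, k):
        self.k = k
        self.state = bytearray(1_000_000)

    def score(self):
        # Placeholder for a real coherence score
        return float(self.k)

def run(ks, reps, keep_models):
    """Fill a len(ks) x reps score grid; optionally keep every model like lda_models."""
    scores = [[0.0] * reps for _ in ks]
    models = []
    tracemalloc.start()
    for i, k in enumerate(ks):
        for r in range(reps):
            model = HeavyModel(k)
            if keep_models:
                models.append(model)  # mimics accumulating models in lda_models
            scores[i][r] = model.score()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return scores, peak
```

The scores are identical either way. When the models are kept, peak memory grows with `len(ks) * reps`. When only the scores are kept, each model can be freed as soon as the loop variable is rebound, so the peak stays around one or two models' worth.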