Tomotopy coherence code is unnecessarily memory intensive

chengfgao / TopicVelo

TopicVelo: Dissection and Integration of Bursty Transcriptional Dynamics for Complex Systems

BSD 3-Clause "New" or "Revised" License

Tomotopy coherence code is unnecessarily memory intensive #7

Open hdanderson opened 5 months ago

hdanderson commented 5 months ago

The tutorial code that computes coherence across different numbers of topics is very memory intensive and can crash Python. The cause is that it appends every trained LDA model to the lda_models array, so all of the models stay in memory at once, even though lda_models is never used downstream. Removing this variable from the code should fix the issue.

hdanderson commented 5 months ago

Something like this should work for chunk 11:

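import tomotopy as tp
from tomotopy.coherence import Coherence

# ks, reps, corpus, and coherence_cv are assumed to be defined in earlier tutorial chunks
for k in range(len(ks)):
    for r in range(reps):
        # fit an LDA model for this number of topics; no list of models is kept,
        # so each model can be freed once the name lda is rebound
        lda = tp.LDAModel(k=ks[k], rm_top=0)
        lda.burn_in = 50
        lda.add_corpus(corpus)
        # adjust the parallel setting to suit your machine
        lda.train(iter=150, parallel=4)
        # store only the scalar c_v coherence score
        coherence_cv[k, r] = Coherence(lda, coherence='c_v').get_score()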
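
For anyone who wants to see the effect without running the full tutorial, here is a minimal sketch of the same pattern using only the standard library. It does not use tomotopy: FakeModel, its score method, and the peak_bytes helper are hypothetical stand-ins invented for illustration, and the buffer size is arbitrary. It compares peak traced memory when every model is appended to a list against keeping only the scalar scores.

```python
import tracemalloc


class FakeModel:
    """Hypothetical stand-in for a trained topic model with a large state."""

    def __init__(self, k, size=200_000):
        self.k = k
        self.state = bytearray(size)  # pretend this is the trained model

    def score(self):
        # Placeholder value, not a real coherence metric.
        return 1.0 / self.k


def peak_bytes(keep_models, ks=(5, 10, 15, 20), reps=3):
    """Run one sweep and return (scores, peak traced memory in bytes)."""
    tracemalloc.start()
    kept = []  # mimics the lda_models list from the tutorial
    scores = {}
    for k in ks:
        for r in range(reps):
            model = FakeModel(k)
            if keep_models:
                kept.append(model)  # every model stays alive
            scores[(k, r)] = model.score()  # only a float is stored
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return scores, peak


scores_keep, peak_keep = peak_bytes(keep_models=True)
scores_drop, peak_drop = peak_bytes(keep_models=False)
print("peak with model list:   ", peak_keep)
print("peak without model list:", peak_drop)
```

The scores are identical in both runs. Only the peak memory differs, because without the list at most two models exist at any moment (the new one and the one about to be released).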