bstewart / stm

An R Package for the Structural Topic Model

How to speed up STM? #237

Open santoshbs opened 4 years ago

santoshbs commented 4 years ago

I have close to a million documents, each about 200 words long. Is there a way to speed up topic modeling? The run has taken over 30 hours and is currently at iteration 40.

santoroma commented 4 years ago

Hello @santoshbs, can you share your model specification? In my experience, the speed of STM depends on several factors:

1) Text cleaning: removing words, stemming, and substituting synonyms.
2) Model complexity: How many covariates are there? Are they uncorrelated or only weakly correlated? Are you using factor covariates, and if so, with how many levels? Sometimes I need to preprocess the metadata a bit to get better performance and more interpretable results. I also run a quick STM fit in the early stages to see whether any words should be removed or substituted.
3) Hardware: Are you running on a laptop, a workstation, or a compute server?
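To make points 1 and 2 concrete, here is a minimal, language-agnostic sketch in plain Python (not stm code) of two preprocessing steps that shrink the problem before fitting: dropping terms that occur in very few documents, and collapsing rare factor levels into a single catch-all level. The function names `prune_vocab` and `collapse_rare_levels`, the thresholds, and the toy data are all hypothetical and only for illustration. In stm itself, `prepDocuments` exposes a `lower.thresh` argument for a similar kind of vocabulary pruning.

```python
from collections import Counter


def prune_vocab(docs, min_doc_freq=2):
    """Drop tokens that appear in fewer than min_doc_freq documents.

    docs is a list of token lists. Returns the pruned documents and the
    sorted vocabulary that survived.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each token once per document (document frequency).
        doc_freq.update(set(tokens))
    keep = {tok for tok, n in doc_freq.items() if n >= min_doc_freq}
    pruned = [[tok for tok in tokens if tok in keep] for tokens in docs]
    return pruned, sorted(keep)


def collapse_rare_levels(values, min_count=2, other="other"):
    """Replace factor levels seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Toy data (hypothetical): three tiny tokenized documents.
docs = [
    ["topic", "model", "speed"],
    ["topic", "model", "covariate"],
    ["topic", "laptop"],
]
pruned, vocab = prune_vocab(docs, min_doc_freq=2)

# Toy covariate (hypothetical): a factor with two singleton levels.
levels = collapse_rare_levels(["us", "us", "uk", "fr"], min_count=2)
```

A smaller vocabulary and fewer factor levels mean fewer parameters to estimate, which is one reason this kind of cleanup can shorten each iteration. The thresholds that make sense depend on the corpus, so treat the values above as placeholders.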

I hope this helps.