Supermaxman closed 1 year ago
Thank you so much for this! I'll go over the changes, hopefully tomorrow, and merge them!
Renamed log sigma to log var. I know log sigma is more coherent than var, but the variable is referred to as log_var
in the rest of the code; I'd keep it like this for consistency.
Merged!
I regularly ran out of memory on large datasets during the CTM fit call. Upon further inspection, I found that the automatic generation of training_doc_topic_distributions with get_doc_topic_distribution after the fit call left many opportunities to reduce memory usage. I made the following changes:
I do not believe any of the changes I made are breaking changes; they should simply make topic discovery run faster and with less memory.
These efficiency gains enabled me to run the CTM on a massive collection of tweets, so I thought I would create a pull request and offer these improvements back to the original repo, as it still seems active.
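The specific changes are not listed in this thread, so the following is only a generic sketch of one common way to cut memory when averaging sampled document-topic distributions: keep a running sum instead of holding every sample at once. It is not taken from the PR or from the library's code. The names `make_sampler`, `stacked_mean`, and `running_mean` are hypothetical, and the sampler is a stand-in for whatever produces one sampled docs-by-topics matrix.

```python
import random
from typing import Callable, List

Matrix = List[List[float]]  # rows are documents, columns are topics


def make_sampler(seed: int, n_docs: int, n_topics: int) -> Callable[[], Matrix]:
    """Hypothetical stand-in for one sampling pass over the corpus."""
    rng = random.Random(seed)

    def sample() -> Matrix:
        matrix = []
        for _ in range(n_docs):
            raw = [rng.random() + 1e-9 for _ in range(n_topics)]
            total = sum(raw)
            matrix.append([v / total for v in raw])  # each row sums to 1
        return matrix

    return sample


def stacked_mean(sample_fn: Callable[[], Matrix], n_samples: int) -> Matrix:
    """Baseline: keeps all n_samples matrices in memory, then averages."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    samples = [sample_fn() for _ in range(n_samples)]
    n_docs, n_topics = len(samples[0]), len(samples[0][0])
    return [
        [sum(s[d][t] for s in samples) / n_samples for t in range(n_topics)]
        for d in range(n_docs)
    ]


def running_mean(sample_fn: Callable[[], Matrix], n_samples: int) -> Matrix:
    """Keeps one accumulator plus the current sample, regardless of n_samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    total = [row[:] for row in sample_fn()]  # copy the first sample as the accumulator
    for _ in range(n_samples - 1):
        sample = sample_fn()
        for acc_row, row in zip(total, sample):
            for t, value in enumerate(row):
                acc_row[t] += value
        # `sample` is rebound on the next iteration, so the old one can be freed
    return [[v / n_samples for v in row] for row in total]
```

Both functions return the same averages; the difference is that peak memory for the running version does not grow with the number of samples.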