Closed devanshrj closed 1 year ago
Hi, the CTM version in OCTIS is not the latest, and at the moment we have no plans to update it. As you can see in the original repo, there have been some improvements to support larger datasets: https://github.com/MilaNLProc/contextualized-topic-models/pull/124 I'd suggest you use that repo directly.
Hope this helps,
Silvia
Description
I want to train CTM on a dataset of approximately 4 million tweets (with a vocabulary size of approximately 20,000). The `train_model()` function fails with the following error:

```
numpy.core._exceptions.MemoryError: Unable to allocate 735. GiB for an array with shape (4344759, 22716) and data type int64
```
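The reported size is consistent with NumPy trying to allocate a *dense* int64 document-term matrix of that shape. A quick back-of-the-envelope check, using only the shape and dtype from the traceback:

```python
# Shape reported in the MemoryError: (documents, vocabulary terms)
n_docs, n_vocab = 4344759, 22716
bytes_per_cell = 8  # int64

dense_bytes = n_docs * n_vocab * bytes_per_cell
dense_gib = dense_bytes / 2**30
print(f"{dense_gib:.0f} GiB")  # prints "735 GiB", matching the error message
```

So the failure happens before any model training: materializing the full bag-of-words matrix densely is what exhausts memory.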
Is there a way to optimize the training process or incrementally train the model (similar to online topic modeling in BERTopic)?
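Since tweets are short, the bag-of-words matrix is overwhelmingly zeros, which is why a sparse representation (the kind of change the linked PR moves toward) sidesteps the dense allocation entirely. A minimal sketch with `scipy.sparse`, using made-up toy data rather than the actual tweet corpus:

```python
import numpy as np
from scipy import sparse

# Toy corpus: 1000 "documents", each with ~20 nonzero term counts out of
# a 22716-word vocabulary (real tweets are similarly sparse).
rng = np.random.default_rng(0)
n_docs, n_vocab, nnz_per_doc = 1000, 22716, 20
rows = np.repeat(np.arange(n_docs), nnz_per_doc)
cols = rng.integers(0, n_vocab, size=n_docs * nnz_per_doc)
data = np.ones(n_docs * nnz_per_doc, dtype=np.int64)

# CSR stores only the nonzero entries; duplicate (row, col) pairs are summed.
bow = sparse.csr_matrix((data, (rows, cols)), shape=(n_docs, n_vocab))

dense_bytes = n_docs * n_vocab * 8
sparse_bytes = bow.data.nbytes + bow.indices.nbytes + bow.indptr.nbytes
print(f"dense/sparse memory ratio: {dense_bytes / sparse_bytes:.0f}x")
```

For this toy sparsity level the dense matrix uses several hundred times more memory than the CSR form, which is why the original repo's sparse-input support matters at the 4-million-document scale.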
What I Did
Command I ran:
Traceback: