A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License
ctm.save() crashes when training_dataset is large #110
I noticed that the ctm.save() method tries to save the training dataset (800k items in my case). This, however, causes a crash on my machine.
I was able to resolve the problem by deleting the reference to train_data in ctm.save and then modifying the ctm.load method to take a dataset as an argument.
In any case, storing the training dataset (other than its id2token mapping) may not be desirable when one wants to load a model to predict topics for unseen documents or to continue training on a different dataset.
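The workaround above can be sketched as follows. This is a minimal illustration, not the library's code: `TinyTopicModel` is a hypothetical stand-in for the real CTM class. The attribute names `train_data` and `id2token` come from the issue, while the pickle-based file (`model.pkl`) and the `dataset` keyword on `load` are assumptions made for the example.

```python
import os
import pickle
import tempfile


class TinyTopicModel:
    """Hypothetical stand-in for a CTM-like model (not the library's class)."""

    def __init__(self, n_components, weights=None):
        self.n_components = n_components
        self.weights = weights if weights is not None else [0.0] * n_components
        self.train_data = None  # potentially huge training dataset, set by fit()
        self.id2token = {}      # small vocabulary mapping worth keeping

    def fit(self, train_data, id2token):
        self.train_data = train_data
        self.id2token = id2token
        # ... real training would update self.weights here ...

    def save(self, models_dir):
        """Persist everything except train_data, so file size is independent of the dataset."""
        os.makedirs(models_dir, exist_ok=True)
        state = {k: v for k, v in self.__dict__.items() if k != "train_data"}
        with open(os.path.join(models_dir, "model.pkl"), "wb") as fh:
            pickle.dump(state, fh)

    @classmethod
    def load(cls, models_dir, dataset=None):
        """Restore a saved model and optionally attach a (possibly different) dataset."""
        with open(os.path.join(models_dir, "model.pkl"), "rb") as fh:
            state = pickle.load(fh)
        model = cls(state["n_components"])
        model.__dict__.update(state)  # restore weights, id2token, etc.
        model.train_data = dataset    # None for pure inference on unseen documents
        return model


if __name__ == "__main__":
    # Usage: save a model trained on a large dataset, then reload it without that dataset.
    model = TinyTopicModel(5)
    model.fit([[i] * 10 for i in range(100_000)], {0: "topic", 1: "model"})
    with tempfile.TemporaryDirectory() as d:
        model.save(d)
        restored = TinyTopicModel.load(d)
        print(restored.id2token, restored.train_data)
```

Excluding the dataset at save time and injecting it at load time keeps saved models small and lets the same checkpoint be reused for inference or for further training on new data.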