ctm.save crashed when training_dataset is somewhat large

I noticed that the ctm.save() method tries to save the training dataset (800k items in my case). This, however, causes a crash on my machine.

I was able to resolve the problem by deleting the reference to train_data in ctm.save and then modifying the ctm.load method to accept a dataset as an argument.
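To illustrate the idea, here is a minimal sketch of the workaround using a toy class. `ToyModel`, its `weights` attribute, and the use of `pickle` are all assumptions made for illustration; this is not the library's actual `CTM` code, and the real `save`/`load` signatures may differ. It only shows the pattern: drop `train_data` from the saved state, keep `id2token`, and let `load` optionally receive a dataset.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for a topic model (not the library's CTM class)."""

    def __init__(self, train_data=None, id2token=None):
        self.train_data = train_data      # potentially very large
        self.id2token = id2token or {}    # small vocabulary mapping, worth keeping
        self.weights = [0.1, 0.2, 0.3]    # placeholder for learned parameters

    def save(self, path):
        # Copy the attribute dict and drop the training data before writing,
        # so the size of the saved file does not depend on the dataset.
        state = dict(self.__dict__)
        state.pop("train_data", None)
        with open(path, "wb") as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, path, train_data=None):
        # Restore the saved attributes, then attach the dataset supplied by
        # the caller (or None if the model is only used for inference).
        with open(path, "rb") as f:
            state = pickle.load(f)
        model = cls.__new__(cls)
        model.__dict__.update(state)
        model.train_data = train_data
        return model
```

With this pattern, `ToyModel.load(path)` gives a model without any dataset attached, and `ToyModel.load(path, train_data=other_dataset)` attaches a different one for continued training.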

In any case, storing the training dataset (apart from id2token) seems undesirable in use cases where one wants to load a model to predict topics for unseen documents or to continue training on a different dataset.

MilaNLProc / contextualized-topic-models

ctm.save crashed when training_dataset is somewhat large #110