MilaNLProc / contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License
1.21k stars, 147 forks

How do I use a Kitty trained model with pyLDAvis? #116

Closed: marcmaxson closed this issue 2 years ago

marcmaxson commented 2 years ago

Description

I'd like to look at the results of my model, trained using Kitty, in pyLDAvis. It appears that one of the required inputs, training_dataset, is not retained in the model, so I can't pass it into this function:

lda_vis_data = kt.ctm.get_ldavis_data_format(kt.qt.vocab, training_dataset, n_samples=1)

I looked at your source: training_dataset is created and used, but not stored. Is there a different way to generate the interactive chart? And what is the kt.ctm.get_ldavis_data_format method for if it cannot be supplied with the data?
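For context on why the dataset matters here: pyLDAvis.prepare takes topic_term_dists, doc_topic_dists, doc_lengths, vocab, and term_frequency, and the document lengths and term frequencies are corpus statistics rather than model weights. The sketch below illustrates that dependency in plain Python. It assumes the statistics come from bag-of-words counts; build_ldavis_inputs and the toy numbers are hypothetical and are not the library's implementation.

```python
def build_ldavis_inputs(vocab, bow_rows, topic_term_dists, doc_topic_dists):
    """Assemble the keyword arguments that pyLDAvis.prepare expects.

    bow_rows holds one list of term counts per document, aligned with vocab.
    This is an illustrative helper, not part of contextualized-topic-models.
    """
    if any(len(row) != len(vocab) for row in bow_rows):
        raise ValueError("every bag-of-words row needs one count per vocab term")
    # Document length: total token count of each document (row sums).
    doc_lengths = [sum(row) for row in bow_rows]
    # Term frequency: total count of each term across the corpus (column sums).
    term_frequency = [sum(col) for col in zip(*bow_rows)]
    return {
        "topic_term_dists": topic_term_dists,
        "doc_topic_dists": doc_topic_dists,
        "doc_lengths": doc_lengths,
        "vocab": vocab,
        "term_frequency": term_frequency,
    }


# Toy corpus: 3 documents, 4 vocabulary terms, 2 topics (made-up numbers).
vocab = ["cat", "dog", "stock", "bond"]
bow_rows = [[2, 1, 0, 0], [0, 3, 0, 1], [0, 0, 4, 2]]
topic_term_dists = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]]
doc_topic_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

ldavis_inputs = build_ldavis_inputs(vocab, bow_rows, topic_term_dists, doc_topic_dists)
# With pyLDAvis installed, this dict could be passed on as pyLDAvis.prepare(**ldavis_inputs).
```

Without the training documents there is nothing to compute the row and column sums from, which is why a wrapper that discards the dataset cannot feed this step on its own.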

I realize I can use CTM to get this, but wanted to make sure I didn't miss some way to use Kitty as well.

vinid commented 2 years ago

Hello @marcmaxson!

Yes, you are right: there is no easy way to do this inside Kitty.

One thing I can do is store the dataset in the Kitty object, so that it can be used to create the pyLDAvis plot.

Let me know if you want me to add this (or if you have done it, PRs are super welcome!)

marcmaxson commented 2 years ago

Yes, please add it. It doesn't need to save the dataset by default, as that would make the model bigger and use more memory. But if this can be used with pyLDAvis, I think more people would use it.

vinid commented 2 years ago

OK, it's probably not the best fix, but Kitty now has an option to return the dataset used for training.
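The opt-in behaviour requested in this thread could look roughly like the sketch below. TopicModelWrapper, keep_dataset, and training_dataset are hypothetical names chosen for illustration; this is not Kitty's actual API, and the option that was eventually added may be shaped differently.

```python
class TopicModelWrapper:
    """Hypothetical stand-in for a Kitty-like wrapper; not the library's API."""

    def __init__(self, keep_dataset=False):
        # Off by default so the wrapper does not hold on to extra memory.
        self.keep_dataset = keep_dataset
        self.training_dataset = None

    def train(self, documents):
        # Placeholder preprocessing: a real wrapper would build the
        # bag-of-words matrix and contextual embeddings here.
        dataset = [doc.lower().split() for doc in documents]
        if self.keep_dataset:
            # Retain the dataset only when the caller asked for it.
            self.training_dataset = dataset
        # ... model fitting would happen here ...
        return self


docs = ["Cats chase mice", "Bonds and stocks"]
slim = TopicModelWrapper().train(docs)                    # dataset discarded
full = TopicModelWrapper(keep_dataset=True).train(docs)   # dataset retained
```

Keeping the flag off by default preserves the smaller memory footprint, while callers who want the visualization can opt in and later hand the retained dataset to get_ldavis_data_format.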