MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

Adding top_words parameter to CTM model #84

Closed arijitgupta42 closed 1 year ago

arijitgupta42 commented 1 year ago

The train_model function in CTM accepts a top_words parameter, but the value is never passed to the ctm class, which has no corresponding argument. As a result, CTM returns topics of length 10 regardless of the value of top_words given to train_model. For example, the code below returns topics of length 10 even though top_words is set to 5.

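from octis.models.CTM import CTM
from octis.dataset.dataset import Dataset
from octis.evaluation_metrics.coherence_metrics import Coherence

dataset = Dataset()
dataset.fetch_dataset("M10")

model = CTM(num_topics=10)
output = model.train_model(dataset, top_words=5)
# Each topic in output["topics"] still has 10 words instead of 5
npmi = Coherence(texts=dataset.get_corpus(), topk=5)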

This PR adds the top_words parameter to the ctm class and modifies the get_topics function to use the value of top_words passed by the user (the default is still 10).
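To illustrate the idea, here is a minimal, self-contained sketch of the pattern the PR describes: the outer train_model forwards top_words to an inner class, and get_topics slices by that value instead of a fixed 10. This is not the OCTIS code. SketchTopicModel, the standalone train_model function, and the toy vocabulary and weights are all hypothetical names made up for the example.

```python
from typing import Dict, List, Sequence


class SketchTopicModel:
    """Hypothetical stand-in for the inner ctm class (not OCTIS code)."""

    def __init__(self, vocab: Sequence[str], top_words: int = 10):
        self.vocab = list(vocab)
        # Store the requested topic length instead of hard-coding 10.
        self.top_words = top_words

    def get_topics(
        self, topic_word_weights: Sequence[Sequence[float]]
    ) -> List[List[str]]:
        topics = []
        for weights in topic_word_weights:
            # Rank vocabulary indices by descending weight.
            ranked = sorted(
                range(len(weights)), key=lambda i: weights[i], reverse=True
            )
            # Keep only the first top_words entries.
            topics.append([self.vocab[i] for i in ranked[: self.top_words]])
        return topics


def train_model(
    vocab: Sequence[str],
    topic_word_weights: Sequence[Sequence[float]],
    top_words: int = 10,
) -> Dict[str, List[List[str]]]:
    """Hypothetical wrapper: forwards top_words rather than dropping it."""
    inner = SketchTopicModel(vocab, top_words=top_words)
    return {"topics": inner.get_topics(topic_word_weights)}


# Toy data, invented for the example.
vocab = ["graph", "neural", "topic", "corpus", "model", "word"]
weights = [
    [0.1, 0.9, 0.3, 0.2, 0.8, 0.05],
    [0.7, 0.05, 0.6, 0.5, 0.1, 0.2],
]

output = train_model(vocab, weights, top_words=3)
print(output["topics"])
```

With top_words=3 each topic in the sketch has three words. If top_words is omitted, the slice falls back to the default of 10, which here simply returns all six vocabulary words.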

silviatti commented 1 year ago

Hello, thanks for your PR. I have made a few changes:

I still have to fix a few things on my side (unrelated to your PR), but I'll merge the PR after that.

Thanks again for contributing :)

Silvia