SafetyMary closed this issue 3 months ago.
Considering the way you used `.update_topics`, this is expected behavior. When you run `.update_topics` without passing your representation models, you overwrite the ones you set when creating the model: `representation_model` keeps its default value (`None`), so the default c-TF-IDF representation is used instead.
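To make the mechanism concrete, here is a toy sketch. It is not BERTopic's implementation: `ToyTopicModel`, `ctfidf_words` and `t5` are hypothetical stand-ins that only mimic the behavior described above and reported in the issue (the "Main" representation falls back to c-TF-IDF when `representation_model` is omitted, while previously computed aspects are left in place).

```python
# Toy model (hypothetical, NOT BERTopic's source) of why update_topics()
# resets the "Main" representation when representation_model is omitted.

def ctfidf_words(topic):
    # Stand-in for the default c-TF-IDF keywords of a topic.
    return f"ctfidf:{topic}"


def t5(topic):
    # Stand-in for a text-generation representation (e.g. Flan-T5).
    return f"t5:{topic}"


class ToyTopicModel:
    def __init__(self, representation_model=None):
        self.representation_model = representation_model
        self.main = {}
        self.aspects = {}

    def _extract(self, topics):
        model = self.representation_model
        if isinstance(model, dict) and "Main" in model:
            self.main = {t: model["Main"](t) for t in topics}
        else:
            # No representation model: fall back to c-TF-IDF keywords.
            self.main = {t: ctfidf_words(t) for t in topics}
        # Aspects are only recomputed when a dict is supplied;
        # otherwise the previously computed ones are left untouched.
        if isinstance(model, dict):
            for name, fn in model.items():
                if name != "Main":
                    self.aspects[name] = {t: fn(t) for t in topics}

    def fit(self, topics):
        self._extract(topics)
        return self

    def update_topics(self, topics, representation_model=None):
        # The argument overwrites the stored model, even when it is None.
        self.representation_model = representation_model
        self._extract(topics)


reps = {"Main": t5, "Aspect1": t5, "Aspect2": t5}
topics = [0, 1]

m = ToyTopicModel(representation_model=reps).fit(topics)
m.update_topics(topics)  # representation_model omitted
print(m.main[0], m.aspects["Aspect1"][0])  # ctfidf:0 t5:0

m.update_topics(topics, representation_model=reps)
print(m.main[0], m.aspects["Aspect1"][0])  # t5:0 t5:0
```

The first `print` reproduces the symptom from the issue (aspects generated by T5, "Main" not); passing the models again restores consistency.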
You should do the following instead:
```python
from bertopic import BERTopic
from bertopic.representation import TextGeneration
from sklearn.feature_extraction.text import CountVectorizer
from transformers import pipeline

# df is your pandas DataFrame with a 'text' column
docs = df['text'].to_list()

# Representation models: the same Flan-T5 generator for "Main" and both aspects
generator = pipeline('text2text-generation', model='../../pretrain_models/flan-t5-base')  # I used offline model here
representation_model = {
    "Main": TextGeneration(generator),
    "Aspect1": TextGeneration(generator),
    "Aspect2": TextGeneration(generator),
}

# Run model
topic_model = BERTopic(nr_topics=10, embedding_model='../../pretrain_models/all-mpnet-base-v2', representation_model=representation_model)  # I used offline model here
topics, probs = topic_model.fit_transform(docs)

# Pass `representation_model` again; otherwise update_topics() falls back to c-TF-IDF for "Main"
vectorizer_model = CountVectorizer(ngram_range=(1, 1), stop_words="english")
topic_model.update_topics(docs, vectorizer_model=vectorizer_model, representation_model=representation_model)

# Show results
topic_model.get_document_info(docs)
```
Sorry for the delayed reply. I tried your solution and it worked. Thanks a lot.
Have you searched existing issues?
Describe the bug
Adding a representation model does not affect the output of the 'Representation' column in `topic_model.get_document_info()`. To double-check, I purposely created multiple representations using the same model.
Reproduce the issue:
Expected results: the elements in the 'Representation', 'Aspect1' and 'Aspect2' columns should be identical.
Actual results: the elements in the 'Aspect1' and 'Aspect2' columns are identical, but the 'Representation' column differs and does not seem to have passed through the T5 model.
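The expected/actual comparison can be checked programmatically. A minimal sketch, assuming the DataFrame returned by `topic_model.get_document_info(docs)` is first converted to a list of row dictionaries with pandas' `DataFrame.to_dict("records")`; `columns_agree` is a hypothetical helper, and the sample rows below are made up for illustration.

```python
def columns_agree(rows, columns=("Representation", "Aspect1", "Aspect2")):
    """Return True if, in every row, all `columns` hold the same value."""
    first, *rest = columns
    # Plain == comparison so list-valued cells (keyword lists) work too.
    return all(row[c] == row[first] for row in rows for c in rest)


# Made-up rows standing in for get_document_info(...).to_dict("records")
expected = [{"Representation": ["label"], "Aspect1": ["label"], "Aspect2": ["label"]}]
actual = [{"Representation": ["word", "other"], "Aspect1": ["label"], "Aspect2": ["label"]}]
print(columns_agree(expected))  # True
print(columns_agree(actual))    # False
```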
Reproduction
No response
BERTopic Version
0.16.2