MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

keyBERTInspired #998

Closed kimkyulim closed 1 year ago

kimkyulim commented 1 year ago

Hi @MaartenGr, I'm so glad to find that I can train with KeyBERT.

I installed keybert to use KeyBERTInspired, but I got this error: `ModuleNotFoundError: No module named 'bertopic.representation'`.

My current version of BERTopic is 0.13.0. The keybert version is 0.7.0.

Do I need to change the BERTopic version to use KeyBERTInspired? Can you tell me the solution?

MaartenGr commented 1 year ago

The representation models are not yet released so they will not work in BERTopic v0.13. You can install the upcoming version of BERTopic from the PR and follow along with the guides there. I expect to release v0.14 quite soon.
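A quick way to confirm whether an environment is new enough for `bertopic.representation` is to compare the installed version against v0.14. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `supports_representation_models` are hypothetical and not part of BERTopic.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "bertopic") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def supports_representation_models(bertopic_version: str) -> bool:
    """Hypothetical helper: True if the version is 0.14 or newer."""
    match = re.match(r"(\d+)\.(\d+)", bertopic_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {bertopic_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 14)


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("BERTopic is not installed.")
    elif supports_representation_models(current):
        print(f"BERTopic {current}: bertopic.representation should be importable.")
    else:
        print(f"BERTopic {current}: upgrade to v0.14 or later to use KeyBERTInspired.")
```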

kimkyulim commented 1 year ago

Thank you for your reply. I will wait for v0.14 to come out soon!

MaartenGr commented 1 year ago

v0.14 has been released and you can find the documentation for running KeyBERTInspired here.

kimkyulim commented 1 year ago

Hi @MaartenGr, congratulations on publishing KeyBERTInspired. I'm trying to use KeyBERTInspired with a model I saved earlier, and it raises an error. Does it only work when training a new model?

Also, I set top_n_words=30 in the BERTopic model, but only the default 10 words keep coming out.

Thank you for your answer.

MaartenGr commented 1 year ago

@kimkyulim Could you share the entire code that produces this error, including the full error message? If you want to use KeyBERTInspired, the model that you saved should have been created with v0.14, since representation models were not supported in earlier versions. KeyBERTInspired also has its own top_n_words parameter that you will have to set: each representation model sets the number of words it outputs individually, and that output may in turn be used by another representation model.

kimkyulim commented 1 year ago

Hi @MaartenGr, I saved the model in v0.13 and ran it in v0.14. Does that mean that I have to create the model in v0.14, save it, and then use it in v0.14?

Here is my code:

# Create your representation model
representation_model = KeyBERTInspired()

topic_model = BERTopic(top_n_words=30, representation_model=representation_model)
topic_model = topic_model.load("save_model")
topics, probs = topic_model.fit_transform(docs)

Error: AttributeError: 'BERTopic' object has no attribute 'representation_model'

Thank you for your help! Have a nice day!

hbeelee commented 1 year ago

Hi @MaartenGr,

I'm having a similar issue: after loading a model saved in version 0.13.0 in a 0.14.0 environment and merging the topics, I get an AttributeError. Maybe I should downgrade to 0.13.0 to work with this loaded model?

!pip install bertopic==0.14.0
from bertopic import BERTopic

model = BERTopic.load("/content/drive/MyDrive/Colab Notebooks/n_27")
topics_to_merge = [[2, 27, 15],  [4, 5]]
model.merge_topics(keywords, topics_to_merge)

AttributeError: 'BERTopic' object has no attribute 'representation_model'

MaartenGr commented 1 year ago

@kimkyulim @hbeelee That is correct. The representation_model parameter did not exist before v0.14, so a model saved with an earlier version will give you errors when you load it in v0.14. As with many models, it is important to keep track of versions when you save a model, since loading and using a model across versions (including those of its dependencies) will typically not work.
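One way to keep track of versions is to store a small manifest next to the saved model and compare it before loading. The sketch below is only an illustration under assumptions: the sidecar file name `<model_path>.versions.json`, the list of tracked packages, and the helper names are all hypothetical and not part of BERTopic.

```python
import json
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Packages whose versions are worth recording; this list is an assumption.
TRACKED = ("bertopic", "umap-learn", "hdbscan", "sentence-transformers")


def current_versions(packages=TRACKED):
    """Map each package name to its installed version (None if missing)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def write_manifest(model_path, packages=TRACKED):
    """Write '<model_path>.versions.json' next to the saved model."""
    manifest = Path(f"{model_path}.versions.json")
    manifest.write_text(json.dumps(current_versions(packages), indent=2))
    return manifest


def version_mismatches(model_path, packages=TRACKED):
    """Return {package: (saved, installed)} for every version that differs."""
    saved = json.loads(Path(f"{model_path}.versions.json").read_text())
    now = current_versions(packages)
    return {
        name: (saved.get(name), now[name])
        for name in packages
        if saved.get(name) != now[name]
    }
```

In use, `write_manifest("save_model")` would be called right after saving the model, and `version_mismatches("save_model")` checked before loading it; an empty dictionary means the tracked versions match.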

kimkyulim commented 1 year ago

Thank you for your reply, @MaartenGr.

I was trying to create a model and save it again in v0.14, but an error occurred.

My error: ImportError: cannot import name 'UMAP' from 'umap' (C:\ProgramData\Anaconda3\lib\site-packages\umap\__init__.py)

My code:

from bertopic.representation import KeyBERTInspired
from bertopic import BERTopic

representation_model = KeyBERTInspired()

# Use the representation model in BERTopic on top of the default pipeline
topic_model = BERTopic(top_n_words=30, representation_model=representation_model)
topics, probs = topic_model.fit_transform(docs)

MaartenGr commented 1 year ago

@kimkyulim It might be worthwhile to remove umap from your installation with `pip uninstall umap` and then install the correct package with `pip install --upgrade umap-learn`. You can also try uninstalling it with conda. This issue typically happens when the wrong umap package is installed (`umap` rather than `umap-learn`). If that does not work, I would advise starting from a completely empty environment and installing BERTopic there.
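To see which of the two distributions is actually installed before uninstalling anything, the package metadata can be queried. This is a minimal sketch using only the standard library; `installed` and `diagnose_umap` are hypothetical helper names, and the suggestions simply restate the advice above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def diagnose_umap(umap_version, umap_learn_version):
    """Suggest a fix based on which of the two distributions is installed."""
    if umap_version is not None:
        # The unrelated 'umap' distribution shadows the one BERTopic needs.
        return "pip uninstall umap, then pip install --upgrade umap-learn"
    if umap_learn_version is None:
        return "pip install --upgrade umap-learn"
    return "umap-learn looks fine; try a fresh environment if the import still fails"


if __name__ == "__main__":
    print(diagnose_umap(installed("umap"), installed("umap-learn")))
```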

kimkyulim commented 1 year ago

@MaartenGr Thank you for your advice! It works for me.

However, I set the number of top keywords to 30, but the results still only show 10. My code hasn't changed since before.

My code:

representation_model = KeyBERTInspired()

topic_model = BERTopic(top_n_words=30, representation_model=representation_model)
topics, probs = topic_model.fit_transform(docs)

MaartenGr commented 1 year ago

@kimkyulim That is because the KeyBERTInspired model also has a top_n_words parameter, which is set to 10 by default. Setting it higher would solve your issue.
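A minimal sketch of that change, assuming BERTopic v0.14 or later is installed and `docs` is a list of strings; the wrapper function `build_topic_model` is a hypothetical name used only for illustration.

```python
def build_topic_model(n_words=30):
    """Build a BERTopic model whose KeyBERTInspired step returns n_words keywords."""
    # Imported lazily so this sketch can be loaded without BERTopic installed.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    # The representation model has its own top_n_words (10 by default),
    # so it is set here in addition to the BERTopic-level parameter.
    representation_model = KeyBERTInspired(top_n_words=n_words)
    return BERTopic(top_n_words=n_words, representation_model=representation_model)


# Usage, assuming `docs` is a list of strings:
# topic_model = build_topic_model(30)
# topics, probs = topic_model.fit_transform(docs)
# topic_model.get_topic(0)  # (word, score) pairs for topic 0
```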