MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
5.79k stars, 721 forks

list index out of range #1925

Open rap8 opened 2 months ago

rap8 commented 2 months ago

This error occurs intermittently when I process some Word documents. According to the error message, it is raised in the _bertopic.py file at line 4024 when the topics are all -1, because unique_topics is then empty. Is there any way to avoid this error?

MaartenGr commented 2 months ago

Could you also share your full code along with the version of BERTopic you are using? How many documents are you passing to BERTopic?

If there are indeed only -1 topics, then the way to avoid that is to increase the number of topics you are generating. Most likely, and it is difficult to say without knowing your full code, you will need to decrease the value of min_topic_size or its equivalent in HDBSCAN, min_cluster_size, so that smaller groups of documents can still form a topic.
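
A minimal sketch of adjusting these settings, assuming the `bertopic` and `hdbscan` packages are installed. The helper names `all_outliers` and `build_topic_model` and the value 5 are illustrative assumptions, not something from this thread. Smaller values generally let HDBSCAN form more, smaller clusters, which tends to leave fewer documents in the -1 outlier topic.

```python
def all_outliers(topics):
    """Return True if every document was assigned to the outlier topic -1."""
    return len(topics) > 0 and all(topic == -1 for topic in topics)


def build_topic_model(min_cluster_size=5):
    """Build a BERTopic model with a custom HDBSCAN model (hypothetical helper).

    The imports live inside the function so this module still loads
    when bertopic/hdbscan are not installed.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    hdbscan_model = HDBSCAN(
        min_cluster_size=min_cluster_size,  # illustrative value; tune for your corpus
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )
    return BERTopic(hdbscan_model=hdbscan_model)


# Usage (not run here; `docs` must be your own list of strings):
# topic_model = build_topic_model(min_cluster_size=5)
# topics, probs = topic_model.fit_transform(docs)
# if all_outliers(topics):
#     print("Only outliers were found; try a smaller min_cluster_size.")
#
# The simpler route is BERTopic(min_topic_size=5), which sets the same
# threshold on BERTopic's default HDBSCAN model.
```

With very few documents, even a small threshold may still produce only outliers, so checking the document count first is worthwhile.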

rap8 commented 2 months ago

Hi, I have a question. The get_topic_info function shows only 3 representative documents for each topic, so how do I know which topic each document belongs to? I can't seem to find this in your documentation.

MaartenGr commented 2 months ago

So how do I know which topic different documents belong to?

The output of .fit_transform gives you the topics variable, which contains the topic assigned to each document, in the same order as the documents you passed in. You can also find these assignments in the topic_model.topics_ attribute.
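
A minimal sketch of turning those assignments into a per-topic lookup. The helper `group_documents_by_topic` and the toy data are assumptions for illustration; the only thing it relies on is that the topic list is aligned with the input documents.

```python
from collections import defaultdict


def group_documents_by_topic(docs, topics):
    """Map each topic id to the list of documents assigned to it."""
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[topic].append(doc)
    return dict(grouped)


# Toy stand-ins for real data; with BERTopic, `topics` would come from
# `topics, probs = topic_model.fit_transform(docs)` or `topic_model.topics_`.
docs = ["first document", "second document", "third document"]
topics = [0, -1, 0]

grouped = group_documents_by_topic(docs, topics)
for topic_id, members in grouped.items():
    print(topic_id, members)
```

BERTopic also provides `topic_model.get_document_info(docs)`, which returns a DataFrame with one row per document including its assigned topic.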

rap8 commented 2 months ago

Thanks, I see.