MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Guided Topic Modeling #866

Open mjavedgohar opened 1 year ago

mjavedgohar commented 1 year ago

Hi @MaartenGr ,

I am trying to use Guided Topic Modeling with the following code. It works fine in Colab notebooks, but I get an error on my local machine. I am using BERTopic 0.12.0. Can you please help me with this? Thanks

Code:

topic_model = BERTopic(language="english", verbose=True, seed_topic_list=seed_topic_list)
topics, probs = topic_model.fit_transform(docs)

Error:

    topics, probs = topic_model.fit_transform(docs)
  File "...\Local\Programs\Python\Python38\lib\site-packages\bertopic\_bertopic.py", line 344, in fit_transform
    y, embeddings = self._guided_topic_modeling(embeddings)
  File "...\Local\Programs\Python\Python38\lib\site-packages\bertopic\_bertopic.py", line 2376, in _guided_topic_modeling
    embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])
  File "<__array_function__ internals>", line 5, in average
  File "..\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\function_base.py", line 407, in average
    scl = wgt.sum(axis=axis, dtype=result_dtype)
  File "..\Local\Programs\Python\Python38\lib\site-packages\numpy\core\_methods.py", line 47, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: No loop matching the specified signature and casting was found for ufunc add

MaartenGr commented 1 year ago

When you are working across different environments, there might be an issue with the packages that you have installed. I would advise starting from a completely fresh environment and re-installing everything there. From your traceback, it seems that NumPy might be the culprit here, so I would think that a fresh environment might solve the issue.
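Before rebuilding the environment, it may help to see how the two environments actually differ. The following is a minimal sketch, not something taken from the BERTopic docs: it uses only the standard library's `importlib.metadata` (Python 3.8+), and the helper name `installed_versions` and the package list are assumptions chosen for illustration. Running it in both Colab and the local machine gives two lists to compare.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package list is an assumption: the packages mentioned in this thread.
    for name, version in installed_versions(["bertopic", "numpy", "numba"]).items():
        print(f"{name}: {version or 'not installed'}")
```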

mjavedgohar commented 1 year ago

Thanks @MaartenGr, it solved the issue.

One more thing to discuss. In the topics-over-time visualisation, there are only seven colours. How can I increase the range of colours? And can I change the colour of a particular topic after training the model? Most of the time, when I try to show multiple topics in one figure, some of them share the same colour.

Thanks

MaartenGr commented 1 year ago

At the moment it is not possible to increase the range of colors. If you select a maximum of seven topics beforehand, they should not have any matching colors. I might add it in a future version but I want to prevent opening up too many parameters as that will tighten the API too much and I might opt for Bokeh in the future instead of Plotly.
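A possible workaround outside the BERTopic API, which nobody in this thread suggested: since the figure is built with Plotly, the traces of the returned figure can in principle be recoloured after plotting. The sketch below assumes each topic is drawn as one trace exposing `line.color` and `marker.color` (as Plotly scatter traces do). The helper name `recolor_traces` and the palette are hypothetical.

```python
from itertools import cycle


def recolor_traces(fig, palette):
    """Assign one palette colour per trace, cycling if there are more traces than colours."""
    for trace, color in zip(fig.data, cycle(palette)):
        # Assumption: every trace has `line` and `marker` attributes with a `color` field.
        trace.line.color = color
        trace.marker.color = color
    return fig


# Example palette (arbitrary hex colours, longer than the seven defaults):
PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b",
    "#e377c2", "#7f7f7f", "#bcbd22", "#17becf", "#393b79", "#ad494a",
]

# Usage, assuming `fig` was returned by topic_model.visualize_topics_over_time(...):
# fig = recolor_traces(fig, PALETTE)
# fig.show()
```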

brennancruse commented 1 year ago

Linking these similar issues 221, 365, 278, 202.

Can also confirm that numba==0.53.1 and numpy==1.21.1 fixed this for me.

mjavedgohar commented 1 year ago

Thanks @brennancruse

mjavedgohar commented 1 year ago

Hi @MaartenGr, can we use visualize_topics_over_time for selected topics (e.g., 5, 19, 37, 159, etc.) from a trained model?

Thanks

MaartenGr commented 1 year ago

Yes, there is the topics parameter in .visualize_topics_over_time that allows you to select specific topics to visualize. You can find more about the API here.
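As a rough illustration of the `topics` parameter mentioned above, here is a small wrapper. The name `plot_selected_topics` is hypothetical. It assumes `topic_model` is an already fitted BERTopic model and that `topics_over_time` is the table previously computed by the model's `topics_over_time` method, whose exact arguments differ between BERTopic versions.

```python
def plot_selected_topics(topic_model, topics_over_time, topic_ids):
    """Plot only the given topic IDs from a fitted model's topics-over-time table."""
    # `topics` restricts the figure to the listed topic IDs.
    return topic_model.visualize_topics_over_time(
        topics_over_time, topics=list(topic_ids)
    )


# Usage (assumes `topic_model` and `topics_over_time` already exist):
# fig = plot_selected_topics(topic_model, topics_over_time, [5, 19, 37, 159])
# fig.show()
```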

zhouzhongmi commented 1 year ago

The following modification to the _bertopic.py file fixed this for me.

def _guided_topic_modeling(self, embeddings: np.ndarray) -> Tuple[List[int], np.array]:
    ....

    for seed_topic in range(len(seed_topic_list)):
        indices = [index for index, topic in enumerate(y) if topic == seed_topic]
        # reshape `seed_topic_embeddings` to the same shape as `embeddings[indices]`
        seed_topic_embeddings_reshape = np.repeat(seed_topic_embeddings[seed_topic].reshape(1, -1), embeddings[indices].shape[0], axis=0)
        embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings_reshape], weights=[3, 1], axis=0)
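The idea behind this patch can be tried in isolation with toy arrays. The sketch below is not the library's code: `nudge_towards_seed` is a hypothetical helper. It uses `np.broadcast_to` instead of `np.repeat`, which gives the same values here, and the 3:1 weights are taken from the snippet above. Both inputs are brought to the same `(n, d)` shape before the weighted average is taken over the stacking axis.

```python
import numpy as np


def nudge_towards_seed(doc_embeddings, seed_embedding, weights=(3, 1)):
    """Weighted average of each document embedding with one seed-topic embedding."""
    # Expand the (d,) seed vector so it matches the (n, d) block of document embeddings.
    seed_block = np.broadcast_to(seed_embedding, doc_embeddings.shape)
    # Stack to shape (2, n, d) and average over axis 0 with the given weights.
    return np.average(np.stack([doc_embeddings, seed_block]), weights=weights, axis=0)


# Toy data: two 2-dimensional "document embeddings" and one seed embedding.
docs = np.array([[0.0, 4.0], [8.0, 0.0]])
seed = np.array([4.0, 0.0])
print(nudge_towards_seed(docs, seed))  # each row equals (3 * doc + seed) / 4
```

Inside the loop above, the equivalent call would be `embeddings[indices] = nudge_towards_seed(embeddings[indices], seed_topic_embeddings[seed_topic])`.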