ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part. encountered during model fitting with provided seed words for guided topic modelling
Description
A ValueError is raised when fitting a BERTopic model configured for guided topic modelling.
The error occurs when calling the fit_transform method on a BERTopic instance with a set of documents and their precomputed embeddings.
The likely cause is the internal call to np.average in _guided_topic_modeling.
That call receives a list of two arrays with different shapes: the embeddings of the documents assigned to a seed topic (presumably 2-D) and that seed topic's own embedding (presumably 1-D). NumPy cannot build a regular array from such a list, so it raises the ValueError about an inhomogeneous shape.
The expected behaviour is that the document embeddings are updated with a weighted influence from the corresponding seed topic embedding.
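The shape mismatch can be reproduced with plain NumPy, outside BERTopic. The sketch below uses made-up sizes (5 documents, 4 dimensions) and a hypothetical helper, try_average, that mirrors the failing call. On NumPy 1.24 or newer I would expect the ValueError branch; older releases coerce the list to an object array with a deprecation warning and return a result instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
doc_embeddings = rng.normal(size=(5, 4))  # 5 documents, 4-dimensional embeddings
seed_embedding = rng.normal(size=4)       # a single seed topic embedding


def try_average(docs, seed):
    """Mirror the failing call; return the result or the error message."""
    try:
        return np.average([docs, seed], weights=[3, 1])
    except ValueError as err:
        return f"ValueError: {err}"


# The two list elements have different shapes, so the list is "ragged".
print(doc_embeddings.shape, seed_embedding.shape)

outcome = try_average(doc_embeddings, seed_embedding)
print(outcome)
```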
Steps to Reproduce
Installed numpy version: 1.25.0
1. Initialize a BERTopic model with the guided modelling approach.
2. Prepare a dataset of documents and their corresponding embeddings.
3. Call the fit_transform method on the BERTopic model.
Error Traceback
ValueError Traceback (most recent call last)
Cell In[7], line 104
102 # Topic Model Fitting
103 print("Topic model fitting..")
--> 104 topic, probs = topic_model.fit_transform(doc, embedding)
106 # Save Model State Checkpoint
107 print("Saving model embeddings checkpoint..")
File c:\Users\georg\anaconda3\Lib\site-packages\bertopic\_bertopic.py:399, in BERTopic.fit_transform(self, documents, embeddings, images, y)
397 # Guided Topic Modeling
398 if self.seed_topic_list is not None and self.embedding_model is not None:
--> 399 y, embeddings = self._guided_topic_modeling(embeddings)
401 # Zero-shot Topic Modeling
402 if self._is_zeroshot():
File c:\Users\georg\anaconda3\Lib\site-packages\bertopic\_bertopic.py:3617, in BERTopic._guided_topic_modeling(self, embeddings)
3615 for seed_topic in range(len(seed_topic_list)):
3616 indices = [index for index, topic in enumerate(y) if topic == seed_topic]
-> 3617 embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])
3618 logger.info("Guided - Completed \u2713")
File c:\Users\georg\anaconda3\Lib\site-packages\numpy\lib\function_base.py:511, in average(a, axis, weights, returned, keepdims)
398 @array_function_dispatch(_average_dispatcher)
399 def average(a, axis=None, weights=None, returned=False, *,
400 keepdims=np._NoValue):
401 """
402 Compute the weighted average along the specified axis.
403
(...)
509 [4.5]])
510 """
--> 511 a = np.asanyarray(a)
513 if keepdims is np._NoValue:
514 # Don't pass on the keepdims argument if one wasn't given.
515 keepdims_kw = {}
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
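One possible way to express the intended weighted update without a ragged list is to broadcast the seed embedding to the shape of the selected document embeddings before averaging. This is only a sketch of that idea, not the library's actual fix; blend_with_seed and its parameter names are hypothetical, and the 3:1 weights are taken from the traceback above.

```python
import numpy as np


def blend_with_seed(embeddings, indices, seed_embedding,
                    doc_weight=3.0, seed_weight=1.0):
    """Weighted average of selected document embeddings and one seed embedding.

    Hypothetical helper: both inputs are brought to the same (k, d) shape,
    so np.average receives a regular (2, k, d) array.
    """
    docs = embeddings[indices]                          # shape (k, d)
    seed = np.broadcast_to(seed_embedding, docs.shape)  # shape (k, d)
    return np.average([docs, seed], axis=0,
                      weights=[doc_weight, seed_weight])


# Tiny illustrative input: 4 documents, 3 dimensions.
embeddings = np.zeros((4, 3))
seed_embedding = np.full(3, 4.0)
indices = [0, 2]

blended = blend_with_seed(embeddings, indices, seed_embedding)
embeddings[indices] = blended  # only rows 0 and 2 are updated
print(embeddings)
```

With weights 3 and 1, each updated row is (3 * document + seed) / 4, which is the weighting the original call appears to intend.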
Thanks for sharing the extensive description of your issue. I believe this is a known issue, and the fix seems to be to downgrade numpy. Could you check the link I shared for specifics?
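To check whether an environment is affected, the installed NumPy version can be compared against 1.24, which, as far as I know, is the release where building an array from a ragged list started raising a ValueError instead of a deprecation warning. The helper name below is hypothetical, and the version parsing is a simple sketch that only reads the major and minor components.

```python
import numpy as np


def numpy_rejects_ragged_lists(version: str = np.__version__) -> bool:
    """Return True if this NumPy version raises on ragged list input.

    Assumes the cutoff is NumPy 1.24 and that the version string starts
    with numeric "major.minor" components.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 24)


print(np.__version__, numpy_rejects_ragged_lists())
```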