shyzzz521 commented 10 months ago

Hi, I encountered the following error while performing semi-supervised training:

embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

The clustering corpus is a one-dimensional list of 50779 documents, and the embeddings have shape (50779, 1024). Passing seed_topic_list as either a one-dimensional or a two-dimensional list produces this error. Here is a snippet of the code:

reduction_model = BaseDimensionalityReduction()
cluster_model = KMeans(n_clusters=num_topics)
topic_model = BERTopic(nr_topics=num_topics,
                       top_n_words=10,
                       seed_topic_list=seed_topic_list,
                       embedding_model=sentence_model,
                       umap_model=reduction_model,
                       min_topic_size=50,
                       calculate_probabilities=False,
                       hdbscan_model=cluster_model,
                       # vectorizer_model=vectorizer,
                       verbose=True)

topics, probs = topic_model.fit_transform(documents=recall, embeddings=embeddings)

I look forward to your reply.
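
For readers who land on this thread with the same traceback: the failing line passes np.average a Python list that holds a 2-D block of document embeddings and a 1-D seed topic embedding. NumPy 1.24 and newer refuse to build an array from such a ragged list, which matches the "inhomogeneous shape" message above. The sketch below is only an illustration with made-up shapes. The helper names ragged_average_raises and weighted_blend are hypothetical and are not part of BERTopic. The second helper computes the same 3:1 weighted mean with plain broadcasting; it is not presented as the library's actual fix.

```python
import numpy as np


def ragged_average_raises(doc_embeddings, seed_embedding):
    """Return True if averaging the ragged pair raises ValueError.

    Assumption: this happens on NumPy >= 1.24; older releases only
    emitted a deprecation warning for ragged input.
    """
    try:
        np.average([doc_embeddings, seed_embedding], weights=[3, 1])
    except ValueError:
        return True
    return False


def weighted_blend(doc_embeddings, seed_embedding, w_doc=3.0, w_seed=1.0):
    """Weighted mean of each document embedding with one seed embedding.

    Broadcasting stretches the (dim,) seed vector across the
    (n_docs, dim) document block, so no ragged list is ever built.
    """
    doc_embeddings = np.asarray(doc_embeddings, dtype=float)
    seed_embedding = np.asarray(seed_embedding, dtype=float)
    return (w_doc * doc_embeddings + w_seed * seed_embedding) / (w_doc + w_seed)


# Toy data: 5 documents with 4-dimensional embeddings (the thread uses 1024).
docs = np.zeros((5, 4))
seed = np.ones(4)

raised = ragged_average_raises(docs, seed)  # version dependent, see docstring
blended = weighted_blend(docs, seed)        # shape (5, 4)
```

With these toy inputs every entry of blended is (3 * 0 + 1 * 1) / 4 = 0.25, and the result keeps the (5, 4) shape of the document block.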

MaartenGr commented 10 months ago

Have you checked for this error in some of the open and closed issues? I have seen this issue before and I think there are some worthwhile tips you can find there. You can find the relevant open issues here and closed issues here.

Please try those first and see whether they work. If not, please let me know what you have tried and how and we'll see if we can find a solution.

shyzzz521 commented 10 months ago

Hello, I followed your suggestion and resolved the issue by downgrading the numba version to 0.56.4. I referred to the approach mentioned here: https://github.com/MaartenGr/BERTopic/issues/1421

Thank you very much.
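
A note on why the downgrade may have helped: numba 0.56.4 does not support NumPy 1.24 or newer, so installing it has likely pulled NumPy back to a release that still accepted ragged lists. That is an inference, not something confirmed in this thread. A quick way to check what is actually installed is sketched below. The helper name installed_versions is hypothetical, and the package names are the usual PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "numba", "bertopic")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print the versions so they can be pasted into an issue report.
for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```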

MaartenGr commented 10 months ago

Glad to hear that you managed to resolve the issue!

MaartenGr / BERTopic

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part. #1697
