Guided Modeling: Problem with seed_topic_list #1991

Status: Open. HeinzJS opened this issue 1 month ago.

HeinzJS commented 1 month ago


I've been having problems with guided topic modeling. The error I keep receiving is:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
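This is the message NumPy 1.24 and later raise when asked to build an array from nested sequences of unequal length ("ragged" input). Earlier releases only warned (or stayed silent) and returned an object array, which would explain why downgrading NumPy is a common workaround. The sketch below reproduces the error without BERTopic. The `ragged` data and the helper name are hypothetical, and it is an assumption that BERTopic's guided-modeling path runs into this same kind of conversion internally.

```python
import numpy as np

# Hypothetical ragged input: two rows of different lengths.
ragged = [[0.1, 0.2, 0.3], [0.4, 0.5]]

def try_implicit_array(rows):
    """Return (array, None) on success, or (None, message) if NumPy rejects the input."""
    try:
        return np.array(rows), None
    except ValueError as err:
        return None, str(err)

print("NumPy", np.__version__)
arr, error = try_implicit_array(ragged)
if error is not None:
    # NumPy >= 1.24: ragged input without dtype=object raises ValueError
    print("ValueError:", error)
else:
    # Older NumPy: an object array is returned instead (1.20-1.23 also warn)
    print("implicit dtype:", arr.dtype)

# Requesting dtype=object explicitly is accepted on old and new NumPy:
# the result is a 1-D array whose two elements are the original lists.
obj = np.array(ragged, dtype=object)
print(obj.shape)  # (2,)
```

With two rows of unequal length, NumPy 1.24+ reports the same detected shape `(2,)` as in the traceback above. If the ragged data comes from your own code, passing `dtype=object` or making the rows equal length avoids the error; if it originates inside a library, pinning `numpy<1.24` or moving to a library release that no longer performs the conversion are the usual options.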

The relevant part of my code is:

from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer
from bertopic.representation import KeyBERTInspired, PartOfSpeech, MaximalMarginalRelevance

# Guided Model
seed_topic_list = [['hacking', 'hackers', 'hacked', 'lost', 'account'], 
                   ['data', 'leak', 'permissions', 'unauthorised', 'privacy'], 
                   ['bugs', 'crash', 'ddos', 'server', 'virus'], 
                   ['username', 'password', 'name', 'credit', 'email'], 
                   ['oculus', 'htc', 'windows', 'mac', 'meta']]

g_main_representation_model = KeyBERTInspired()
g_aspect_representation_model1 = PartOfSpeech("en_core_web_sm")
g_aspect_representation_model2 = [KeyBERTInspired(top_n_words=30), MaximalMarginalRelevance(diversity=.5)]

g_representation_model = {
   "Main": g_main_representation_model,
   "Aspect1": g_aspect_representation_model1,
   "Aspect2": g_aspect_representation_model2,
}

g_vectorizer_model = CountVectorizer(min_df=5, stop_words = 'english')
g_topic_mdl_rec = BERTopic(nr_topics = 'auto', vectorizer_model = g_vectorizer_model,
                      representation_model = g_representation_model, seed_topic_list=seed_topic_list)
# rec_room_reviews: the list of review strings to model (defined elsewhere)
g_topics_rec, g_ini_probs_rec = g_topic_mdl_rec.fit_transform(rec_room_reviews)

The solutions I have tried:

I couldn't find any other resources online about this, so I thought I would open a new issue.

Here is a list of my current packages:

MaartenGr commented 1 month ago

I'm not sure, but I believe you have to pin numpy to an even lower version, which is by no means an ideal solution. It might be worthwhile to use zero-shot BERTopic instead.
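Zero-shot topic modeling, as suggested above, starts from a predefined list of topic labels, assigns each document to the most similar label by embedding similarity, and leaves documents that match no label well enough to the regular clustering pipeline. The toy sketch below shows only that assignment step, with made-up 2-D vectors standing in for sentence embeddings. The function name, threshold, and data are hypothetical, and this is not BERTopic's implementation; in BERTopic versions that support it, the feature is configured through the `zeroshot_topic_list` and `zeroshot_min_similarity` constructor arguments.

```python
import numpy as np

def zero_shot_assign(doc_emb, label_emb, min_similarity):
    """Return the index of the most similar label per document,
    or -1 where no label reaches min_similarity (left for clustering)."""
    # Normalise rows so that the dot product equals cosine similarity.
    docs = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    labels = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    sims = docs @ labels.T               # shape (n_docs, n_labels)
    best = sims.argmax(axis=1)           # most similar label per document
    best_sim = sims.max(axis=1)
    return np.where(best_sim >= min_similarity, best, -1)

# Hypothetical label embeddings, e.g. for "hacking" and "privacy".
label_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
# Hypothetical document embeddings: one near each label, one in between.
doc_emb = np.array([[0.9, 0.1], [0.1, 0.95], [0.7, 0.7]])

result = zero_shot_assign(doc_emb, label_emb, min_similarity=0.85)
print(result)
```

Raising the threshold sends more documents to the regular clustering step; lowering it assigns more of them to the predefined labels.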

HeinzJS commented 1 month ago

Thank you for the reply.

I previously tried pinning numpy to 1.23.5 and numba to 0.56.0, but that gave me this error:

numba 0.56.0 requires numpy<1.23,>=1.18, but you have numpy 1.23.5 which is incompatible.

With numba 0.56.4, I still got the same error as in my original post.
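When pinning versions like this, it can help to check a candidate pin against the constraint in the error message quoted above, and to confirm which versions are actually installed in the running environment (in a notebook, the kernel usually needs a restart after reinstalling numpy before the new version is imported). The sketch below is a simplified comparison of the numeric release segment only; it ignores pre-release and other PEP 440 details, for which `packaging.specifiers.SpecifierSet` is the proper tool. The helper names and the `1.22.4` example are hypothetical.

```python
import importlib.metadata as md
import re

def release_tuple(version):
    """Leading numeric release segment, e.g. '1.23.5' -> (1, 23, 5), '0.56.4rc1' -> (0, 56, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def in_half_open_range(version, lower, upper):
    """True if lower <= version < upper, i.e. a '>=lower,<upper' constraint."""
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)

def installed(dist):
    """Installed distribution version from package metadata, or None if absent."""
    try:
        return md.version(dist)
    except md.PackageNotFoundError:
        return None

# Constraint from the error above: numba 0.56.0 requires numpy<1.23,>=1.18
print(in_half_open_range("1.23.5", "1.18", "1.23"))  # False: 1.23.5 is outside the range
print(in_half_open_range("1.22.4", "1.18", "1.23"))  # True

for dist in ("numpy", "numba", "bertopic"):
    print(dist, installed(dist))
```

Note that `importlib.metadata.version` reports what is installed on disk, which can differ from the module already imported by a long-running process.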

I'll have a look at the zero-shot approach. Thank you once again!