Closed. Hveemos closed this 1 month ago
Thank you for sharing this feature request! Note that it already does this if you set the representation to be the main one. So setting representation_model=KeyBERTInspired() should already use the KeyBERTInspired keywords. Admittedly, it does not use the additional aspects, which you could otherwise choose from and play around with. That would certainly be nice to have!
OK, so I was just reading the Multiple Representations documentation and realized that I could set my default representation simply by giving the model the right key, "Main", i.e.:
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance, OpenAI, PartOfSpeech

# client, prompt, embedding_model, vectorizer_model, hdbscan_model and
# umap_model_5D are defined elsewhere.

# KeyBERT
keybert_model = KeyBERTInspired()
# Part-of-Speech
pos_model = PartOfSpeech("sv_core_news_sm")
# MMR
mmr_model = MaximalMarginalRelevance(diversity=0.3)
# GPT-4o
openai_model = OpenAI(client, model="gpt-4o", exponential_backoff=True, chat=True, prompt=prompt)
# All representation models; the "Main" key sets the default representation
representation_model = {
    "Main": keybert_model,
    "OpenAI": openai_model,
    "MMR": mmr_model,
    "POS": pos_model,
}
topic_model = BERTopic(
    # Pipeline models
    embedding_model=embedding_model,
    representation_model=representation_model,
    vectorizer_model=vectorizer_model,
    hdbscan_model=hdbscan_model,
    umap_model=umap_model_5D,
    # Hyperparameters
    top_n_words=10,
    verbose=True,
    language="swedish",
    n_gram_range=(1, 2),
)
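To make the role of the "Main" key concrete, here is a minimal, library-free sketch of the convention: the entry keyed "Main" becomes the default representation and every other entry is treated as an additional aspect. The helper split_representations is hypothetical and only illustrates the idea; it is not BERTopic's actual implementation, and the string values stand in for real representation models.

```python
from typing import Any, Dict, Tuple


def split_representations(representation_model: Dict[str, Any]) -> Tuple[Any, Dict[str, Any]]:
    """Separate the entry keyed "Main" from the remaining aspects.

    Hypothetical helper for illustration only; not part of BERTopic.
    """
    main = representation_model.get("Main")
    aspects = {name: model for name, model in representation_model.items() if name != "Main"}
    return main, aspects


# Placeholder strings stand in for the actual representation models.
models = {
    "Main": "keybert_model",
    "OpenAI": "openai_model",
    "MMR": "mmr_model",
    "POS": "pos_model",
}
main, aspects = split_representations(models)
print(main)
print(sorted(aspects))
```

With the dictionary above, the KeyBERT stand-in ends up as the default, while "OpenAI", "MMR" and "POS" remain available as extra aspects.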
So, thank you. That'll be all!
Feature request
Possibility to change the default representation on which OpenAI bases its response.
Motivation
As of now, when I create a topic representation with OpenAI, the prompt is based on a couple of representative documents and the default representation keywords. But in my case these keywords are often not useful. However, I found both the KeyBERTInspired and PartOfSpeech keywords quite useful, so I would like to base the prompt on either of those instead.
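As a rough illustration of why the choice of keywords matters, the sketch below fills a prompt template from a keyword list and a few representative documents. It assumes the [KEYWORDS] and [DOCUMENTS] placeholder convention used in BERTopic prompts; the build_prompt function, the template text and the sample data are hypothetical and do not reproduce the library's internals.

```python
from typing import List


def build_prompt(template: str, keywords: List[str], documents: List[str]) -> str:
    """Fill the [KEYWORDS] and [DOCUMENTS] placeholders of a prompt template.

    Simplified illustration; the real prompt construction may differ.
    """
    docs = "\n".join(f"- {doc}" for doc in documents)
    return template.replace("[KEYWORDS]", ", ".join(keywords)).replace("[DOCUMENTS]", docs)


# Hypothetical template and sample data.
template = "Documents:\n[DOCUMENTS]\nKeywords: [KEYWORDS]\nGive a short topic label."
default_keywords = ["och", "det", "att"]        # e.g. uninformative default keywords
keybert_keywords = ["elbil", "laddning", "batteri"]  # e.g. KeyBERTInspired keywords
documents = ["Elbilen laddas hemma.", "Batteriet räcker längre."]

print(build_prompt(template, default_keywords, documents))
print(build_prompt(template, keybert_keywords, documents))
```

Only the keyword line differs between the two prompts, which is why swapping in a better keyword source can change the label the model produces.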
Your contribution
I think this feature should be easy to implement if you know your way around this library (which I don't). So, I'm afraid I won't be of much help...