MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

OpenAI representation not working: it never completes, as shown in the code and output below. #1667

Open Manas-Shrivastav opened 11 months ago

Manas-Shrivastav commented 11 months ago

```python
import openai
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance, OpenAI, PartOfSpeech

# KeyBERT
keybert_model = KeyBERTInspired()

# GPT-3.5
prompt = """
I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: [KEYWORDS]

Based on the information above, extract a short but highly descriptive topic label of at most 5 words. Make sure it is in the following format:
topic:
"""
client = openai.OpenAI(api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
openai_model = OpenAI(client, model="gpt-3.5-turbo", exponential_backoff=True, chat=True, prompt=prompt)

# All representation models
representation_model = {
    "KeyBERT": keybert_model,
    "OpenAI": openai_model,  # Uncomment if you will use OpenAI
}

from bertopic import BERTopic

topic_model = BERTopic(
    # Pipeline models (embedding_model and umap_model are defined elsewhere)
    embedding_model=embedding_model,
    umap_model=umap_model,
    representation_model=representation_model,

    # Hyperparameters
    nr_topics="auto",
    min_topic_size=30,
    verbose=True,
)

topics, probs = topic_model.fit_transform(subgroup_dfs['Pet Foods']['productMaterial'])
```

```
2023-12-06 05:33:42,424 - BERTopic - Embedding - Transforming documents to embeddings.
.gitattributes: 100% 1.18k/1.18k [00:00<00:00, 58.7kB/s]
1_Pooling/config.json: 100% 190/190 [00:00<00:00, 12.6kB/s]
README.md: 100% 10.6k/10.6k [00:00<00:00, 651kB/s]
config.json: 100% 612/612 [00:00<00:00, 40.0kB/s]
config_sentence_transformers.json: 100% 116/116 [00:00<00:00, 8.19kB/s]
data_config.json: 100% 39.3k/39.3k [00:00<00:00, 605kB/s]
pytorch_model.bin: 100% 90.9M/90.9M [00:00<00:00, 171MB/s]
sentence_bert_config.json: 100% 53.0/53.0 [00:00<00:00, 2.44kB/s]
special_tokens_map.json: 100% 112/112 [00:00<00:00, 7.00kB/s]
tokenizer.json: 100% 466k/466k [00:00<00:00, 2.39MB/s]
tokenizer_config.json: 100% 350/350 [00:00<00:00, 27.1kB/s]
train_script.py: 100% 13.2k/13.2k [00:00<00:00, 828kB/s]
vocab.txt: 100% 232k/232k [00:00<00:00, 14.7MB/s]
modules.json: 100% 349/349 [00:00<00:00, 28.5kB/s]
Batches: 100% 90/90 [00:04<00:00, 61.69it/s]
2023-12-06 05:33:56,511 - BERTopic - Embedding - Completed ✓
2023-12-06 05:33:56,512 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2023-12-06 05:34:35,919 - BERTopic - Dimensionality - Completed ✓
2023-12-06 05:34:35,921 - BERTopic - Cluster - Start clustering the reduced embeddings
2023-12-06 05:34:36,010 - BERTopic - Cluster - Completed ✓
2023-12-06 05:34:36,011 - BERTopic - Representation - Extracting topics from clusters using representation models.
  0%|          | 0/31 [00:00<?, ?it/s]
```
MaartenGr commented 11 months ago

It might not be able to connect to the OpenAI servers. Could you try sending any prompt with `openai` directly, outside of BERTopic, to see whether it can make a connection at all? Also, a good tip is to follow along with the best practices. For instance, I would advise against `nr_topics="auto"` and suggest tuning `min_topic_size` instead, since fewer topics means fewer calls to the API.
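
As a sketch of that first suggestion — calling `openai` directly to check connectivity, independent of BERTopic — something like the following could narrow things down. The helper name, the test prompt, and the `timeout`/`max_retries` values are illustrative choices, not from the thread; they are picked so a connection problem surfaces quickly instead of hanging indefinitely.

```python
def check_openai_connection(api_key: str) -> str:
    """Send one trivial chat completion; raises if the API is unreachable.

    Minimal sketch: a short timeout and a single retry (illustrative values)
    make a connectivity problem fail fast rather than appear to hang.
    """
    import openai  # assumes the v1 openai package, as used in the issue

    client = openai.OpenAI(api_key=api_key, timeout=30.0, max_retries=1)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Reply with the word OK."}],
    )
    return resp.choices[0].message.content

# Usage with a real key: print(check_openai_connection("sk-..."))
```

If this raises (for example an `APIConnectionError` or a timeout), the hang in BERTopic is almost certainly a network or API-access issue rather than a BERTopic bug.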