Closed: registjl closed this 1 month ago
Thanks for converting this into an issue! Sorry to be a bit more annoying but I'll need some more information. Which version of BERTopic do you have? Also, could you provide the full code? That includes both fitting the model as well as how you saved and loaded it again.
Lastly, could you provide the full error log?
Hi Maarten - thanks for your help. Let me know if you need any additional info.
- CODE TO FIT THE MODEL (I'm limited in what I can share):
from bertopic import BERTopic
from transformers import AutoModel, AutoTokenizer
from umap import UMAP
from hdbscan import HDBSCAN
embedding_model = AutoModel.from_pretrained("cardiffnlp/tweet-topic-21-multi")
iteration_sel = "163"
n_neighbors = 10
n_components = 2
min_dist = 0.0
min_cluster_size = 15
min_samples = 1
umap_model = UMAP(n_neighbors=n_neighbors, n_components=n_components, min_dist=min_dist)
hdbscan_model = HDBSCAN(min_cluster_size=min_cluster_size, min_samples=min_samples, prediction_data=True)
topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model,
                       calculate_probabilities=True, embedding_model=embedding_model)
topics, probabilities = topic_model.fit_transform(TRAIN_DATA['clean_text'])
model_name = "./BERTopic_trained_model_163"
topic_model.save(model_name)
- CODE WHICH LOADS AND EXECUTES THE SAVED MODEL
topic_model_name = f'./BERTopic_trained_model_163'
topic_model = BERTopic.load(topic_model_name)
topics, probabilities = topic_model.transform(TEST_DATA['clean_text'])
Traceback (most recent call last):
File "C:\Users....\venv\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 2606, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 2630, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 88
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users....\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "
Hmmm, I'm not entirely sure what is happening here. Did you make sure that the environments in which you load and save the model are identical? When you use pickle to save a model, it is important that you use version control to exactly reproduce the training environment.
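One way to check whether the saving and loading environments match is to print the installed versions of the relevant packages in both and compare the output. The snippet below is a minimal sketch using only the standard library; the package list is an assumption chosen for illustration, not an official BERTopic requirement.

```python
from importlib import metadata

# Packages whose versions most plausibly affect a pickled BERTopic model.
# This list is an assumption for illustration; adjust it to your project.
PACKAGES = ["bertopic", "umap-learn", "hdbscan", "pandas", "numpy"]


def collect_versions(packages):
    """Return {package: version or 'not installed'} for the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Run this in both the training and the loading environment and diff the output.
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name}=={version}")
```

If the two outputs differ, pinning the versions in a requirements file and recreating the environment is usually the simplest fix.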
Thanks for getting back to me, Maarten. I'm using PyCharm for my development, and I created the script to build/save the model and the script to load and transform the model in the same "PyCharm Project", i.e., I didn't create a new environment (as far as I know).
I'll let you know!
Hi Maarten --
I think I found the problem: I converted the test_data['text'] DataFrame column to a list, and it appears to work!
That's great! Glad to hear that it worked 😄
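For anyone landing here with the same `KeyError`, the following is a minimal sketch of why the list conversion plausibly helps. The thread does not confirm the root cause; the assumption is that the test DataFrame had a non-contiguous integer index (for example after a train/test split or a filter), so any integer lookup on the column is treated as a label rather than a position. The `test_data` frame below is a hypothetical stand-in, not the original data.

```python
import pandas as pd

# Hypothetical stand-in for the test set: the rows left over after a
# split or filter, so the integer index has gaps and does not start at 0.
test_data = pd.DataFrame(
    {"clean_text": ["first doc", "second doc", "third doc"]},
    index=[3, 17, 120],
)

docs_series = test_data["clean_text"]

# On a Series with an integer index, docs_series[1] is a label lookup.
# There is no row labelled 1, so pandas raises KeyError, the same class
# of failure as `KeyError: 88` in the traceback above.
try:
    docs_series[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A plain list is indexed by position, so the same access succeeds.
docs_list = docs_series.tolist()
second_doc = docs_list[1]

print(lookup_failed, second_doc)

# With a loaded model, the call would then look like:
# topics, probabilities = topic_model.transform(docs_list)
```

Calling `reset_index(drop=True)` on the DataFrame before selecting the column is another option, but passing a list matches what resolved the problem in this thread.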
Discussed in https://github.com/MaartenGr/BERTopic/discussions/2160