Neehier opened 9 months ago
Could you share your full code for training, saving, and loading the model? Also, are you using the latest release (v0.16) or perhaps from the latest commit on the main branch itself?
I am indeed using v0.16. The model I am loading is originally a merged model.
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP

# Sub-models (GPU-accelerated UMAP and HDBSCAN from cuML)
umap_model = UMAP(n_components=15, n_neighbors=5, min_dist=0.0)
hdbscan_model = HDBSCAN(min_cluster_size=3, prediction_data=True)
representation_model = KeyBERTInspired()

# Train two topic models on separate document sets
base_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model, representation_model=representation_model, calculate_probabilities=True, language='multilingual', verbose=True)
base_model.fit(docs1)
base_second_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model, representation_model=representation_model, calculate_probabilities=True, language='multilingual', verbose=True)
base_second_model.fit(docs2)

# Merge the two models and save the result
merged_model = BERTopic.merge_models([base_model, base_second_model])
merged_model.save(path, serialization='pytorch')
I'm not sure if this is relevant, but I am using the GPU-accelerated HDBSCAN and UMAP from cuML.
Hmmm, not sure what is happening here. There might be some strict checking done in .load even though you are passing an embedding model. Upon further inspection, it might be related to this:
Perhaps that type checking needs to be removed. Could you test whether that works?
Unfortunately the warning persists even after removing the typecheck. I'll do some more checks and investigation after my exams.
Thanks for checking! I'll make sure to leave this open for your update.
Hello,
I have the same issue and tried to look into it but did not find a solution.
I'm using BERTopic version 0.16.0 and sentence_transformers version 2.5.1. What I'm trying to do is load a model from a directory (serialised as safetensors), and it seems the embedding model does not get included as a parameter in the block at line 3051: https://github.com/MaartenGr/BERTopic/blob/8985f26d4ee89b4c512ff9da22a61371c20605b8/bertopic/_bertopic.py#L3138C1-L3139C118
For this reason the try statement can't be executed, and it falls back to selecting BaseEmbedder(): https://github.com/MaartenGr/BERTopic/blob/8985f26d4ee89b4c512ff9da22a61371c20605b8/bertopic/_bertopic.py#L4463C1-L4464C89
This was just a quick check, and I did not find a working solution either, but it might help in locating the cause of the problem.
@balcse I believe there are a couple of fixes for this in the main branch of BERTopic. I would advise installing BERTopic from the main branch to potentially fix the issue. Do note that the embedding model is only saved if you use save_embedding_model="some_string" when saving the model. If not, then you can use the embedding_model parameter in .load.
Thanks for the quick reply. I am in fact using the main branch; I linked the wrong version in my comment. Saving the embedding model explicitly does seem to work fine, though.
Just wondering if this issue was solved?
Starting from a fresh environment, loading a BERTopic model using .load results in a false warning mentioning a missing explicit definition of embedding_model. Despite the warning, the model loads without issue and the embedding model works as expected.