Open tmtsmrsl opened 2 months ago
Thank you for reaching out. This is expected behavior: when you save a model using pytorch serialization, the underlying dimensionality reduction and clustering models are removed from the model. To still support inference, a different technique is used to assign documents to topics (cosine similarity between document and topic embeddings).
Do note that something similar can happen even when you use pickle, because HDBSCAN uses an approximation during inference, so its predictions are likely to differ from the results it produced during training.
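To illustrate the cosine-similarity assignment mentioned above, here is a minimal, dependency-free sketch. It is not BERTopic's actual implementation: the function names `cosine_similarity` and `assign_topics` and the toy 2-D embeddings are made up for illustration, and real embeddings would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_topics(doc_embeddings, topic_embeddings):
    """Assign each document to the index of its most similar topic embedding."""
    assignments = []
    for doc in doc_embeddings:
        sims = [cosine_similarity(doc, topic) for topic in topic_embeddings]
        # Pick the topic index with the highest similarity.
        assignments.append(max(range(len(sims)), key=sims.__getitem__))
    return assignments


# Toy data (hypothetical): two topics and three documents in 2-D.
topic_embeddings = [[1.0, 0.0], [0.0, 1.0]]
doc_embeddings = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]

print(assign_topics(doc_embeddings, topic_embeddings))  # prints [0, 1, 0]
```

A document such as the third one sits close to the boundary between both topics. A nearest-topic rule like this always picks one of them, whereas a density-based clustering model may have labeled the same document differently (or as an outlier) during training, which is one way the "old" and "new" assignments can diverge.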
Have you searched existing issues? 🔎
Describe the bug
When I save a model with pytorch serialization, then use the model to transform the training data, the new topic assignment is different from the "old" topic assignment.
Reproduction
BERTopic Version
0.16.3