Originally posted by **jdweaver14** April 17, 2023
Hi everyone,
I apologize for not posting images here, but I am not at liberty to share the data I am working with.
I am new to BERTopic, but I was under the impression that the ClassTfidfTransformer and CountVectorizer steps are applied only after the embedding step (which uses no preprocessing) and the clustering step, purely to improve the topic representations. However, when I run the same dataset with a fixed random_state through the model with and without either or both of these steps, I get both a different number of clusters and different counts per cluster. This should not happen if these steps operate only on the already computed cluster assignments. Please help me understand what I am missing!
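
To make that expectation concrete, here is a minimal sketch of how I picture the representation step. It is not BERTopic's actual code: the function name `top_words_per_cluster`, the toy documents, and the labels are all made up for illustration, plain whitespace splitting stands in for CountVectorizer, and the weighting is a simplified class-based TF-IDF. The point is only that the cluster labels are an input that gets read, never changed.

```python
import math
from collections import Counter


def top_words_per_cluster(docs, labels, stop_words=frozenset(), n_words=2):
    """Simplified class-based TF-IDF (illustrative only, not BERTopic's code).

    Reads `labels` to group documents; never modifies them.
    """
    # Merge the tokens of every document in a cluster into one bag of words.
    term_counts = {}
    for doc, label in zip(docs, labels):
        tokens = [t for t in doc.lower().split() if t not in stop_words]
        term_counts.setdefault(label, Counter()).update(tokens)

    # Frequency of each term across all clusters, and average words per cluster.
    total = Counter()
    for counts in term_counts.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(term_counts)

    top = {}
    for label, counts in term_counts.items():
        n_tokens = sum(counts.values())
        # Normalized term frequency times log(1 + average words / term frequency).
        scores = {
            term: (tf / n_tokens) * math.log(1 + avg_words / total[term])
            for term, tf in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top[label] = ranked[:n_words]
    return top


# Hypothetical toy data: assume the clustering step already produced these labels.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the stock market fell",
    "the market rallied on the news",
]
labels = [0, 0, 1, 1]

plain = top_words_per_cluster(docs, labels)
filtered = top_words_per_cluster(docs, labels, stop_words={"the", "on"})
```

In this sketch, switching the stop-word setting changes which words describe each cluster, but `labels` and the set of clusters stay exactly the same. That is the behavior I expected from the real pipeline as well.
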
Discussed in https://github.com/MaartenGr/BERTopic/discussions/1192