Trying to reproduce the TC score for Trump dataset

MaartenGr / BERTopic_evaluation

Code and experiments for *BERTopic: Neural topic modeling with a class-based TF-IDF procedure*

MIT License

65 stars, 32 forks

Trying to reproduce the TC score for Trump dataset #8

Closed (chunfortam closed this issue 1 year ago)

chunfortam commented 1 year ago

Hi Maarten,

I am trying to reproduce the TC score of 0.066 for the Trump dataset with the MPNET SBERT model, but after averaging over 15 runs I have been getting results ranging from -0.01x to 0.03. I understand that UMAP introduces randomness, but I'd like to know whether there are other reasons for the gap. I followed the Python notebook and used the same dataset. What are your thoughts on this?

Regards, Chun
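
One way to judge whether a gap like this is within run-to-run noise is to report the spread of the per-run TC scores alongside their mean. The following is a minimal sketch using only the standard library. The helper names `summarize_runs` and `within_spread` and the two-standard-deviation threshold are illustrative assumptions, not part of the notebook or of BERTopic.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run TC scores."""
    mean = statistics.mean(scores)
    # A single run has no spread to estimate, so report 0.0 in that case.
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, stdev


def within_spread(target, scores, k=2.0):
    """Check whether `target` lies within k standard deviations of the mean.

    The default k=2.0 is an arbitrary illustrative threshold (an assumption).
    """
    mean, stdev = summarize_runs(scores)
    return abs(target - mean) <= k * stdev
```

Passing the 15 per-run scores to `summarize_runs` gives the mean and spread, and `within_spread(0.066, scores)` indicates whether the reported value is plausibly explained by randomness alone. If it is not, something other than the UMAP seed is likely responsible.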

MaartenGr commented 1 year ago

Did you make sure to use the versions specified in the notebook? BERTopic and its dependencies have gone through several changes over the years, which would explain some of the differences.
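
A quick way to check this is to compare the installed package versions against the pinned ones. The following is a minimal sketch using `importlib.metadata` from the standard library. The `EXPECTED` dictionary holds placeholder version strings (an assumption) that would need to be replaced with the versions listed in the notebook, and the helper names are hypothetical.

```python
from importlib import metadata

# Placeholder pins (assumption): replace with the versions listed in the notebook.
EXPECTED = {"bertopic": "0.0.0", "umap-learn": "0.0.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a tuple of (expected version, installed version or None)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, installed {found or 'not installed'}")
```

An empty result from `find_mismatches` means the environment matches the pins. Any entry it returns is a candidate explanation for a difference in scores.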

chunfortam commented 1 year ago

I think that was it, thanks!