MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License

Document clustering: KeyBERT vs. Sentence Transformers? #165

Open km5ar opened 1 year ago

km5ar commented 1 year ago

I'm wondering whether anyone has compared using KeyBERT versus Sentence Transformers for document clustering?

MaartenGr commented 1 year ago

KeyBERT itself already uses SentenceTransformers to extract the document and word embeddings. It might be interesting to compare how well clustering works on the keyword embeddings versus the document embeddings, but unfortunately I have not tried it out yet.
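For anyone who wants to try that comparison, here is a minimal sketch under several assumptions that are not from KeyBERT itself: each document is represented either by its own embedding or by the mean of its keyword embeddings, both representations are clustered with plain k-means (deterministic farthest-point initialisation), and the two clusterings are compared with the (unadjusted) Rand index. The 2-D vectors are hypothetical toy data so the snippet runs without downloading a model; in practice they would come from the same SentenceTransformers model KeyBERT uses.

```python
import math

# Hypothetical toy data: real vectors would come from an embedding model
# (e.g. a SentenceTransformers model); 2-D here so the example runs offline.
doc_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
keyword_embeddings = [  # a few keyword vectors per document (assumed)
    [[1.0, 0.0], [0.8, 0.3]],
    [[0.9, 0.1], [1.0, 0.2]],
    [[0.0, 1.0], [0.2, 0.9]],
    [[0.1, 1.0], [0.3, 0.8]],
]


def normalize(v):
    """Scale a vector to unit length so Euclidean k-means tracks cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels


def rand_index(a, b):
    """Fraction of document pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Clustering 1: one document embedding per document.
doc_labels = kmeans([normalize(v) for v in doc_embeddings], k=2)
# Clustering 2 (assumption): each document = mean of its keyword embeddings.
kw_labels = kmeans([normalize(mean_vector(kws)) for kws in keyword_embeddings], k=2)

print(doc_labels, kw_labels, rand_index(doc_labels, kw_labels))
# prints: [0, 0, 1, 1] [0, 0, 1, 1] 1.0
```

With real embeddings, scikit-learn's `KMeans` and `adjusted_rand_score` would be the more usual choices; the plain Rand index used here does not correct for agreement by chance.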

km5ar commented 1 year ago

@MaartenGr Yeah! I read your official docs a few months ago, and I remember there was a section suggesting that you first use KeyBERT and then cluster. Recently I tried to find that section again, but I can't locate it anymore.

MaartenGr commented 1 year ago

I actually do not remember writing about such a use case for KeyBERT in the documentation. It may have been PolyFuzz, but KeyBERT is not generally used for clustering unless its word embeddings are subsequently clustered.