MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Does BERTopic rely on *both* sentence_embeddings and word_embeddings #1403

Open matthewnour opened 1 year ago

matthewnour commented 1 year ago

When exploring relationships between topics (e.g., 2D visualisations, topic hierarchies), we need to represent each topic as a summary vector (a cluster-level embedding).

The BERTopic source code states:

topic_embeddings_ (np.ndarray) : The embeddings for each topic. It is calculated by taking the weighted average of word embeddings in a topic based on their c-TF-IDF values.

This seems to imply that BERTopic needs both a sentence-level embedding model (for documents) and a separate word-level embedding model.

Is this the case? Where is this specified in the source code, please?

MaartenGr commented 1 year ago

Hmmm, that docstring should be updated. There are now a number of ways topic embeddings can be calculated. By default, the topic embedding is the average of all document embeddings assigned to that topic. Only if that is not possible does it fall back to the c-TF-IDF-weighted average of word embeddings. Those word embeddings can be generated with the same sentence-level model by simply passing it a single word and having it produce an embedding for that word. You could use separate sentence-level and word-level models, but with sentence-transformers that is generally not necessary.
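
To make that concrete, here is a minimal sketch of the two strategies described above, written outside of BERTopic itself. The corpus, topic assignments, top words, and c-TF-IDF weights are all illustrative placeholders, not the library's internal code or values:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy corpus with topic assignments (as a clustering step would produce).
docs = [
    "the cat sat on the mat",
    "dogs are loyal pets",
    "stock prices fell sharply",
    "markets rallied after the news",
]
topics = np.array([0, 0, 1, 1])
doc_embeddings = model.encode(docs)

# Default: the topic embedding is the mean of its documents' embeddings.
topic_embeddings = np.vstack(
    [doc_embeddings[topics == t].mean(axis=0) for t in np.unique(topics)]
)

# Fallback: a c-TF-IDF-weighted average of word embeddings, where each
# word embedding comes from the *same* sentence model applied to a
# single word -- no separate word-level model is needed.
top_words = ["cat", "dog", "pets"]          # hypothetical top words for topic 0
ctfidf_weights = np.array([0.5, 0.3, 0.2])  # hypothetical c-TF-IDF scores
word_embeddings = model.encode(top_words)
fallback_embedding = np.average(word_embeddings, axis=0, weights=ctfidf_weights)
```

In both cases a single sentence-transformer suffices: it embeds whole documents for the default path and individual words for the fallback path.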