MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Extending ".visualize_document_datamap" with "label_over_points"-flag #2012

Open PYaDo opened 1 month ago

PYaDo commented 1 month ago

Hey @MaartenGr! Great work on this library. I recently came across an underlying functionality of datamapplot and noticed it is not yet implemented in the BERTopic package. I assume this relates to an updated DataMapPlot-version.

In detail: DataMapPlot can place the labels directly on top of the clusters instead of using straight-line pointers. This helps keep large visualizations with many clusters readable. I was not able to pass the "label_over_points" flag to "visualize_document_datamap". The docs for the corresponding DataMapPlot section can be found here: https://datamapplot.readthedocs.io/en/latest/label_over_points.html

My current workaround is to use the most recent version of DataMapPlot and pass in the UMAP embeddings, topic labels, and desired DataMapPlot flags directly. That way I avoid using the ".visualize_document_datamap" method on my topic_model.

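    import datamapplot
    import matplotlib.pyplot as plt
    # umap_embeddings: 2D UMAP coordinates, labels: one topic label per document
    fig, ax = datamapplot.create_plot(umap_embeddings, labels, label_over_points=True)
    plt.show()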

A manual update of the underlying DataMapPlot package via pip, however, seems to solve the issue for me. You might wanna have a look into that. Thanks in advance!

MaartenGr commented 4 weeks ago

If I'm not mistaken, this would be solved by using the latest version of datamapplot and passing the flag through the datamap_kwds parameter in BERTopic, right?
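
A minimal sketch of what that suggestion could look like. It assumes the installed BERTopic version forwards extra keyword arguments of "visualize_document_datamap" (its datamap_kwds) to datamapplot, and that the installed datamapplot already supports "label_over_points". The helper name plot_labels_over_points is hypothetical and only wraps the call; topic_model, docs, and reduced_embeddings are expected to come from an already fitted model.

```python
def plot_labels_over_points(topic_model, docs, reduced_embeddings, **extra_kwds):
    """Hypothetical helper: ask BERTopic for a datamap with labels drawn on the clusters."""
    # Assumption: extra keyword arguments are passed through to datamapplot
    # (BERTopic's datamap_kwds), so the flag is not interpreted by BERTopic itself.
    datamap_kwds = {"label_over_points": True, **extra_kwds}
    return topic_model.visualize_document_datamap(
        docs,
        reduced_embeddings=reduced_embeddings,
        **datamap_kwds,
    )
```

Usage would then be something like fig = plot_labels_over_points(topic_model, docs, umap_embeddings). If the installed version instead expects datamap_kwds as a dictionary argument, the same flag would presumably go into that dictionary rather than being unpacked.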