MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.15k stars, 765 forks

Understanding of visualize_hierarchy() figure and "Distance" value from hierarchical_topics(docs) dataframe #1217

Closed ericniso closed 1 year ago

ericniso commented 1 year ago

Hi,

Here is a subset of the dataframe returned by the hierarchical_topics(docs) method. It contains cluster "204", whose children are clusters "48" and "74":

[Image: dataframe subset showing the Distance value]

As shown, the distance between "48" and "74" is 0.37515.

Here is the corresponding dendrogram figure from the visualize_hierarchy() method:

[Image: dendrogram figure]

As you can see, the distance displayed in the figure is clearly not 0.37515 but slightly above 0.6.

Is this the intended behaviour?

Why are the two distances not equal?

MaartenGr commented 1 year ago

This might be the result of a known issue that has been fixed in the main branch. If you install BERTopic from the main branch, I believe the two values should match.
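To check whether the mismatch persists after upgrading, the table value can be read programmatically and compared with the merge height shown in the dendrogram. The following is only a minimal sketch: `find_merge_distance` is a hypothetical helper, not part of BERTopic, and it assumes the hierarchy table uses the column names `Child_Left_ID`, `Child_Right_ID`, and `Distance`. The example record reuses the values quoted in the question; which child is "left" and which is "right" is an assumption.

```python
from typing import Any, Iterable, Mapping, Optional


def find_merge_distance(
    rows: Iterable[Mapping[str, Any]], first_id: Any, second_id: Any
) -> Optional[float]:
    """Return the Distance of the merge that joins the two given children.

    Hypothetical helper: the column names below are assumed to match the
    table returned by hierarchical_topics(docs). IDs are compared as strings
    so that 48 and "48" are treated as the same topic.
    """
    wanted = {str(first_id), str(second_id)}
    for row in rows:
        children = {str(row["Child_Left_ID"]), str(row["Child_Right_ID"])}
        if children == wanted:
            return float(row["Distance"])
    return None  # no merge in the table joins exactly these two children


# Record built from the values quoted in the question (left/right order assumed).
rows = [
    {
        "Parent_ID": "204",
        "Child_Left_ID": "48",
        "Child_Right_ID": "74",
        "Distance": 0.37515,
    },
]

table_distance = find_merge_distance(rows, 48, 74)
print(table_distance)
```

With a fitted model, the records could presumably be obtained with `topic_model.hierarchical_topics(docs).to_dict("records")`. If the value returned here still differs from the height at which the two topics join in the visualize_hierarchy() figure, the discrepancy has not been resolved by the upgrade.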

MaartenGr commented 1 year ago

Closing this due to inactivity. Let me know if I need to re-open the issue!