Closed by e-barrere 2 years ago
The main reason for this is modularity. Although HDBSCAN is the default model, other clustering algorithms, such as k-Means, can be used instead. To support any clustering technique, this step has to be somewhat independent of the clustering model. There is also something to be said for comparing the end result, that is, the topic representations and, to a lesser extent, the clusters. That, however, might just be semantics, although it does follow the philosophy of modularity as presented in the package.
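To make the modularity point concrete, here is a rough, standard-library-only sketch of the idea. It is not BERTopic's actual code. The helper names (`class_tfidf`, `cosine_distance`, `topic_hierarchy`), the toy documents, the simplified c-TF-IDF-style weighting, and the average-linkage merging are all assumptions made for illustration. The only input from the clustering step is a list of labels, so it does not matter whether they came from HDBSCAN, k-Means, or anything else.

```python
import math
from collections import Counter

def class_tfidf(docs, labels):
    """Pool the documents of each cluster and weight terms in a c-TF-IDF-like way."""
    per_topic = {}
    for doc, label in zip(docs, labels):
        per_topic.setdefault(label, Counter()).update(doc.lower().split())
    total = Counter()
    for counts in per_topic.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_topic)  # average words per cluster
    return {
        topic: {t: tf * math.log(1 + avg_words / total[t]) for t, tf in counts.items()}
        for topic, counts in per_topic.items()
    }

def cosine_distance(a, b):
    """Cosine distance between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def topic_hierarchy(vectors):
    """Average-linkage agglomerative merging; returns a list of (left, right, distance)."""
    clusters = [(t,) for t in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                dist = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy data: the labels could come from any clustering algorithm.
docs = [
    "cats purr and sleep", "cats chase mice",
    "kittens purr and play", "kittens chase mice",
    "stocks rise and fall", "markets fall on rates",
]
labels = [0, 0, 1, 1, 2, 2]

merges = topic_hierarchy(class_tfidf(docs, labels))
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Because the hierarchy is computed from the topic representations alone, swapping the clustering model only changes `labels`; the rest stays the same.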
I get it now, thank you for your answer!
Hello,
Thank you for this fantastic work, BERTopic is really useful. I was wondering why the visualization of the hierarchy is based on the results of the c_tf_idf. Since the HDBSCAN result is already hierarchical, why recalculate a distance representation from the c_tf_idf rather than using the HDBSCAN result?
Thank you