gagolews / genieclust

Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
https://genieclust.gagolewski.com

cache intermediate results to make the next call to fit() faster #30

Closed (gagolews closed this issue 4 years ago)

gagolews commented 4 years ago

Computing mst_dist and mst_ind is the slowest part.

Getting nn_dist and nn_ind (needed to compute d_core before mst_dist and mst_ind) is also slow when M > 1.

compute_full_tree, postprocess, gini_threshold, and n_clusters are only applied after mst_dist and mst_ind are computed, so changing them alone should not require recomputing the tree.

Changing X, M, affinity, cast_float32, or exact requires mst_dist and mst_ind to be recomputed.

However, decreasing only M does not invalidate nn_dist and nn_ind, so at least d_core could be generated faster.
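
The invalidation rules above could be sketched roughly as follows. This is only an illustration, not genieclust's actual implementation: `plan_recompute` and `core_distance` are hypothetical helpers, the parameter dicts stand in for the estimator's attributes, and the sketch assumes that nn_dist stores distances to the M-1 nearest neighbours in ascending order, with d_core taken as the distance to the (M-1)-th one.

```python
# Hypothetical sketch; none of these names are part of genieclust's API.

# Parameters (other than X and M) whose change invalidates everything cached.
MST_PARAMS = ("affinity", "cast_float32", "exact")


def plan_recompute(old, new, same_X=True):
    """Decide which cached intermediates must be recomputed.

    old, new: dicts with keys "M", "affinity", "cast_float32", "exact";
    old is None if there was no previous call to fit().
    Returns {"nn": bool, "mst": bool}.
    """
    # Nearest neighbours are only needed at all when M > 1.
    if old is None or not same_X or any(old[k] != new[k] for k in MST_PARAMS):
        return {"nn": new["M"] > 1, "mst": True}
    if new["M"] == old["M"]:
        # Only post-MST parameters (n_clusters, gini_threshold, ...) may differ.
        return {"nn": False, "mst": False}
    # M changed: the tree must be rebuilt, but a smaller M can reuse
    # the cached neighbours, since they already cover the first M-1 ones.
    return {"nn": new["M"] > 1 and new["M"] > old["M"], "mst": True}


def core_distance(nn_dist, M):
    """Assumption: column M-2 of nn_dist is the distance to the (M-1)-th
    nearest neighbour, so a cache built for a larger M can be sliced."""
    return [row[M - 2] for row in nn_dist]
```

For example, with a cache built for M=5, a later call with M=3 would report that only the tree needs rebuilding, and d_core could be read off the cached nn_dist without a new neighbour search.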