gagolews / genieclust

Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
https://genieclust.gagolewski.com

Dendrograms: Correct for departures from ultrametricity #38

Closed gagolews closed 4 years ago

gagolews commented 4 years ago

Also make sure dendrograms are plotted correctly in the presence of noise points

Also add examples in the tutorials

gagolews commented 4 years ago

Apply cummax to all elements up to the last positive one;

for the remaining ones, apply cummax(abs(d))

gagolews commented 4 years ago

This is not documented, but cut_tree requires it

gagolews commented 4 years ago
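
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy
import genieclust
# g is assumed to be an already-fitted genieclust.Genie object
# create the linkage matrix, see scipy.cluster.hierarchy.linkage
Z = np.column_stack((g.children_, g.distances_, g.counts_))
# correct for possible departures from ultrametricity:
# a cumulative minimum over the reversed merge distances makes them non-decreasing
Z[:, 2] = genieclust.tools.cummin(Z[::-1, 2])[::-1]
scipy.cluster.hierarchy.dendrogram(Z)
plt.show()
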
gagolews commented 4 years ago

Use cummin on the reversed distances (then reverse the result back)
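
For readers who want to see what the reversed cumulative minimum does to a vector of merge heights, here is a minimal standard-library sketch. The helper name `cummin_reversed` and the sample heights are made up for illustration and are not part of genieclust. With NumPy arrays, `np.minimum.accumulate(d[::-1])[::-1]` should give the same result.

```python
from itertools import accumulate


def cummin_reversed(heights):
    """Running minimum taken from the last merge back to the first.

    Hypothetical helper for illustration only; not a genieclust function.
    """
    # accumulate(..., min) yields the cumulative minimum of the reversed list
    rev_min = accumulate(reversed(heights), min)
    # reverse again so the result lines up with the original merge order
    return list(rev_min)[::-1]


# made-up merge heights; 3.0 followed by 2.0 is an inversion
heights = [1.0, 3.0, 2.0, 4.0]
fixed = cummin_reversed(heights)
print(fixed)  # [1.0, 2.0, 2.0, 4.0]
```

Each height is lowered to the smallest height that follows it, so the sequence becomes non-decreasing while the final merge height is left unchanged.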