rebeccabilbro opened 7 years ago
Something to the tune of:
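import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram

def plot_dendrogram(self, **kwargs):
    """
    Build a SciPy linkage matrix from the fitted model's children_ array
    (merge pairs, placeholder merge distances, and cluster sizes), then
    plot the dendrogram.
    """
    children = self.model.children_
    n_samples = len(self.model.labels_)
    # children_ does not record merge distances, so use evenly spaced
    # placeholder heights (merges are already in order, so this is monotonic)
    distance = np.arange(children.shape[0])
    # The fourth linkage column is the number of original observations
    # under each merged node; ids >= n_samples refer to earlier merges
    counts = np.zeros(children.shape[0])
    for i, merge in enumerate(children):
        for child in merge:
            counts[i] += 1 if child < n_samples else counts[child - n_samples]
    linkage_matrix = np.column_stack(
        [children, distance, counts]
    ).astype(float)
    fig, ax = plt.subplots(figsize=(15, 7))
    # dendrogram() returns a dict of plot data, not an Axes, so draw onto ax
    dendrogram(linkage_matrix, orientation='left', ax=ax, **kwargs)
    # Hide the distance axis, since the placeholder heights carry no meaning
    ax.tick_params(axis='x', bottom=False, top=False, labelbottom=False)
    plt.tight_layout()
    plt.show()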
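The linkage conversion can also be pulled out of the plotting method so it can be tested on its own and given real merge distances when they are available. A minimal sketch, assuming a full merge tree (so `n_samples = len(children) + 1`); the name `children_to_linkage` is hypothetical and not part of any library.

```python
import numpy as np


def children_to_linkage(children, distances=None):
    """Convert an sklearn-style children_ array into a SciPy linkage matrix.

    Assumes a full tree: len(children) == n_samples - 1. Each output row is
    (left id, right id, merge distance, number of observations under the merge).
    """
    children = np.asarray(children, dtype=int)
    n_merges = children.shape[0]
    n_samples = n_merges + 1
    if distances is None:
        # No recorded merge heights: fall back to evenly spaced placeholders
        distances = np.arange(n_merges, dtype=float)
    else:
        distances = np.asarray(distances, dtype=float)
        if distances.shape != (n_merges,):
            raise ValueError("need exactly one distance per merge")
    counts = np.zeros(n_merges)
    for i, (left, right) in enumerate(children):
        for child in (left, right):
            # Ids below n_samples are original points; others are earlier merges
            counts[i] += 1 if child < n_samples else counts[child - n_samples]
    return np.column_stack([children, distances, counts]).astype(float)
```

With scikit-learn, `AgglomerativeClustering` only fills its `distances_` attribute when `distance_threshold` is set (for example `distance_threshold=0, n_clusters=None`) or `compute_distances=True`; in that case `children_to_linkage(model.children_, model.distances_)` gives a linkage matrix whose heights are real, and it can be passed straight to `scipy.cluster.hierarchy.dendrogram`.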
I have code for dendrogram simplification and for plotting the resulting pruned/condensed dendrogram as part of my clustering project (http://github.com/scikit-learn-contrib/hdbscan). The condense_tree routine in hdbscan/_hdbscan_tree.pyx handles tree simplification, and there is code in hdbscan/plots.py that does the plotting. Feel free to steal whatever looks useful.
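The condensing idea can be sketched in pure Python. This is a simplified illustration written from the general description of HDBSCAN's condensed tree, not the hdbscan code: `condense_linkage` is a hypothetical name, it takes a SciPy-style linkage matrix, and it records raw merge distances where hdbscan works with lambda = 1/distance. Walking down from the root, a split counts as real only when both sides have at least `min_cluster_size` points; otherwise the points on the small side drop out of the current cluster, which keeps its label.

```python
def condense_linkage(Z, min_cluster_size):
    """Condense a SciPy-style linkage matrix (rows: left, right, distance, size).

    Returns (parent_cluster, child, distance, child_size) tuples. Point ids are
    0..n-1; condensed cluster labels start at n (n is the root cluster).
    Simplified sketch: uses merge distances, not hdbscan's lambda values.
    """
    if min_cluster_size < 2:
        raise ValueError("min_cluster_size must be at least 2")
    n = len(Z) + 1

    def size(node):
        return 1 if node < n else int(Z[node - n][3])

    def children(node):
        return int(Z[node - n][0]), int(Z[node - n][1])

    def leaves(node):
        # Collect the original points under a tree node without recursion
        stack, points = [node], []
        while stack:
            x = stack.pop()
            if x < n:
                points.append(x)
            else:
                stack.extend(children(x))
        return points

    result = []
    next_label = n + 1
    # Each entry pairs a linkage-tree node with the condensed cluster it belongs to
    stack = [(2 * n - 2, n)]
    while stack:
        node, label = stack.pop()
        dist = float(Z[node - n][2])
        kids = children(node)
        big = [c for c in kids if size(c) >= min_cluster_size]
        if len(big) == 2:
            # A true split: both sides become new condensed clusters
            for c in big:
                result.append((label, next_label, dist, size(c)))
                stack.append((c, next_label))
                next_label += 1
            continue
        # Otherwise the small side's points fall out of the current cluster
        for c in kids:
            if size(c) < min_cluster_size:
                for p in leaves(c):
                    result.append((label, p, dist, 1))
        if big:
            # The cluster persists, under the same label, in the large child
            stack.append((big[0], label))
    return result
```

Rows with `child_size > 1` are the surviving cluster splits, which is what makes the condensed tree much smaller than the full dendrogram.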
@lmcinnes - this is excellent, thank you!
Another option is the treemap:
http://scipy-cookbook.readthedocs.io/items/Matplotlib_TreeMap.html
A dendrogram illustrates how each cluster is composed by drawing a U-shaped link between a non-singleton cluster and its children. The top of the U-link indicates a cluster merge. The two legs of the U-link indicate which clusters were merged. The length of the two legs of the U-link represents the distance between the child clusters. It is also the cophenetic distance between original observations in the two child clusters.
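To make the cophenetic part concrete: in a SciPy linkage matrix each row is one U-link, and its third column is the height of that link's top, so the cophenetic distance between two observations is the height of the first U-link that joins them. A minimal pure-Python sketch (the name `cophenetic_matrix` is hypothetical); SciPy's `scipy.cluster.hierarchy.cophenet` computes the same quantity in condensed form.

```python
from itertools import product


def cophenetic_matrix(Z):
    """Full cophenetic distance matrix from a SciPy-style linkage matrix."""
    n = len(Z) + 1
    # Which original observations sit under each live node
    members = {i: [i] for i in range(n)}
    D = [[0.0] * n for _ in range(n)]
    for row_index, (a, b, dist, _count) in enumerate(Z):
        # This row is one U-link joining the clusters under its two legs
        left, right = members.pop(int(a)), members.pop(int(b))
        for i, j in product(left, right):
            # Every cross pair first meets at this link's height
            D[i][j] = D[j][i] = float(dist)
        members[n + row_index] = left + right
    return D
```

Pairs that merge early (inside a tight group) get small values, and pairs that only meet at the root get the root's height.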
See also: