DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

DendrogramVisualizer for Agglomerative Clustering #204

Open rebeccabilbro opened 7 years ago

rebeccabilbro commented 7 years ago

A dendrogram illustrates how each cluster is composed by drawing a U-shaped link between a non-singleton cluster and its children. The top of the U-link indicates a cluster merge, and its two legs indicate which clusters were merged. The length of the legs represents the distance between the child clusters, which is also the cophenetic distance between the original observations in the two child clusters.
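
To make the U-link description concrete, here is a minimal sketch using SciPy's `linkage`, `dendrogram`, and `cophenet`. The four sample points and the choice of single linkage are assumptions made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, dendrogram, linkage
from scipy.spatial.distance import squareform

# Four illustrative 1-D observations (made-up data).
X = np.array([[0.0], [1.0], [5.0], [7.0]])

# Each row of Z records one merge: [child_a, child_b, distance, size].
Z = linkage(X, method="single")

# no_plot=True returns the U-link coordinates without drawing them.
links = dendrogram(Z, no_plot=True)

# The top of each U-link sits at the merge distance stored in Z[:, 2].
link_heights = sorted(max(d) for d in links["dcoord"])

# Cophenetic distance: the height at which two observations
# first end up in the same cluster.
coph = squareform(cophenet(Z))
```

Here `link_heights` should match the merge distances in `Z[:, 2]`, and `coph[i, j]` gives the height of the U-link that first joins observations `i` and `j`.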


rebeccabilbro commented 7 years ago

something to the tune of:

import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram


def plot_dendrogram(model, **kwargs):
    """
    Compute the distances between each pair of children and
    a position for each child node. Then create a linkage
    matrix, and plot the dendrogram.

    model is a fitted agglomerative clustering estimator
    (self.model inside a visualizer).
    """
    n_merges = model.children_.shape[0]

    # children_ carries no merge distances, so evenly spaced
    # placeholder values fill the last two columns.
    distance = np.arange(n_merges)
    position = np.arange(2, n_merges + 2)

    linkage_matrix = np.column_stack(
        [model.children_, distance, position]
    ).astype(float)

    fig, ax = plt.subplots(figsize=(15, 7))

    # dendrogram() returns a dict of plot data, not an Axes,
    # so draw onto the existing axes instead of rebinding ax.
    dendrogram(linkage_matrix, orientation='left', ax=ax, **kwargs)

    ax.tick_params(axis='x', bottom=False, top=False, labelbottom=False)
    fig.tight_layout()
    plt.show()

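
The snippet above fills the distance and count columns with `np.arange` placeholders, so the drawn heights do not reflect real merge distances. Recent scikit-learn versions expose merge distances as `distances_` when the model is fitted with `compute_distances=True` or with a `distance_threshold`. Assuming those are available, a rough sketch of building a proper linkage matrix could look like this. The helper name `build_linkage_matrix` and the sample arrays are hypothetical and not part of yellowbrick.

```python
import numpy as np
from scipy.cluster.hierarchy import is_valid_linkage


def build_linkage_matrix(children, distances, n_samples):
    """Assemble a SciPy-style linkage matrix from merge data.

    children: (n_samples - 1, 2) array of merged node ids, laid out
        like scikit-learn's children_ attribute.
    distances: merge distance for each row of children.
    """
    counts = np.zeros(children.shape[0])
    for i, merge in enumerate(children):
        # Ids below n_samples are original observations (size 1);
        # larger ids refer to the cluster formed by an earlier merge.
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in merge
        )
    return np.column_stack([children, distances, counts]).astype(float)


# Hypothetical merge data for four observations.
children = np.array([[0, 1], [2, 3], [4, 5]])
distances = np.array([1.0, 2.0, 4.0])
Z = build_linkage_matrix(children, distances, n_samples=4)
```

The resulting `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` in place of the placeholder matrix.
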
rebeccabilbro commented 7 years ago

SciPy dendrogram source

lmcinnes commented 7 years ago

I have code for dendrogram simplification and plotting the resulting pruned/condensed dendrogram as part of my clustering project (http://github.com/scikit-learn-contrib/hdbscan). The condense_tree routine in hdbscan/_hdbscan_tree.pyx handles tree simplification, and there is code in hdbscan/plots.py that does the plotting. Feel free to steal whatever looks useful.
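
The hdbscan routines are not reproduced here. As a much simpler stand-in for pruning, SciPy's own `dendrogram` can collapse a tree via `truncate_mode='lastp'`. This is a different technique from hdbscan's condensed tree, and the random data, Ward linkage, and `p=4` below are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Made-up data: 30 random 2-D observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Z = linkage(X, method="ward")

# no_plot=True returns the layout data without drawing anything.
full = dendrogram(Z, no_plot=True)

# Collapse the lower merges so that only p leaves remain.
pruned = dendrogram(Z, truncate_mode="lastp", p=4, no_plot=True)

n_full_leaves = len(full["ivl"])      # one leaf per observation
n_pruned_leaves = len(pruned["ivl"])  # p leaves after truncation
```

Dropping `no_plot=True` draws the truncated tree, with contracted leaves labelled by the number of observations they contain.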

rebeccabilbro commented 7 years ago

@lmcinnes - this is excellent, thank you!

bbengfort commented 6 years ago

Another option is the treemap:

http://scipy-cookbook.readthedocs.io/items/Matplotlib_TreeMap.html