DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Density based clustering validity measures #229

Open lmcinnes opened 7 years ago

lmcinnes commented 7 years ago

Clustering scores like silhouette work well for K-Means but make less sense for density-based clustering techniques like DBSCAN, which support arbitrary cluster shapes. It would be nice to include scores and visualisations for measures that support density-based notions of clustering. These are a little thin on the ground, but the Density Based Cluster Validity Index (DBCV) of Moulavi et al. (http://www.dbs.ifi.lmu.de/~zimek/publications/SDM2014/DBCV.pdf) is one of the better ones.
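To make the point about silhouette concrete, here is a minimal sketch in plain Python. The dataset (two long parallel rows of points) and the helper name `silhouette` are made up for illustration and are not part of yellowbrick or scikit-learn. Within a row the points are 1 unit apart and the rows are 2 units apart, so a density-based method with a suitable radius would plausibly recover the rows, yet the silhouette score prefers cutting the data in half across both rows.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (standard definition)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: two rows of 100 points, spacing 1 within a row, gap 2 between rows.
n = 100
points = [(float(x), 0.0) for x in range(n)] + [(float(x), 2.0) for x in range(n)]

by_row = [0] * n + [1] * n                               # one cluster per row
by_half = [0 if x < n // 2 else 1 for x in range(n)] * 2  # left half vs right half

print("rows:  ", silhouette(points, by_row))
print("halves:", silhouette(points, by_half))
```

On this data the row labelling scores below zero and the left/right split scores above zero, which is the kind of mismatch a density-aware index is meant to avoid.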

bbengfort commented 7 years ago

Absolutely, that would be awesome. We had to add our own distortion score metric. Would you be willing to write up some Python to compute the cluster validity index? Check out distortion_score for the signature and inputs.

lmcinnes commented 7 years ago

I have some code for it here. It has some dependency on hdbscan, but in practice that amounts to mst_linkage_core, which you can replace with any suitable minimum spanning tree code.
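As a rough idea of what "any suitable minimum spanning tree code" could look like, here is a sketch of Prim's algorithm over a dense, symmetric distance matrix. The function name `prim_mst` and its return format are assumptions for illustration. It does not reproduce the signature or output layout of hdbscan's mst_linkage_core, and it is not taken from the linked code.

```python
import math

def prim_mst(dist):
    """Return MST edges (parent, child, weight) for a symmetric distance matrix.

    Plain O(n^2) Prim's algorithm; `dist` is a list of lists (hypothetical input format).
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each node to the tree
    parent = [-1] * n       # tree node on the other end of that cheapest edge
    best[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # pick the not-yet-added node that is cheapest to attach
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # relax the attachment cost of the remaining nodes via u
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Tiny example: four points on a line at positions 0, 1, 3, 6.
xs = [0, 1, 3, 6]
dist = [[abs(a - b) for b in xs] for a in xs]
edges = prim_mst(dist)
print(edges)
```

For a density-based index the matrix passed in would hold whatever density-adjusted distance the measure defines rather than raw Euclidean distance; the tree construction itself is unchanged.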