PR in response to a Stack Overflow question: https://stackoverflow.com/questions/69608173/yellowbrick-is-it-possible-to-pass-in-different-pairwise-distance-metrics-for-s

Summary

scikit-learn defines a large number of pairwise distance metrics that scoring functions such as the silhouette score can use: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html

For example, the silhouette score can be computed with any of these distance metrics: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']

Currently, KElbowVisualizer accepts silhouette as the scoring metric, as follows:

KElbowVisualizer(KMeans(), k=(4, 12), metric='silhouette')

In that case it always uses the silhouette score's default distance metric, 'euclidean'. I wanted to make it possible to run KElbowVisualizer with a distance metric other than the default.
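To illustrate why the choice of distance metric matters for the score, here is a minimal pure-Python sketch of a mean silhouette coefficient with a pluggable distance function. It is not scikit-learn's or Yellowbrick's implementation; the helper names (`euclidean`, `manhattan`, `mean_silhouette`) and the toy data are made up for this example.

```python
import math


def euclidean(a, b):
    # Straight-line (L2) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # City-block (L1) distance between two points.
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_silhouette(points, labels, dist):
    """Mean silhouette coefficient for a given distance function.

    Hypothetical helper for illustration; assumes at least two clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (dist(p, p) is 0, so summing over the whole cluster is safe).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (1, 1), (0, 1), (10, 10), (11, 10), (10, 12)]
labels = [0, 0, 0, 1, 1, 1]

s_euclidean = mean_silhouette(points, labels, euclidean)
s_manhattan = mean_silhouette(points, labels, manhattan)
print(s_euclidean, s_manhattan)
```

The same clustering gets a different score under each metric, which is why a fixed 'euclidean' default can be limiting.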

Changes

I added the ability to specify a pairwise distance metric for our scoring functions, via a new distance_metric argument to KElbowVisualizer.
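As a rough sketch of what a pluggable distance metric means for a score like distortion, the snippet below sums squared point-to-centroid distances under a caller-supplied distance function. This assumes distortion is defined as that sum; the function name `distortion` and the toy data are hypothetical, and this is not the code in yellowbrick/cluster/elbow.py.

```python
import math


def euclidean(a, b):
    # Straight-line (L2) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # City-block (L1) distance between two points.
    return sum(abs(x - y) for x, y in zip(a, b))


def distortion(points, labels, dist):
    """Sum of squared distances from each point to its cluster centroid.

    Hypothetical helper; `dist` is any callable taking two points.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for members in clusters.values():
        # Centroid is the coordinate-wise mean of the cluster members.
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


# Toy data: two clusters of two points each.
points = [(0, 0), (2, 2), (10, 10), (12, 14)]
labels = [0, 0, 1, 1]

d_euclidean = distortion(points, labels, euclidean)
d_manhattan = distortion(points, labels, manhattan)
print(d_euclidean, d_manhattan)
```

Only the `dist` argument changes between the two calls; the cluster assignments and centroids are identical.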

Sample Code and Plot

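from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer

# Synthetic data for illustration only (an assumption); any feature matrix X works.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

model = KMeans(random_state=0)
# distance_metric is the new argument added by this PR.
visualizer = KElbowVisualizer(model, k=5, metric="distortion",
                              distance_metric='manhattan', timings=False,
                              locate_elbow=False)
visualizer.fit(X)
visualizer.finalize()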


TODOs and questions

Still to do:

Questions for the @DistrictDataLabs/team-oz-maintainers:

CHECKLIST

[x] Is the commit message formatted correctly?
[ ] Have you noted the new functionality/bugfix in the release notes of the next release?

[x] Included a sample plot to visually illustrate your changes?
[x] Do all of your functions and methods have docstrings?
[x] Have you added/updated unit tests where appropriate?
[x] Have you updated the baseline images if necessary?
[x] Have you run the unit tests using pytest?
[x] Is your code style correct (are you using PEP8, pyflakes)?
[ ] Have you documented your new feature/functionality in the docs?

[ ] Have you built the docs using make html?

Codecov Report

Merging #1238 (0bfea0f) into develop (092c0ca) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #1238      +/-   ##
===========================================
+ Coverage    90.48%   90.49%   +0.01%     
===========================================
  Files           92       92              
  Lines         5200     5206       +6     
===========================================
+ Hits          4705     4711       +6     
  Misses         495      495

Impacted Files                  Coverage Δ
yellowbrick/cluster/elbow.py    97.84% <100.00%> (+0.09%)


DistrictDataLabs / yellowbrick

Add pairwise distance metrics to scoring metrics in KElbowVisualizer #1238