DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Generic handling of sklearn distance metric #1300

Closed: bbengfort closed this PR 1 year ago.

bbengfort commented 1 year ago

This PR fixes #1296, which reported that Yellowbrick was too restrictive about which distance metrics could be used with the KElbow visualizer and did not support the full range of scikit-learn distance metrics.

I have made the following changes:

  1. Removed the hard-coded DISTANCE_METRICS string constants.
  2. Ensured that callables are accepted as distance metrics.
  3. Used sklearn.metrics.DistanceMetric.get_metric to check whether a metric name is valid.

An existing test already checks that a YellowbrickValueError is raised when an invalid metric is passed to the visualizer, and it still passes.

Sample Code and Plot

Previously, the following code could not be run; it should now work:

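```python
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

viz = KElbowVisualizer(KMeans(), distance_metric="chebyshev")
viz.fit(X_train, y_train)
viz.show()
```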
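To show why accepting both strings and callables is useful, the sketch below computes the same Chebyshev distances twice with scikit-learn's `pairwise_distances`: once by name and once with a hand-written function. It assumes, without showing it, that the visualizer passes `distance_metric` through to a function like `pairwise_distances`; the data and the `chebyshev` function are illustrative only.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def chebyshev(a, b):
    """Hand-written Chebyshev (L-infinity) distance between two 1-D arrays."""
    return np.max(np.abs(a - b))


# Three illustrative 2-D points.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, -2.0]])

# Built-in metric selected by its string name.
D_str = pairwise_distances(X, metric="chebyshev")

# The same metric supplied as a Python callable.
D_fn = pairwise_distances(X, metric=chebyshev)
```

Both matrices should agree; the callable form is typically much slower because scikit-learn evaluates it pair by pair in Python.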


codecov[bot] commented 1 year ago

Codecov Report

Merging #1300 (c9bf30a) into develop (5f12bc3) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #1300   +/-   ##
========================================
  Coverage    90.88%   90.89%           
========================================
  Files           93       93           
  Lines         5301     5303    +2     
========================================
+ Hits          4818     4820    +2     
  Misses         483      483           
| Impacted Files | Coverage Δ |
|---|---|
| yellowbrick/cluster/elbow.py | 97.84% <100.00%> (+0.03%) ↑ |
