DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.3k stars, 559 forks

Fix KElbowVisualizer ValueError when null cluster is encountered #1186

Status: Closed (busFred closed this 3 years ago)

busFred commented 3 years ago

This PR fixes #1185, in which a ValueError is thrown when not every cluster center has at least one input data point assigned to it.

I have made the following changes:

  1. Instead of reassigning labels = le.fit_transform(labels), the code now just calls le.fit(labels) and leaves the input argument labels unchanged, so labels keeps the same original values that are stored in le.classes_.
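To illustrate why the old line misbehaves, here is a minimal sketch using scikit-learn's LabelEncoder and NumPy directly. The helpers cluster_sizes_buggy and cluster_sizes_fixed are hypothetical and only mimic the two approaches; they are not Yellowbrick code. After fit_transform, the labels hold encoded values 0..k-1, while le.classes_ still holds the original values. Masking the encoded labels with an original value therefore selects the wrong rows, or no rows at all, and an empty group is the kind of input that can make a downstream distance computation fail.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder


def cluster_sizes_buggy(labels):
    """Hypothetical: mimics reassigning labels = le.fit_transform(labels)."""
    le = LabelEncoder()
    encoded = le.fit_transform(labels)  # e.g. [1, 3, 3] -> [0, 1, 1]
    # Original class values are compared against ENCODED labels: mismatch
    return [int((encoded == c).sum()) for c in le.classes_]


def cluster_sizes_fixed(labels):
    """Hypothetical: mimics calling le.fit(labels) and keeping labels as-is."""
    le = LabelEncoder()
    le.fit(labels)  # only records the original classes, e.g. [1, 3]
    # Original class values are compared against ORIGINAL labels
    return [int((labels == c).sum()) for c in le.classes_]


# Labels 0 and 2 are never assigned, as in the sample code below
labels = np.array([1, 3, 3])
print(cluster_sizes_buggy(labels))  # [2, 0]
print(cluster_sizes_fixed(labels))  # [1, 2]
```

With the buggy approach, class 1 wrongly picks up the two points whose encoded label is 1, and class 3 matches nothing. With contiguous labels such as [0, 1, 1], both approaches agree, which is why the problem only appears when some cluster is empty.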

Sample Code and Plot

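import numpy as np
from yellowbrick.cluster import distortion_score
# Three 2-D points
X = np.array([[1,2],[3,4],[5,6]])
# Non-contiguous labels: clusters 0 and 2 have no members (the case from #1185)
labels = np.array([1,3,3])
print(distortion_score(X, labels))
# The same grouping with contiguous labels should give the same score
labels = np.array([0,1,1])
print(distortion_score(X, labels))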

Console Output

4.000000000000001
4.000000000000001
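As a sanity check on this output, the distortion can be worked out by hand. Cluster 1 contains only (1, 2), so it contributes 0. Cluster 3 has centroid (4, 5), and each of its two points lies at squared distance 2 from it, for a total of 4. The sketch below is a plain-NumPy reference written for illustration; the distortion helper is hypothetical and is not Yellowbrick's API. It reproduces that value and shows that relabeling the clusters does not change it.

```python
import numpy as np


def distortion(X, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    Hypothetical reference sketch, not Yellowbrick's implementation.
    Only label values that actually occur are visited, so gaps in the
    label values (empty clusters) are simply skipped.
    """
    total = 0.0
    for label in np.unique(labels):
        members = X[labels == label]     # points assigned to this cluster
        center = members.mean(axis=0)    # cluster centroid
        total += ((members - center) ** 2).sum()
    return total


X = np.array([[1, 2], [3, 4], [5, 6]])
print(distortion(X, np.array([1, 3, 3])))  # 4.0
print(distortion(X, np.array([0, 1, 1])))  # 4.0
```

The small difference from the 4.000000000000001 printed above is most likely floating-point rounding: squaring a Euclidean distance that was computed with a square root (sqrt(2) squared) does not give exactly 2.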

TODOs and questions

Still to do: None

Questions for the @DistrictDataLabs/team-oz-maintainers:


bbengfort commented 3 years ago

@busFred thank you for contributing to Yellowbrick! I'm running some other tests right now, but will review shortly.

codecov-commenter commented 3 years ago

Codecov Report

Merging #1186 (ce177c5) into develop (07ef358) will not change coverage. The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           develop    #1186   +/-   ##
========================================
  Coverage    90.44%   90.44%           
========================================
  Files           90       90           
  Lines         5076     5076           
========================================
  Hits          4591     4591           
  Misses         485      485           
Impacted File                   Coverage Δ
yellowbrick/cluster/elbow.py    98.49% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 07ef358...ce177c5.