DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Unable to use Silhouette Visualizer with Gaussian Mixture Model #1303

Open Thecave3 opened 1 year ago

Thecave3 commented 1 year ago

Describe the bug

Silhouette scores and their visualization can be computed for Gaussian Mixture Model outputs, but this library does not currently support that.
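
As a quick check of this claim, the sketch below computes silhouette scores for GMM cluster assignments using scikit-learn alone, without Yellowbrick. It assumes synthetic data from `make_blobs` in place of the NFL dataset. `fit_predict` assigns each sample to its most probable mixture component, which serves as a hard cluster label in the same way `KMeans.labels_` does.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in (assumption) for the NFL receiver features used in the issue
X, _ = make_blobs(n_samples=300, centers=5, n_features=5, random_state=42)

# Hard labels: each sample goes to its most probable mixture component
gmm = GaussianMixture(n_components=5, random_state=42)
labels = gmm.fit_predict(X)

# Mean silhouette coefficient over all samples
score = silhouette_score(X, labels)

# Per-sample coefficients grouped by component: the values a
# silhouette plot draws as one band per cluster
per_sample = silhouette_samples(X, labels)
for k in np.unique(labels):
    values = np.sort(per_sample[labels == k])
    print(f"component {k}: n={values.size}, mean={values.mean():.3f}")
print(f"overall silhouette score: {score:.3f}")
```

The overall score is simply the mean of the per-sample coefficients, so the only ingredient a silhouette plot needs from the model is the label vector.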

To Reproduce

I used the example code from here and changed the model from KMeans to GMM.

# Steps to reproduce the behavior (code snippet):
# Should include imports, dataset loading, and execution
from sklearn.mixture import GaussianMixture as GMM

from yellowbrick.cluster import SilhouetteVisualizer
from yellowbrick.datasets import load_nfl

# Load a clustering dataset
X, y = load_nfl()

# Specify the features to use for clustering
features = ['Rec', 'Yds', 'TD', 'Fmb', 'Ctch_Rate']
X = X.query('Tgt >= 20')[features]

# Instantiate the clustering model and visualizer
model = GMM(5, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')

visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure

Dataset

The choice of dataset does not affect the outcome.

Expected behavior

I expect the visualizer to fit the data and draw the silhouette scores on the figure.

Traceback

Traceback (most recent call last):
  File "sil_testet.py", line 15, in <module>
    visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/silhouette.py", line 118, in __init__
    super(SilhouetteVisualizer, self).__init__(estimator, ax=ax, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/base.py", line 45, in __init__
    raise YellowbrickTypeError(
yellowbrick.exceptions.YellowbrickTypeError: The supplied model is not a clustering estimator; try a classifier or regression score visualizer instead!


Additional context

I believe SilhouetteVisualizer should support GMM, since a GMM can be used as a clustering method (see, e.g., "Gaussian Mixture Models Clustering Algorithm Explained").

Thecave3 commented 1 year ago

I see that this may also be solved by merging PR #1294.

bbengfort commented 1 year ago

@Thecave3 I was hoping that your issue would be solved by #1294. @lwgray any status on that PR?

Thecave3 commented 1 year ago

@bbengfort I believe that it can solve the issue; however, I am not sure about the automated tests that are preventing the PR from being merged.

bbengfort commented 1 year ago

@lwgray any thoughts?

lwgray commented 1 year ago

I will create the test this weekend.

Cheers,
Larry

lwgray commented 1 year ago
  1. I found that the GMM estimator type isn't a clusterer but a DensityEstimator (which I have never seen before). I know how to fix this.
  2. GMM doesn't have an n_clusters attribute, which Yellowbrick expects for clustering estimators. I have to dig deeper into this.

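
The two differences above can be checked directly against scikit-learn. This sketch assumes Yellowbrick's clusterer check is tied to scikit-learn's `ClusterMixin` (the traceback does not show the exact test). The "DensityEstimator" type comes from `GaussianMixture` inheriting `DensityMixin` rather than `ClusterMixin`, and its component count is stored as `n_components` rather than `n_clusters`.

```python
from sklearn.base import ClusterMixin, DensityMixin
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def describe(estimator):
    """Summarize the two properties discussed above for a scikit-learn estimator."""
    return {
        # Point 1: clustering estimators inherit ClusterMixin,
        # while GaussianMixture inherits DensityMixin instead
        "is_cluster_mixin": isinstance(estimator, ClusterMixin),
        "is_density_mixin": isinstance(estimator, DensityMixin),
        # Point 2: KMeans stores the cluster count as n_clusters,
        # GaussianMixture stores it as n_components
        "has_n_clusters": hasattr(estimator, "n_clusters"),
        "has_n_components": hasattr(estimator, "n_components"),
    }


kmeans_info = describe(KMeans(n_clusters=5))
gmm_info = describe(GaussianMixture(n_components=5))
print(kmeans_info)
print(gmm_info)
```
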
lwgray commented 1 year ago

@bbengfort Another update: #1294 will solve part of this issue, and #1304 fixes the "not a clustering estimator" error.
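
Until those fixes are released, one possible user-side workaround is a thin adapter around the mixture model. The `GMMClusterer` class below is hypothetical (it is not part of Yellowbrick or scikit-learn) and has not been tested with `SilhouetteVisualizer`; it only targets the two points raised earlier in the thread by mixing in `ClusterMixin` and exposing `n_components` under the name `n_clusters`.

```python
from sklearn.base import ClusterMixin
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


class GMMClusterer(ClusterMixin, GaussianMixture):
    """Hypothetical adapter: a GaussianMixture that presents itself as a clusterer.

    ClusterMixin comes first in the MRO so its "clusterer" estimator type
    takes precedence over the density-estimator type from GaussianMixture.
    """

    @property
    def n_clusters(self):
        # Expose the number of mixture components under the KMeans-style name
        return self.n_components

    def fit(self, X, y=None):
        # BaseMixture.fit delegates to self.fit_predict, which here would resolve
        # to ClusterMixin.fit_predict and call fit again; calling the mixture's
        # own fit_predict directly avoids that recursion
        self.labels_ = GaussianMixture.fit_predict(self, X, y)
        return self


# Usage on synthetic data (assumption: stands in for a real dataset)
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
model = GMMClusterer(n_components=3, random_state=0)
labels = model.fit_predict(X)  # ClusterMixin.fit_predict -> fit -> labels_
print(model.n_clusters, labels.shape)
```

Mixing in `ClusterMixin` rather than setting `_estimator_type` by hand keeps the adapter aligned with how scikit-learn itself marks clusterers, but whether this is enough for Yellowbrick's checks is an assumption to verify.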