@stergion it actually is supposed to use any callable that the user supplies; the docstring says:

> `distance_metric : str or callable, default='euclidean'`
> The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn's `metrics.pairwise.pairwise_distances`. If X is the distance array itself, use `metric="precomputed"`.
So I think we need to have the `DISTANCE_METRICS` check only happen if the input is a string. I'll open a quick PR for a fix shortly.
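A hypothetical sketch of what that guard could look like (the constant's contents and the function name here are illustrative, not Yellowbrick's actual internals):

```python
# Illustrative sketch only: validate string metrics against an allow-list,
# but let callables pass straight through to sklearn.
DISTANCE_METRICS = ["cityblock", "cosine", "euclidean", "l1", "l2", "manhattan", "precomputed"]

def validate_distance_metric(distance_metric):
    if isinstance(distance_metric, str) and distance_metric not in DISTANCE_METRICS:
        raise ValueError(
            f"'{distance_metric}' is not a recognized distance metric; "
            "pass a callable or a metric supported by pairwise_distances"
        )
    return distance_metric
```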
@stergion PR open if you'd like to take a look.
@stergion and it should be able to use any sklearn distance metric; the `DISTANCE_METRICS` constant is out of date with the new version of scikit-learn. Luckily, sklearn now provides a `sklearn.metrics.DistanceMetric.get_metric` function in v1.0 that will make it easier for us to check whether a distance metric is valid.
To answer why we check whether the metric is valid at all: we're trying to keep the user from running a potentially lengthy fit process only to have scikit-learn raise the exception at the end. But you're right, we need a generic way to allow scikit-learn to change without breaking Yellowbrick, and I think this new metrics functionality is the right answer.
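For instance, a minimal sketch of that early validation (the wrapper function is hypothetical; only `DistanceMetric.get_metric` is the sklearn API):

```python
from sklearn.metrics import DistanceMetric  # importable from this path in sklearn >= 1.0

def check_distance_metric(name):
    """Fail fast on an unknown metric name instead of deep inside fit()."""
    try:
        DistanceMetric.get_metric(name)  # raises ValueError for unrecognized names
    except ValueError as exc:
        raise ValueError(f"'{name}' is not a valid distance metric") from exc
    return name

check_distance_metric("manhattan")     # ok
check_distance_metric("not-a-metric")  # raises ValueError before any fitting happens
```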
@bbengfort thank you for your time; you answered all my questions and solved the problem.
@stergion thank you so much for using Yellowbrick and for all your contributions to the library!
**Describe the solution you'd like**

Is there a reason `KElbowVisualizer` can use only the distance metrics in `DISTANCE_METRICS`? The `distortion_score` relies on `sklearn.metrics.pairwise.pairwise_distances()`, which supports distance metrics not included in `DISTANCE_METRICS`. For the silhouette score, `KElbowVisualizer` uses `sklearn.metrics.cluster._unsupervised.silhouette_score()`, which also supports metrics not included in `DISTANCE_METRICS`. I don't see a reason for this limitation.
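For illustration, a minimal sketch of the use case (it assumes the `distance_metric` keyword from the docstring quoted above; `chebyshev` stands in for any `pairwise_distances` metric missing from `DISTANCE_METRICS`):

```python
# Sketch only: elbow analysis with a distance metric that pairwise_distances
# accepts but that is not listed in the DISTANCE_METRICS constant.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

viz = KElbowVisualizer(
    KMeans(n_init=10), k=(2, 8), metric="distortion", distance_metric="chebyshev"
)
viz.fit(X)
viz.show()
```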