Open matalab opened 8 years ago
I like this idea. I'd like to explore it after we have regression integrated into TPOT.
We should explore additional metrics for scoring unsupervised results as well.
The scikit-learn clustering guide (http://scikit-learn.org/stable/modules/clustering.html) lists several clustering performance measures in section 2.3.9, "Clustering performance evaluation". Besides the silhouette coefficient (2.3.9.4), there are also: 2.3.9.1. Adjusted Rand index; 2.3.9.2. Mutual Information based scores; 2.3.9.3. Homogeneity, completeness and V-measure.
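For reference, here is a minimal sketch of how those three measures are called in scikit-learn. Note that, unlike the silhouette coefficient, all of them compare predicted clusters against ground-truth labels, so they only apply when such labels exist. The toy label vectors below are made up purely for illustration.

```python
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_completeness_v_measure,
)

# Hypothetical toy labels: the predicted clustering is the same partition
# as the ground truth, only with the cluster ids swapped.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 1, 0, 0, 0]

# All three measures are invariant to a permutation of the cluster ids.
ari = adjusted_rand_score(labels_true, labels_pred)
ami = adjusted_mutual_info_score(labels_true, labels_pred)
homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    labels_true, labels_pred
)

print(ari, ami, homogeneity, completeness, v_measure)
```

Because these scores need `labels_true`, they would only be usable as a fitness function on benchmark datasets with known classes, not on genuinely unlabeled data.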
I'd be glad to have this feature. Are you really planning on sorting it out?
We plan to add it eventually, but we have many more high-priority issues to resolve before we touch this one. We are happy for you to start working on this issue and send over a PR if you're interested. Please let us know.
Any updates on this?
As one of the maintainers of Auto-sklearn, I have also been asked this several times. However, there are two papers arguing against such a feature:
I'd be really interested to learn how you overcome these problems.
Any update on this?
Hi, I'm excited about your TPOT tool and how it infers hyperparameters for binary classifiers. I was wondering whether you have any plans to extend TPOT to unsupervised machine learning, i.e. clustering?
Context of the issue
Setting hyperparameters for the various clustering algorithms in scikit-learn can be tricky, just as it is for supervised learning algorithms. It would be great if clustering hyperparameters could be inferred automatically by TPOT in the same way as is done for classification algorithms. I presume the silhouette coefficient would be an adequate scoring method, because it measures the compactness and separation of clusters. It is already present in scikit-learn (see here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html)
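To make the request concrete, here is a rough sketch of silhouette-based hyperparameter selection done by hand with plain scikit-learn. It is not TPOT code, and TPOT does not expose anything like this as far as this thread shows. The synthetic data, the choice of KMeans, and the candidate range for `n_clusters` are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data: four blobs at hand-picked, well-separated centers.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Score each candidate value of n_clusters with the silhouette coefficient.
# silhouette_score needs at least 2 clusters, so the scan starts at k=2.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the setting with the highest mean silhouette (range is [-1, 1]).
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

An automated tool would replace the simple loop with a search over algorithms and their hyperparameters, using the same score as the fitness function.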