EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Extending TPOT to unsupervised clustering #195

Open matalab opened 8 years ago

matalab commented 8 years ago

Hi, I'm excited about your TPOT tool and how it infers hyperparameters for binary classifiers. I was wondering whether you have any plans to extend TPOT to unsupervised machine learning, i.e. clustering?

Context of the issue

Setting hyperparameters for the various clustering algorithms in scikit-learn can be tricky, much as it is for supervised learning algorithms. It would be great if TPOT could infer clustering hyperparameters automatically, in the same way it does for classification algorithms. I presume the silhouette coefficient would be an adequate scoring method, because it measures both the compactness and the separation of clusters. It is already available in scikit-learn (see here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html).
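To make the proposal concrete, here is a minimal sketch of scoring one clustering hyperparameter with the silhouette coefficient. It uses plain scikit-learn rather than any TPOT API. The synthetic data, the choice of `KMeans`, the candidate range for `n_clusters`, and the helper name `silhouette_for_k` are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic, unlabeled data (an assumption for illustration only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)


def silhouette_for_k(X, k, seed=0):
    """Fit KMeans with k clusters and return the mean silhouette coefficient."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)


# Hypothetical search space: silhouette needs at least 2 clusters.
scores = {k: silhouette_for_k(X, k) for k in range(2, 8)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"n_clusters={k}: silhouette={score:.3f}")
print("best n_clusters by silhouette:", best_k)
```

A search tool would replace the exhaustive loop with its own optimizer and would probably search over algorithms as well as their hyperparameters. The scoring call would stay the same, since `silhouette_score` needs only the data and the predicted labels.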

rhiever commented 8 years ago

I like this idea. I'd like to explore it after we have regression integrated into TPOT.

We should explore additional metrics for scoring unsupervised results as well.

matalab commented 8 years ago

On the page http://scikit-learn.org/stable/modules/clustering.html, in chapter 2.3.9. Clustering performance evaluation, there are several clustering performance measures mentioned. Besides silhouette coefficient (2.3.9.4. Silhouette Coefficient), there are also:

2.3.9.1. Adjusted Rand index
2.3.9.2. Mutual Information based scores
2.3.9.3. Homogeneity, completeness and V-measure
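One caveat when comparing these measures: the adjusted Rand index, the mutual-information scores, and homogeneity/completeness/V-measure all compare predicted clusters against ground-truth labels, whereas the silhouette coefficient uses only the data and the predicted labels. The sketch below shows the difference in call signatures. The synthetic dataset and the `KMeans` settings are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_completeness_v_measure,
    silhouette_score,
)

# Synthetic data WITH known labels (an assumption; real clustering
# problems usually have no y_true available).
X, y_true = make_blobs(n_samples=200, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# These measures need the ground-truth labels.
ari = adjusted_rand_score(y_true, labels)
ami = adjusted_mutual_info_score(y_true, labels)
homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    y_true, labels
)

# This measure needs only the data and the predicted labels.
silhouette = silhouette_score(X, labels)

print(f"ARI={ari:.3f} AMI={ami:.3f} V-measure={v_measure:.3f}")
print(f"silhouette={silhouette:.3f}")
```

For a fully unsupervised search, only label-free measures such as the silhouette coefficient can serve as the optimization target. The label-based ones are still useful for benchmarking on datasets where the true grouping is known.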

nlyf commented 7 years ago

I'd be glad to have this feature. Are you really planning on sorting it out?

rhiever commented 7 years ago

We plan to add it eventually, but we have many more high-priority issues to resolve before we touch this one. We are happy for you to start working on this issue and send over a PR if you're interested. Please let us know.

HamedMP commented 6 years ago

Any updates on this?

mfeurer commented 6 years ago

As one of the maintainers of Auto-sklearn, I've also been asked this several times. However, there are two papers arguing against such a feature:

I'd be really interested to learn how you overcome these problems.

Bec-k commented 1 year ago

Any update on this?