EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0
9.66k stars 1.56k forks

Extended Multi-label Classification Support #196

Open ChristianSch opened 8 years ago

ChristianSch commented 8 years ago

Hey there,

As tpot seems to rely solely on scikit-learn for (meta-)estimators, the lack of extended multi-label classification strategies is quite noticeable. Work on some of these strategies and algorithms has been stalled in scikit-learn for quite some time now: label powerset (https://github.com/scikit-learn/scikit-learn/pull/2461) and classifier chains (https://github.com/scikit-learn/scikit-learn/pull/3727). Meanwhile, work is being done on scikit-multilearn, which already provides at least some novel, working algorithms and strategies.
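To illustrate what a strategy like label powerset does, here is a minimal pure-Python sketch of the underlying target transformation: each distinct combination of labels is treated as one class, so any ordinary multi-class classifier could be trained on the encoded targets. The function name `label_powerset_encode` and the sample data are hypothetical, chosen only for illustration; this is not the code from the linked pull request or from scikit-multilearn.

```python
def label_powerset_encode(label_sets):
    """Map each distinct label combination to a single class id.

    Hypothetical helper for illustration only.
    Returns the encoded targets and a lookup table for decoding.
    """
    combo_to_class = {}
    encoded = []
    for labels in label_sets:
        combo = frozenset(labels)  # order-independent, hashable key
        if combo not in combo_to_class:
            # Assign the next free class id to a combination seen for the first time.
            combo_to_class[combo] = len(combo_to_class)
        encoded.append(combo_to_class[combo])
    class_to_combo = {cls: combo for combo, cls in combo_to_class.items()}
    return encoded, class_to_combo


# Made-up label sets for four samples.
label_sets = [{1, 3, 5}, {2}, {1, 3, 5}, set()]
encoded, class_to_combo = label_powerset_encode(label_sets)

# Decoding a multi-class prediction recovers the original label set.
decoded = [set(class_to_combo[c]) for c in encoded]
```

One trade-off of this transformation is that the number of classes can grow with the number of distinct label combinations in the training data, and combinations never seen during training cannot be predicted.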

What do you think of including scikit-multilearn (at least for the time being) to extend the support of multi-label classification?

(For clarification: multi-label classification means predicting, for each sample, a subset of the total label set, e.g. Y_hat = {1,3,5}, so multiple "classes" (or labels in this context) are assigned to one sample.)
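As a small sketch of that definition: a predicted label set such as Y_hat = {1,3,5} is commonly stored as a binary indicator row, which is the multi-label target format scikit-learn works with. The total label set {1,...,5} and the helper names below are assumptions made for the example.

```python
def to_indicator(label_set, all_labels):
    """Encode a label set as a 0/1 row over the total label set."""
    return [1 if label in label_set else 0 for label in all_labels]


def from_indicator(row, all_labels):
    """Recover the label set from a 0/1 indicator row."""
    return {label for label, flag in zip(all_labels, row) if flag}


all_labels = [1, 2, 3, 4, 5]  # assumed total label set
y_hat = {1, 3, 5}             # labels assigned to one sample

row = to_indicator(y_hat, all_labels)
restored = from_indicator(row, all_labels)
```

Stacking one such row per sample gives an indicator matrix of shape (n_samples, n_labels).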

rhiever commented 8 years ago

Interesting idea! This is a feature that we should consider in the future after we add support for regression (#186) and unsupervised learning (#195).

ChristianSch commented 8 years ago

I'd be glad to help, just hit me up.