EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

How to use TPOT in the text domain? #544

Closed ben0it8 closed 7 years ago

ben0it8 commented 7 years ago

Hello,

Is it possible to use TPOT for classification tasks in the text domain? Given a labeled corpus (e.g. label-document pairs), I'd like to train a classifier that infers the label of an unseen document.

Thanks, Oliver

weixuanfu commented 7 years ago

I think this issue is related to #507. We are working on a configurable grammar in #523 that will add support for text classification. For now, you can convert the text to a numeric feature matrix with scikit-learn's CountVectorizer, TfidfVectorizer, or HashingVectorizer before passing it to TPOTClassifier.
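
A minimal sketch of that workaround, assuming scikit-learn and TPOT are installed. The toy corpus, the helper names `vectorize` and `fit_tpot`, and the small search settings are illustrative assumptions, not taken from this thread, and the `TPOTClassifier` arguments may need adjusting for the TPOT version you have.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: 20 labeled training documents, 2 unseen ones.
train_docs = ["great film, loved the acting", "terrible plot and dull pacing"] * 10
y_train = np.array([1, 0] * 10)
test_docs = ["loved the plot", "dull and terrible acting"]


def vectorize(train, test):
    """Fit TF-IDF on the training documents only, then transform both splits."""
    vectorizer = TfidfVectorizer()
    X_tr = vectorizer.fit_transform(train)  # SciPy sparse matrix
    X_te = vectorizer.transform(test)       # reuses the training vocabulary
    return X_tr, X_te


def fit_tpot(X_tr, y_tr):
    """Run a deliberately small TPOT search on the vectorized text."""
    # Imported here so the vectorizing step can run without TPOT installed.
    from tpot import TPOTClassifier

    # Assumption: tiny search settings, chosen only to keep the sketch fast.
    tpot = TPOTClassifier(generations=2, population_size=10, random_state=0)
    # Densify for simplicity; this is only reasonable for small corpora.
    tpot.fit(X_tr.toarray(), y_tr)
    return tpot


X_train, X_test = vectorize(train_docs, test_docs)
print(X_train.shape, X_test.shape)
```

To run the search, call `model = fit_tpot(X_train, y_train)` and then `model.predict(X_test.toarray())`. The vectorizer is fitted on the training documents only so that no vocabulary or document-frequency information leaks from the unseen documents. For larger corpora the dense conversion will not scale, so it is worth checking whether your TPOT version accepts sparse input directly.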