autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

Text Classification Customized Dataset #167

Closed: zhanghang1989 closed this 4 years ago

zhanghang1989 commented 4 years ago

TODO: We need to implement customized dataset support for NLP. Here is a prototype we can use:

https://github.com/awslabs/autogluon/blob/master/autogluon/task/text_classification/dataset.py#L109

songqiang commented 4 years ago

This feature is essential. Without support for customized datasets, TextClassification is hard to use on real data. The dataset class in Tabular is more feature-complete, and we had better adopt a consistent interface: basically, allow loading data from a path or a pandas DataFrame.
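
As a rough illustration of the "path or pandas DataFrame" idea, here is a minimal sketch. It is not AutoGluon's API: the helper name `load_text_dataset` and the default column names `text` and `label` are assumptions made up for this example, and the path is assumed to point to a CSV file.

```python
import os

import pandas as pd


def load_text_dataset(source, text_column="text", label_column="label"):
    """Hypothetical helper: accept a CSV path or a DataFrame, return text/label columns."""
    if isinstance(source, pd.DataFrame):
        # Copy so the caller's DataFrame is never modified.
        df = source.copy()
    elif isinstance(source, (str, os.PathLike)):
        # Assumption: the file on disk is a CSV with a header row.
        df = pd.read_csv(source)
    else:
        raise TypeError("source must be a file path or a pandas DataFrame")

    # Fail early if the expected columns are not present.
    missing = [c for c in (text_column, label_column) if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    return df[[text_column, label_column]].reset_index(drop=True)
```

Either `load_text_dataset("train.csv")` or `load_text_dataset(my_dataframe)` would then return a frame with the same two columns, so downstream code does not need to care where the data came from.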

zzdgit commented 4 years ago

@Innixma Excuse me, do you support Chinese datasets?

Innixma commented 4 years ago

At present, Tabular will not work well on Chinese text, because there are no space characters to split on when generating n-grams. As for the text classification module, @cgraywang can you answer this?
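
The whitespace problem can be shown with a small sketch. This is not AutoGluon's featurization code; `word_ngrams` and `char_ngrams` are hypothetical helpers written only for illustration, and the sample sentences are made up.

```python
def word_ngrams(text, n):
    """Split on whitespace, then return contiguous word n-grams."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


english = "the movie was great"
chinese = "这部电影很好看"  # no spaces between words

print(word_ngrams(english, 2))  # ['the movie', 'movie was', 'was great']
print(word_ngrams(chinese, 1))  # ['这部电影很好看'] (the whole sentence is one token)
print(word_ngrams(chinese, 2))  # []
print(char_ngrams(chinese, 2))  # ['这部', '部电', '电影', '影很', '很好', '好看']
```

With whitespace tokenization, the Chinese sentence collapses into a single token, so no useful word n-grams come out. Character n-grams (or a dedicated word segmenter) are common workarounds, though nothing in this thread says which approach, if any, the library uses.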

ethanqi1109 commented 4 years ago

@Innixma do you support customized datasets for text classification now?

Innixma commented 4 years ago

We will once #556, a full overhaul of the TextClassification task, is merged.

jwmueller commented 4 years ago

Easy usage of custom text datasets has been addressed in: https://github.com/awslabs/autogluon/pull/556