zhanghang1989 closed this issue 4 years ago
This feature is essential. Without support for custom datasets, TextClassification is hard to use on real data. The dataset class in Tabular is more feature-complete, and we should adopt a consistent interface: basically, allow loading data from a path or a pandas DataFrame.
@Innixma Excuse me, do you support Chinese datasets?
At present, for Tabular, a Chinese text dataset will not work well because the text has no space characters from which to generate n-grams. As for the text classification module, @cgraywang, can you answer this?
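To illustrate the n-gram problem, here is a minimal sketch in plain Python. It is not AutoGluon's actual featurization code; `word_ngrams` and `char_ngrams` are hypothetical helpers written only for this example, and the sample sentences are made up. It shows that splitting on whitespace yields a single token for a Chinese sentence, so no word bigrams can be formed, whereas character n-grams (a common workaround, not something confirmed as supported here) still produce features.

```python
def word_ngrams(text, n):
    """Whitespace-based word n-grams (hypothetical helper, for illustration only)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams that ignore whitespace (hypothetical helper)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


english = "this movie is great"
chinese = "这部电影很好看"  # made-up sample sentence containing no spaces

print(word_ngrams(english, 2))  # ['this movie', 'movie is', 'is great']
print(word_ngrams(chinese, 1))  # ['这部电影很好看'] (the whole sentence is one token)
print(word_ngrams(chinese, 2))  # [] (no bigrams can be formed)
print(char_ngrams(chinese, 2))  # ['这部', '部电', '电影', '影很', '很好', '好看']
```

Running a Chinese word segmenter over the text before featurization would be another option, but that is outside what this sketch covers.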
@Innixma Do you support custom datasets for text classification now?
We will once #556, a full overhaul of the TextClassification task, is merged.
Easy usage of custom text datasets has been addressed in: https://github.com/awslabs/autogluon/pull/556
TODO: implement custom dataset support for NLP. A prototype we can use:
https://github.com/awslabs/autogluon/blob/master/autogluon/task/text_classification/dataset.py#L109