UIT-ViON (Vietnamese Online Newspaper) is a dataset collected from well-known Vietnamese online newspapers. It is an open-domain, large-scale, high-quality dataset of 260,000 texts annotated with 13 categories, intended for evaluating Vietnamese short text classification. The dataset is split into training, validation, and test sets containing 208,000, 26,000, and 26,000 texts, respectively.
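The split sizes stated above add up to the full dataset and correspond to an 80/10/10 partition. The short check below is only an illustrative sketch: the names `SPLIT_SIZES`, `TOTAL_STATED`, and `split_fractions` are hypothetical and not part of the dataloader, and the numbers are copied from the description rather than read from the data.

```python
# Split sizes copied from the dataset description (not read from the data itself).
SPLIT_SIZES = {"train": 208_000, "validation": 26_000, "test": 26_000}
TOTAL_STATED = 260_000  # total number of texts stated in the description


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


# The three splits should account for every example in the dataset.
assert sum(SPLIT_SIZES.values()) == TOTAL_STATED

fractions = split_fractions(SPLIT_SIZES)
for name, share in fractions.items():
    print(f"{name}: {share:.0%}")
```

A similar check can be run on a loaded copy of the data by replacing the hard-coded counts with the actual lengths of each split.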
Dataloader name: uit_vion/uit_vion.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?uit_vion