An imbalanced dataset for classifying Indonesian news articles. It contains five class labels: bola, news, bisnis, tekno, and otomotif. The dataset comprises around 6k training and 2.5k test examples, and the most prevalent classes (bola and news) have roughly 10x as many training and test examples as the least prevalent class (otomotif).
Dataloader name:
indonesian_news_dataset/indonesian_news_dataset.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indonesian_news_dataset
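
Because of the class imbalance described above, a classifier trained on this data may benefit from per-class loss weights. The following is a minimal sketch, not part of the dataloader: it computes inverse-frequency weights with the common "balanced" heuristic, n_samples / (n_classes * class_count). The function name `inverse_frequency_weights` and the toy label list are hypothetical; the toy counts only mimic the roughly 10x ratio and are not the real class counts.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the "balanced" heuristic: n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels: NOT the real class counts, only an
# illustration of a 10x gap between the largest and smallest class.
toy_labels = (
    ["bola"] * 10
    + ["news"] * 10
    + ["bisnis"] * 4
    + ["tekno"] * 3
    + ["otomotif"] * 1
)

weights = inverse_frequency_weights(toy_labels)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

With these toy counts the rarest class (otomotif) receives ten times the weight of the most common ones. In practice the label list would come from the actual train split, and the resulting weights could be passed to whatever weighted loss the chosen training framework provides.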