IBM / low-resource-text-classification-framework

Research framework for low-resource text classification that allows the user to experiment with classification models and active learning strategies on a large number of sentence classification datasets, and to simulate real-world scenarios. The framework is easily extensible to new classification models, active learning strategies, and datasets.
Apache License 2.0

Performance of AGNews Dataset #7

Closed yueyu1030 closed 2 years ago

yueyu1030 commented 2 years ago

Dear authors,

In your paper, I found that the performance on AGNews exceeds 94% with 300 labeled examples (balanced setting) in Figure 1, page 6. I tried my best to reproduce this result but found it very difficult. The results in some recent papers are also lower than yours (https://aclanthology.org/2021.emnlp-main.51.pdf, https://arxiv.org/pdf/2107.05687.pdf). Could you elaborate on the experimental setup and how to reproduce this result? Thanks!

arielge commented 2 years ago

Hi @yueyu1030,

In our paper, the experiments were performed in a binary setting: we chose one target class for each dataset (in the case of AGNews, the 'World' class) and measured performance on that class. This means our results are not comparable to the papers you mention, since we use the 300 labeled examples to fine-tune the model on a binary classification task, whereas IIUC they use the same budget of 300 examples to learn a multi-class classification problem over the 4 classes in AGNews.

If you wish to replicate our results in the binary setting, you can run `experiment_runner_balanced.py` with e.g. `datasets_and_categories = {'ag_news': ['1']}` ('1' is the AGNews "World" class). Using the BERT model (i.e. `classification_models = [ModelTypes.HFBERT]`) you should get results similar to those reported in Figure 1. Feel free to reach out if you have any issues running the experiments with the repo.
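For clarity, the configuration change described above might look roughly like this inside `experiment_runner_balanced.py`. This is a hedged sketch based only on the variable names mentioned in the reply; the surrounding imports and the exact location of these assignments in the script may differ in the actual repo.

```python
# Sketch of the edit described above (variable names taken from the reply;
# the ModelTypes import path below is an assumption about the repo layout).
from lrtc_lib.train_and_infer_service.model_type import ModelTypes

# Restrict the run to the binary AGNews task: label '1' is the "World"
# class, treated as the target class vs. the rest.
datasets_and_categories = {'ag_news': ['1']}

# Use the BERT classifier to match the numbers reported in Figure 1.
classification_models = [ModelTypes.HFBERT]
```

Note that with this setup the 300-example budget is spent on a single one-vs-rest task, which is why the resulting numbers are not comparable to 4-way multi-class results on AGNews.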

yueyu1030 commented 2 years ago

Thanks for the reply!