declare-lab / flan-alpaca

This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.
Apache License 2.0

Performance of the model on gsm8k/SVAMP/MultiArith. #22

Open hccngu opened 1 year ago

hccngu commented 1 year ago
Thank you for your excellent project. I have evaluated Flan-Alpaca-Base/Large/XL on the gsm8k/SVAMP/MultiArith datasets, and the results are as follows:

| Model | gsm8k | MultiArith | SVAMP |
|---|---|---|---|
| Flan-Alpaca-Base | 13.42 | 20.33 | 19.50 |
| Flan-Alpaca-Large | 14.40 | 19.83 | 17.80 |
| Flan-Alpaca-XL | 9.25 | 13.83 | 14.30 |

Overall, performance tends to get worse as the number of parameters increases. What do you think is the reason for this? Also, did you use the test sets of the three datasets mentioned above to train the model? If so, could the reason be that the smaller models overfit on the test data? Thank you~
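For context, here is a minimal sketch of roughly how this kind of evaluation can be run on GSM8K with the Hugging Face `transformers` and `datasets` libraries. The prompt template and the last-number answer-extraction heuristic are illustrative assumptions, not necessarily the exact setup that produced the numbers above:

```python
import re
from datasets import load_dataset
from transformers import pipeline

# Flan-Alpaca is a T5-style seq2seq model, so text2text-generation applies.
generator = pipeline("text2text-generation", model="declare-lab/flan-alpaca-base")

def extract_number(text):
    # Heuristic: take the last number in the generated text as the prediction.
    numbers = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return float(numbers[-1]) if numbers else None

dataset = load_dataset("gsm8k", "main", split="test")
correct = 0
for example in dataset:
    prompt = f"Answer the following math problem.\n\n{example['question']}"
    output = generator(prompt, max_new_tokens=256)[0]["generated_text"]
    # GSM8K gold answers end with "#### <number>".
    gold = float(example["answer"].split("####")[-1].strip().replace(",", ""))
    pred = extract_number(output)
    if pred is not None and abs(pred - gold) < 1e-4:
        correct += 1

print(f"GSM8K accuracy: {correct / len(dataset):.2%}")
```

SVAMP and MultiArith can be scored the same way since both provide a single numeric gold answer per question; only the dataset loading and answer field differ.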

chiayewken commented 1 year ago

Hi, thanks for the interesting analysis! The gsm8k and SVAMP datasets are indeed used in Flan-T5 training, but we are not sure why performance gets worse as model size increases. This definitely deserves a closer look, so please let us know what you find!