Performance of the model on gsm8k/SVAMP/MultiArith.

declare-lab / flan-alpaca

This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.

Apache License 2.0

Thank you for your excellent project. I evaluated Flan-Alpaca-Base/Large/XL on the gsm8k/SVAMP/MultiArith datasets, and the results are as follows:

Model	gsm8k	SVAMP	MultiArith
Flan-Alpaca-Base	13.42	20.33	19.50
Flan-Alpaca-Large	14.40	19.83	17.80
Flan-Alpaca-XL	9.25	13.83	14.30

Overall, performance tends to get worse as the parameter count grows: Flan-Alpaca-XL scores lowest on all three datasets, and Flan-Alpaca-Large beats Flan-Alpaca-Base only on gsm8k. What do you think is the reason for this? Also, were the test sets of these three datasets included in the training data? If so, could the smaller models simply have overfit to the test data? Thank you~
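For the test-set question, one rough way to check for leakage is to look for test questions that appear verbatim in the training instructions. Below is a minimal sketch of that idea, assuming both sets are available as plain lists of strings. The helpers `normalize` and `find_overlap` are hypothetical names, not part of this repository, and the data is toy data for illustration only. Matching after normalization only catches near-verbatim copies, not paraphrases.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def find_overlap(train_texts, test_questions):
    """Return test questions whose normalized form occurs inside any training text."""
    train_norm = [normalize(t) for t in train_texts]
    leaked = []
    for question in test_questions:
        q_norm = normalize(question)
        # Skip empty strings so they do not trivially match everything.
        if q_norm and any(q_norm in t for t in train_norm):
            leaked.append(question)
    return leaked


# Toy data for illustration only (hypothetical, not real dataset entries).
train = ["Instruction: Tom has 3 apples and buys 2 more. How many apples does he have?"]
test = [
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "A train travels 60 miles in 2 hours. What is its speed?",
]

leaked = find_overlap(train, test)
print(f"{len(leaked)} of {len(test)} test questions found in training data")
```

If a noticeable share of test questions shows up this way, the reported scores would partly reflect memorization rather than reasoning ability.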

Performance of the model on gsm8k/SVAMP/MultiArith. #22