This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Reproduce the accuracy of chavinlo/alpaca-native on MMLU #25
Open
sglucas opened 1 year ago
Hi,
I am trying to evaluate the accuracy of chavinlo/alpaca-native on MMLU.
The final accuracy I get is about 36, and I cannot reproduce the result of about 41.6.
Could you tell me which parts I should focus on, such as the setup or the environment?
Best, Lucas