OpenLMLab / LEval

[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
GNU General Public License v3.0

Except for GSM100, are the other datasets evaluated in 0-shot? #10

Closed zhimin-z closed 10 months ago

zhimin-z commented 10 months ago

I would like to confirm the leaderboard configuration.

ChenxinAn-fdu commented 10 months ago

Yes, we did not add in-context examples for the other tasks. We suggest modifying our inference code under the Baselines folder. The leaderboard has not been updated yet, so please refer to our paper for the up-to-date results!
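For readers adapting the Baselines inference code, a minimal sketch of the distinction discussed here: most tasks are prompted zero-shot, while a task like GSM100 prepends worked examples. The function and parameter names below (`build_prompt`, `examples`) are illustrative assumptions, not L-Eval's actual API.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompt construction.
# Names here are illustrative; they do not come from the L-Eval codebase.

def build_prompt(document, question, examples=None):
    """Zero-shot by default; few-shot if (question, answer) pairs are given."""
    parts = []
    if examples:
        # Few-shot: prepend worked examples before the real query.
        for q, a in examples:
            parts.append(f"Question: {q}\nAnswer: {a}")
    parts.append(f"Document: {document}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)

# Zero-shot prompt, as used for most L-Eval tasks:
zero_shot = build_prompt("<long document>", "What is the main claim?")

# Few-shot prompt, the GSM100-style setup:
few_shot = build_prompt("<long document>", "What is 17 + 25?",
                        examples=[("What is 2 + 2?", "4")])
```

To add few-shot examples to another task, one would pass its demonstration pairs the same way while leaving the default call path zero-shot.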