Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Add 4 new evaluation tasks for 4 JP models #74

Closed · mrorii closed 1 year ago

mrorii commented 1 year ago

Overview

This PR is a follow-up to #67 and #72, and adds 4 new evaluation tasks:

for the following 4 JP models:

Details

As in #67 and #72, in addition to adding these 4 new evaluation tasks for the 4 models above, this PR re-orders the task lists of all models to match the order shown in the Eval Leaderboard, for consistency.
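
For illustration only, a minimal Python sketch of the re-ordering rule described above, assuming each model's evaluation tasks are kept as a plain list of task-name strings; the task names and the `reorder_tasks` helper are placeholders, not code from this PR:

```python
# Hypothetical sketch of the re-ordering convention: sort each model's task
# list to follow a single canonical (leaderboard) order. Task names are
# placeholders, not the actual tasks added in this PR.

CANONICAL_ORDER = ["task_a", "task_b", "task_c", "task_d"]  # leaderboard order

def reorder_tasks(model_tasks: list[str]) -> list[str]:
    """Sort tasks by their position in CANONICAL_ORDER; tasks not on the
    leaderboard keep their relative order and go last (sorted() is stable)."""
    rank = {name: i for i, name in enumerate(CANONICAL_ORDER)}
    return sorted(model_tasks, key=lambda t: rank.get(t, len(CANONICAL_ORDER)))

print(reorder_tasks(["task_c", "task_a", "task_d", "task_b"]))
# -> ['task_a', 'task_b', 'task_c', 'task_d']
```

In the PR itself the per-model task lists are edited directly; the snippet only makes the ordering rule explicit.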