This PR is a follow-up to #67 and #72, and adds 4 new evaluation tasks:
JAQKET_V2 (1-shot)
xlsum_ja (1-shot)
xwinograd_ja (0-shot)
mgsm (5-shot)
for the following 4 JP models:
stablelm-jp-3b-ja50_rp50-700b (with prompt versions 0.1 and 0.2)
cyberagent-open-calm-7b
stablelm-jp-1b-jav1_rp-sl2k-slw-300b
stablelm-jp-1b-jav1-sl2k-slw-300b
Details
As in #67 and #72, in addition to adding these 4 evaluation tasks for the 4 models above, this PR reorders the tasks for all models to match the order shown in the Eval Leaderboard, for consistency.