Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Add 4 new evaluation tasks for 4 JP models #74

Closed · mrorii closed 1 year ago

mrorii commented 1 year ago

Overview

This PR is a follow-up to #67 and #72, and adds 4 new evaluation tasks:

for the following 4 JP models:

Details

As in #67 and #72, in addition to adding these 4 new evaluation tasks for the 4 models above, this PR re-orders the task lists of all models to match the order shown in the Eval Leaderboard, for consistency.
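
For illustration only, a minimal Python sketch of the re-ordering rule described above, assuming each model's evaluation tasks are kept as a plain list of task-name strings; the task names and the `reorder_tasks` helper are placeholders, not code from this PR:

```python
# Hypothetical sketch of the re-ordering convention: sort each model's task
# list to follow a single canonical (leaderboard) order. Task names are
# placeholders, not the actual tasks added in this PR.

CANONICAL_ORDER = ["task_a", "task_b", "task_c", "task_d"]  # leaderboard order

def reorder_tasks(model_tasks: list[str]) -> list[str]:
    """Sort tasks by their position in CANONICAL_ORDER; tasks not on the
    leaderboard keep their relative order and go last (sorted() is stable)."""
    rank = {name: i for i, name in enumerate(CANONICAL_ORDER)}
    return sorted(model_tasks, key=lambda t: rank.get(t, len(CANONICAL_ORDER)))

print(reorder_tasks(["task_c", "task_a", "task_d", "task_b"]))
# -> ['task_a', 'task_b', 'task_c', 'task_d']
```

In the PR itself the per-model task lists are edited directly; the snippet only makes the ordering rule explicit.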