Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Add prompt version `0.2.1` for JCommonsenseQA #104

Closed · mkshing closed this 1 year ago

mkshing commented 1 year ago

Background

In principle, "base" models (trained purely for language modeling, without a specific prompt format) should be evaluated with prompt version 0.2. However, it was reported that 0.3 outperformed 0.2, which is odd.

So we compared 0.2 and 0.3 on several models (thank you @mrorii!), and found that 0.3 increased scores for all base models on both JCommonsenseQA and JNLI.

Summary

JCommonsenseQA is a question-answering task with 5 answer choices. In 0.2, the prompt looks like the example below (reference link):

質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。なお、回答は選択肢の番号(例:0)でするものとします。

質問:街のことは?
選択肢:0.タウン,1.劇場,2.ホーム,3.ハウス,4.ニューヨークシティ
回答:

(English gloss: "Given a question and answer choices as input, select the answer from the choices. The answer should be given as the number of the choice (e.g., 0). Question: What is a 街 (town) called? Choices: 0. town, 1. theater, 2. home, 3. house, 4. New York City. Answer:")

The prompt encourages the model to answer with the choice "index" rather than the text itself, but the gold targets are actually the choice texts. So I assume models were confused by this mismatch. (code)
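To make the mismatch concrete, here is a minimal, illustrative sketch (not the harness's actual implementation) of how a 0.2-style prompt could be assembled and what string ends up being scored. The field names `question`, `choice0`–`choice4`, and `label` follow the JGLUE JCommonsenseQA schema; the helper functions themselves are hypothetical.

```python
# Illustrative sketch only: build a 0.2-style JCommonsenseQA prompt and show
# the index/text mismatch in the scored target.

def build_prompt_v02(doc: dict) -> str:
    """Assemble a 0.2-style prompt from a JCommonsenseQA example.

    `doc` is assumed to use the JGLUE field names:
    question, choice0 ... choice4, label.
    """
    instruction = (
        "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
        "なお、回答は選択肢の番号(例:0)でするものとします。"
    )
    choices = [doc[f"choice{i}"] for i in range(5)]
    numbered = ",".join(f"{i}.{c}" for i, c in enumerate(choices))
    return f"{instruction}\n\n質問:{doc['question']}\n選択肢:{numbered}\n回答:"


def gold_target_v02(doc: dict) -> str:
    # The mismatch: the instruction asks for the *number* of the choice,
    # but the string that is actually scored is the choice *text*.
    return doc[f"choice{doc['label']}"]


example = {
    "question": "街のことは?",
    "choice0": "タウン", "choice1": "劇場", "choice2": "ホーム",
    "choice3": "ハウス", "choice4": "ニューヨークシティ",
    "label": 0,
}
print(build_prompt_v02(example))
print("scored continuation:", gold_target_v02(example))  # "タウン", not "0"
```

The instruction asks for a number, yet the continuation whose likelihood is compared across candidates is the choice text, which is exactly the gap described above.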

Solution

| Model | # of shots | Prompt version | acc |
| --- | --- | --- | --- |
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.2 | 31.64 |
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.3 | 38.96 |
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.2.1 (NEW!) | 45.49 |
| matsuo-lab/weblab-10b | 0 | 0.2 | 23.32 |
| matsuo-lab/weblab-10b | 0 | 0.3 | 42.27 |
| matsuo-lab/weblab-10b | 0 | 0.2.1 (NEW!) | 25.47 |
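For context, here is a sketch of how one of the rows above might be reproduced through the harness's Python API. The task identifier `jcommonsenseqa-1.1-0.2.1` is an assumption based on the fork's `<task>-<dataset version>-<prompt version>` naming convention and should be checked against the task registry; model arguments may also differ.

```python
# Sketch only: evaluate one model/prompt-version combination from the table.
# The task name assumes the "<task>-<dataset version>-<prompt version>"
# convention (e.g. jcommonsenseqa-1.1-0.2.1); verify the exact identifier.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=elyza/ELYZA-japanese-Llama-2-7b",
    tasks=["jcommonsenseqa-1.1-0.2.1"],
    num_fewshot=0,
    device="cuda",
)
print(results["results"])
```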
mkshing commented 1 year ago

Although 0.2.1 did not outperform 0.3 on every model, we confirmed that 0.2.1 is clearly better than 0.2, at the very least, for base models. So I will merge this PR.