Stability-AI / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Missing prompts “single-v1-multi-turn” and “single-math-v1-multi-turn” in Japanese MT-Bench #16

Open Kosuke-Yamada opened 4 months ago

Kosuke-Yamada commented 4 months ago

Thank you for maintaining the benchmarks. I am currently evaluating models with the Japanese MT-Bench, and I need the prompts “single-v1-multi-turn” and “single-math-v1-multi-turn” to run the function 'make_judge_single' in gen_judgment.py, but I can't find them. fastchat/llm_judge/data/judge_ja_prompts.jsonl only seems to contain the prompts “single-v1” and “single-math-v1”. I would appreciate it if you could tell me where to find the multi-turn prompts or how the evaluation should be run without them.
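For reference, a minimal sketch of the workaround I am considering (not code from this repo): list which prompt names exist in judge_ja_prompts.jsonl, and copy the missing multi-turn entries from the English judge_prompts.jsonl as a temporary stand-in. This assumes both files use the same JSONL schema with a "name" field per record, and it means turn 2 would be judged with English instructions, so it is only a stopgap.

```python
import json

JA_PATH = "fastchat/llm_judge/data/judge_ja_prompts.jsonl"
EN_PATH = "fastchat/llm_judge/data/judge_prompts.jsonl"

def load_prompts(path):
    # Each line is a JSON object keyed by a unique "name" (assumed schema).
    with open(path, encoding="utf-8") as f:
        return {rec["name"]: rec for rec in (json.loads(l) for l in f if l.strip())}

ja_prompts = load_prompts(JA_PATH)
en_prompts = load_prompts(EN_PATH)
print("Japanese prompt names:", sorted(ja_prompts))

# Borrow the English multi-turn judge prompts so the single-answer grading
# can run end to end; turn-2 judgments would then use English instructions.
for name in ["single-v1-multi-turn", "single-math-v1-multi-turn"]:
    if name not in ja_prompts and name in en_prompts:
        ja_prompts[name] = en_prompts[name]

with open(JA_PATH, "w", encoding="utf-8") as f:
    for prompt in ja_prompts.values():
        f.write(json.dumps(prompt, ensure_ascii=False) + "\n")
```

If there are official Japanese multi-turn prompts somewhere else, I would of course prefer to use those instead.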

shyram commented 3 months ago

There are also no second-turn evaluation results in the model_judge file.

Where are the evaluation results for the second turn?