Closed: mkshing closed this issue 1 year ago
Hi, thank you for adding the evaluation for the Rinna models. I have a question:
How do you decide which prompt to use when evaluating a model? For this foundation model, why is v0.2 used instead of v0.3? The accuracy results for the two prompts differ substantially.
Thank you!
@effendijohanes It really depends on the data format used during training. Please see the link below for each prompt version. https://github.com/Stability-AI/lm-evaluation-harness/blob/jp-stable/docs/prompt_templates.md
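To illustrate why the template should match the training format, here is a minimal sketch with two invented prompt formats. These are hypothetical examples, not the actual v0.2/v0.3 templates from the harness (see the linked doc for those): a model fine-tuned on one format will typically score worse when evaluated with the other.

```python
# Hypothetical illustration only: two made-up prompt formats for the same
# question. Real template definitions live in prompt_templates.md.

def plain_prompt(question: str) -> str:
    # Plain instruction-style format
    return f"質問: {question}\n回答:"

def chat_prompt(question: str) -> str:
    # Chat-style format with explicit speaker tags
    return f"ユーザー: {question}\nシステム: "

q = "日本の首都はどこですか？"
print(plain_prompt(q))
print(chat_prompt(q))
```

The lexical content is identical, but the surrounding scaffolding differs, which is why evaluation accuracy can diverge sharply between prompt versions on the same model.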
Description
Evaluated the following rinna bilingual models with our 8-task setting.