```python
import langcheck

# Create an eval_client as usual
gen_outputs = [
    "Tokyo is the capital of Japan.",
    "Osaka is the capital of Japan.",
    "Washington, D.C. is the capital of the US.",
    "Many people consider New York City to be the capital of the US, but it is technically Washington, D.C.",
    "The capital of BlahBlah land is Blahtropolis.",
    "I am not sure what the capital of BlahBlah land is.",
]
prompts = [
    "what is the capital of Japan?",
    "what is the capital of Japan?",
    "what is the capital of the US?",
    "what is the capital of the US?",
    "what is the capital of BlahBlah land?",
    "what is the capital of BlahBlah land?",
]
ref_outputs = [
    "Tokyo",
    "Tokyo",
    "Washington, D.C.",
    "Washington, D.C.",
    "Blahtropolis",
    "Blahtropolis",
]
langcheck.metrics.answer_correctness(
    gen_outputs, ref_outputs, prompts, eval_model=eval_client
)
```
Answer correctness is implemented for both English (en) and Japanese (ja). The lists above serve as example test cases, covering fully correct, incorrect, partially correct, and unanswerable prompts.