We can add auto_benchmark.py to generate test sets based on an input corpus. The script would perform multiple steps:
1) sample a set of passages;
2) generate one or multiple questions for each passage;
3) generate the answer for each <question, passage> pair;
4) validate all the <question, passage, answer> triples.
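The four steps above could be sketched as follows. This is a minimal outline, not the actual implementation: the function names and the question_gen, answer_gen, and validate callables are hypothetical stand-ins for whatever model calls or heuristics the real script would use.

```python
import random


def sample_passages(corpus, k, seed=0):
    """Step 1: sample k passages from the input corpus."""
    rng = random.Random(seed)
    return rng.sample(corpus, min(k, len(corpus)))


def generate_questions(passage, question_gen, n=1):
    """Step 2: generate one or more questions for a passage."""
    return [question_gen(passage) for _ in range(n)]


def generate_answer(question, passage, answer_gen):
    """Step 3: generate the answer for a <question, passage> pair."""
    return answer_gen(question, passage)


def build_testset(corpus, question_gen, answer_gen, validate,
                  k=10, questions_per_passage=1):
    """Run steps 1-4 and return the validated triples."""
    triples = []
    for passage in sample_passages(corpus, k):
        questions = generate_questions(passage, question_gen,
                                       questions_per_passage)
        for question in questions:
            answer = generate_answer(question, passage, answer_gen)
            triples.append((question, passage, answer))
    # Step 4: keep only <question, passage, answer> triples that pass validation.
    return [t for t in triples if validate(*t)]
```

In practice each callable would wrap an LLM prompt, and the validation step could combine automatic checks (e.g. answerability) with sampling for human review.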