Closed faneshion closed 3 months ago
@QianHaosheng Since ELI5 is part of KILT, we should move the ELI5 benchmark under the KILT benchmark, so that the directory structure looks like:
- rageval
  - rageval
  - benchmarks
    - KILT
      - FEVER
      - ELI5
      - ...
    - ASQA
    - BBQ
    - ...
  - tests
  - ...
Note that the "eli5" dataset is defunct and no longer accessible from Hugging Face. We can still download the train and validation sets from the KILT repo (https://github.com/facebookresearch/KILT?tab=readme-ov-file).
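For reference, a minimal sketch of reading an ELI5 file downloaded from the KILT repo. It assumes the JSON Lines layout described in the KILT README, where each record has an `id`, an `input` question, and an `output` list whose entries may carry an `answer` and/or `provenance`. The helper names `iter_kilt_records` and `to_qa_pair` are hypothetical and not part of rageval.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_kilt_records(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a KILT-style JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def to_qa_pair(record: dict) -> dict:
    """Keep the question and all reference answers.

    Assumption: output entries without an "answer" key (provenance-only
    entries) are not reference answers and are skipped.
    """
    answers = [o["answer"] for o in record.get("output", []) if "answer" in o]
    return {"id": record["id"], "question": record["input"], "answers": answers}
```

Something like `[to_qa_pair(r) for r in iter_kilt_records(path)]`, with `path` pointing at the locally downloaded validation file, would then give the question/answer pairs the benchmark needs. Whether provenance should also be kept depends on which evaluation dimensions we decide to cover.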
This issue tracks adding the ELI5 benchmark so that it covers all evaluation dimensions.
@QianHaosheng @Wenshansilvia We can discuss it in detail.
The test pipeline may be as follows: