In the CFEVER paper, we evaluate the following baselines on the CFEVER test set:
For the first two baselines, please refer to their source code:
Please go to the CFEVER-data repository to download our dataset.
git clone https://github.com/IKMLab/CFEVER-baselines.git
cd CFEVER-baselines
CFEVER is a Chinese Fact Extraction and VERification dataset. Similar to FEVER (Thorne et al., 2018), the CFEVER task is to verify the veracity of a given claim while providing evidence from Chinese Wikipedia (we provide our processed version in the CFEVER-data repository). Accordingly, the task is split into three sub-tasks:
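The data files used throughout these baselines are in the jsonl format (one JSON object per line). A minimal loader can be sketched as follows; the helper name and example path are illustrative, not part of the repository's actual API:

```python
import json

def load_jsonl(path):
    """Read a jsonl file: one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `load_jsonl("simple_baseline/data/dev.jsonl")` would return the dev split as a list of dictionaries.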
pip install -r requirements.txt
Please refer to the simple_baseline folder and check its README.md for more details.
To evaluate document retrieval, you need to pass two paths to the script eval_doc_retrieval.py:
- $GOLD_FILE: the path to the file with gold answers in the jsonl format.
- $DOC_PRED_FILE: the path to the file with predicted documents in the jsonl format.
python eval_doc_retrieval.py \
--source_file $GOLD_FILE \
--doc_pred_file $DOC_PRED_FILE
The example command is shown below:
python eval_doc_retrieval.py \
--source_file simple_baseline/data/dev.jsonl \
--doc_pred_file simple_baseline/data/bm25/dev_doc10.jsonl
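The evaluation boils down to checking, for each claim, whether the gold pages appear among the top-k predicted pages. Below is a simplified sketch of such a top-k document recall metric; the actual script follows BEVERS and may differ in details, and the argument format is illustrative:

```python
def doc_recall(gold_pages, pred_pages, top_k=10):
    """Fraction of claims whose gold pages all appear in the top-k predicted pages.

    gold_pages / pred_pages: one list of page titles per claim (illustrative format).
    """
    hits = sum(
        1
        for gold, pred in zip(gold_pages, pred_pages)
        if set(gold) <= set(pred[:top_k])
    )
    return hits / len(gold_pages)
```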
Note that our evaluation of document retrieval aligns with that of BEVERS; see BEVERS's code. You can control how many predicted pages are evaluated with the --top_k parameter. For example, --top_k 10 will evaluate the first 10 predicted pages.

We follow the same evaluation script as fever-scorer and add some parameters to run the script:
python eval_sent_retrieval_rte.py \
--gt_file $GOLD_FILE \
--submission_file $PRED_FILE
where $GOLD_FILE is the path to the file with gold answers in the jsonl format and $PRED_FILE is the path to the file with predicted answers in the jsonl format. The example command is shown below:
python eval_sent_retrieval_rte.py \
--gt_file simple_baseline/data/dev.jsonl \
--submission_file simple_baseline/data/dumb_dev_pred.jsonl
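For reference, the strict accuracy reported by FEVER-style scorers counts a claim as correct only when the predicted label matches the gold label and, for verifiable claims, at least one complete gold evidence set is contained in the predicted evidence. The sketch below illustrates this idea; it is a simplification with illustrative field names, not the official fever-scorer implementation:

```python
def strict_accuracy(gold, pred):
    """FEVER-style strict accuracy (simplified sketch).

    gold/pred: one dict per claim; field names here are illustrative.
    """
    correct = 0
    for g, p in zip(gold, pred):
        if g["label"] != p["label"]:
            continue  # wrong label: never counts as strictly correct
        if g["label"] == "NOT ENOUGH INFO":
            correct += 1  # no evidence is required for NEI claims
            continue
        pred_ev = set(map(tuple, p["evidence"]))
        # at least one complete gold evidence set must be fully retrieved
        if any(set(map(tuple, es)) <= pred_ev for es in g["evidence_sets"]):
            correct += 1
    return correct / len(gold)
```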
The script will output the scores of sentence retrieval and the scores of claim verification:
- Label accuracy
- Strict accuracy

If you find our work useful, please cite our paper:
@article{Lin_Lin_Yeh_Li_Hu_Hsu_Lee_Kao_2024,
title = {CFEVER: A Chinese Fact Extraction and VERification Dataset},
author = {Lin, Ying-Jia and Lin, Chun-Yi and Yeh, Chia-Jen and Li, Yi-Ting and Hu, Yun-Yu and Hsu, Chih-Hao and Lee, Mei-Feng and Kao, Hung-Yu},
doi = {10.1609/aaai.v38i17.29825},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
month = {Mar.},
number = {17},
pages = {18626--18634},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/29825},
volume = {38},
year = {2024},
}