Open Lxb-Code-Dev opened 6 months ago
Hi, the "Text and Image evidence (Gold)" setting means we use the annotated (gold) evidence for claim verification, while the "Text and Image evidence (System)" setting means the evidence is first retrieved by our retrieval model.
The eval.sh script runs the "Text and Image evidence (Gold)" setting. To obtain the "Text and Image evidence (System)" performance, add the argument "--evidence_file_name=#path_to_retrieved_evidence".
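For concreteness, here is a sketch of the two invocations. This assumes eval.sh forwards extra command-line arguments to the underlying evaluation script; `#path_to_retrieved_evidence` is kept as the repository's own placeholder and should be replaced with the actual retrieval output path.

```shell
# Gold setting: uses the annotated evidence (the default in eval.sh)
bash eval.sh

# System setting: uses evidence produced by the retrieval model.
# Substitute the placeholder below with your own retrieval output file.
bash eval.sh --evidence_file_name=#path_to_retrieved_evidence
```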
Thank you for your response! I have another question: when retrieving the evidence, which checkpoint should I load for retrieval? I found multiple checkpoints in the checkpoint folder you provided.
@Barry-Menglong-Yao
Hello, I have a similar issue to @Lxb-Code-Dev's.
The result shows different values from the table in the paper, even though I passed arguments identical to the ones in eval.sh.
Can you tell me where the difference might come from? The result follows.
```json
{
  "Refuted": {
    "precision": 0.5333333333333333,
    "recall": 0.8145454545454546,
    "f1-score": 0.6446043165467625,
    "support": 825.0
  },
  "Supported": {
    "precision": 0.5357142857142857,
    "recall": 0.5507955936352509,
    "f1-score": 0.5431502715751357,
    "support": 817.0
  },
  "NEI": {
    "precision": 0.6052631578947368,
    "recall": 0.25875,
    "f1-score": 0.36252189141856395,
    "support": 800.0
  },
  "accuracy": 0.5442260442260443,
  "macro avg": {
    "precision": 0.5581035923141187,
    "recall": 0.5413636827269018,
    "f1-score": 0.5167588265134874,
    "support": 2442.0
  },
  "weighted avg": {
    "precision": 0.557694143220459,
    "recall": 0.5442260442260443,
    "f1-score": 0.5182513702550435,
    "support": 2442.0
  }
}
```
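As a side note, the report itself is internally consistent: the macro average is the plain mean of the per-class scores, and the weighted average weights each class by its support. A quick check, with the values copied from the JSON above:

```python
# Per-class F1 scores and supports, copied from the classification report.
report = {
    "Refuted":   {"f1-score": 0.6446043165467625,  "support": 825},
    "Supported": {"f1-score": 0.5431502715751357,  "support": 817},
    "NEI":       {"f1-score": 0.36252189141856395, "support": 800},
}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(c["f1-score"] for c in report.values()) / len(report)

# Weighted average: each class contributes proportionally to its support.
total = sum(c["support"] for c in report.values())
weighted_f1 = sum(c["f1-score"] * c["support"] for c in report.values()) / total

print(round(macro_f1, 6), round(weighted_f1, 6))
```

Both recomputed values match the "macro avg" and "weighted avg" f1-score entries in the report, so the discrepancy with the paper is not a reporting bug but a genuine difference in the evaluated setting or checkpoint.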
Hi Barry Menglong Yao, thank you for your work! I am currently trying to reproduce it, but when I run eval.sh as directed by the README, the experimental results are quite different from those in the paper.

Which setting (Gold or System) do the results from running eval.sh correspond to? I would guess the System setting, but the numbers are closer to the Gold ones.