VT-NLP / Mocheg

Dataset and Code for Multimodal Fact Checking and Explanation Generation (Mocheg)
Apache License 2.0

Issue with verification results #10

Open · Lxb-Code-Dev opened this issue 6 months ago

Lxb-Code-Dev commented 6 months ago

Hi Barry Menglong Yao, thank you for your work! I am currently trying to reproduce it, but when I run eval.sh as directed by the README, the experimental results differ noticeably from those in the paper. Which setting (Gold or System) do the results from running eval.sh correspond to? I guessed the System setting, but the numbers are closer to Gold. (screenshots of the results attached)

Barry-Menglong-Yao commented 6 months ago

Hi, the "Text and Image evidence (Gold)" setting means we use the annotated (gold) evidence for the claim verification task, while the "Text and Image evidence (System)" setting means we use evidence retrieved by our retrieval model.

eval.sh corresponds to the "Text and Image evidence (Gold)" setting. To obtain the "Text and Image evidence (System)" performance, add the argument "--evidence_file_name=#path_to_retrieved_evidence".
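
A minimal sketch of the two invocations (assuming eval.sh forwards extra command-line arguments to the underlying evaluation command; if it does not, add the flag to the command inside the script; the evidence path is a placeholder, not a file shipped with the repo):

# Gold setting: annotated evidence, the default behavior of eval.sh
bash eval.sh

# System setting (sketch): evaluate with evidence produced by the retrieval model
bash eval.sh --evidence_file_name=#path_to_retrieved_evidence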

Lxb-Code-Dev commented 6 months ago

Thank you for your response! I have another question about retrieving the evidence: which checkpoint should I load for retrieval? I found multiple checkpoints in the checkpoint folder you provided. (screenshots attached)

given131 commented 6 months ago

@Barry-Menglong-Yao Hello, I have a similar issue to @Lxb-Code-Dev. My result differs from the values in the paper's table, even though I passed arguments identical to those in eval.sh. Can you tell me where the difference might come from?

Following is the result.

{
   "Refuted":{
      "precision":0.5333333333333333,
      "recall":0.8145454545454546,
      "f1-score":0.6446043165467625,
      "support":825.0
   },
   "Supported":{
      "precision":0.5357142857142857,
      "recall":0.5507955936352509,
      "f1-score":0.5431502715751357,
      "support":817.0
   },
   "NEI":{
      "precision":0.6052631578947368,
      "recall":0.25875,
      "f1-score":0.36252189141856395,
      "support":800.0
   },
   "accuracy":0.5442260442260443,
   "macro avg":{
      "precision":0.5581035923141187,
      "recall":0.5413636827269018,
      "f1-score":0.5167588265134874,
      "support":2442.0
   },
   "weighted avg":{
      "precision":0.557694143220459,
      "recall":0.5442260442260443,
      "f1-score":0.5182513702550435,
      "support":2442.0
   }
}
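
For reference when comparing with the paper's table: the "macro avg" f1-score is the unweighted mean of the three per-class F1 scores, (0.6446 + 0.5432 + 0.3625) / 3 ≈ 0.5168, and "accuracy" (0.5442) is the overall label accuracy over the 2442 test claims.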