HornHehhf / SocREval

Question about the result presented in paper. #1

Closed AegeanYan closed 11 months ago

AegeanYan commented 11 months ago

Your code for processing the ROSCOE data seems to focus only on the overall_result, but you claim to compare against ROSCOE's 0.36 correlation for the fact-type error? I don't think that is reasonable, or is some of your experiment code missing from the release?

HornHehhf commented 11 months ago

Thanks for your question! Sorry, I don't fully understand the part "you claim you compare with the Fact type error correlation with ROSCOE's 0.36". I'm not sure which "0.36" and which "fact type error correlation" you are referring to, but I hope the explanation below resolves your question.

In the main PDF (Tables 1-4 and Figure 3), we only compare the correlation between different reasoning evaluation measures and the human-judged "overall quality" of reasoning chains. The definition of "overall quality" can be found in Table 15 of the ROSCOE paper (https://arxiv.org/pdf/2212.07919.pdf). The detailed results of reference-free ROSCOE for each error type can be found in Tables 33-36 of the ROSCOE paper. For reference, we do report the correlation between different evaluation measures and specific error types in Tables 12-16 of our paper. For simplicity, we only release the code for our main experiment (Table 1); the correlation between different measures and the specific error types can be obtained in a similar way.
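To illustrate what "in a similar way" could look like, here is a minimal sketch, not the released code. It assumes each example carries one automatic score plus several human labels, and it assumes Somers' D as the correlation statistic (this thread does not name the statistic). The field names `metric_score`, `overall_quality`, and `factuality_error` are hypothetical placeholders, and the toy records are made up for illustration only.

```python
from itertools import combinations


def somers_d(human, score):
    """Somers' D of `score` with respect to `human`:
    (concordant - discordant) / number of pairs not tied on `human`."""
    concordant = discordant = untied = 0
    for (h1, s1), (h2, s2) in combinations(zip(human, score), 2):
        if h1 == h2:
            continue  # pairs tied on the human label are excluded
        untied += 1
        product = (h1 - h2) * (s1 - s2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / untied if untied else 0.0


def correlations_by_label(records, score_key, label_keys):
    """Correlate one automatic score with each human label column."""
    scores = [r[score_key] for r in records]
    return {
        key: somers_d([r[key] for r in records], scores)
        for key in label_keys
    }


# Toy data with hypothetical field names; not taken from any dataset.
records = [
    {"metric_score": 0.9, "overall_quality": 5, "factuality_error": 0},
    {"metric_score": 0.7, "overall_quality": 4, "factuality_error": 0},
    {"metric_score": 0.4, "overall_quality": 2, "factuality_error": 1},
    {"metric_score": 0.2, "overall_quality": 1, "factuality_error": 1},
]

result = correlations_by_label(
    records, "metric_score", ["overall_quality", "factuality_error"]
)
print(result)
```

Switching from the overall-quality label to a specific error type then only changes which label column is passed in; the scoring and correlation code stays the same.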

AegeanYan commented 11 months ago

Thanks for your reply. I think I may have confused RECEVAL with your work.