Closed 614479467 closed 1 year ago
Hello, and thanks for your interest in ALUE. For the DIAG task we are using the matthews_corrcoef metric from sklearn. Would you please retry with said metric and let us know if you achieve agreement with the leaderboard?
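For anyone comparing numbers locally, here is a minimal sketch of how the two metrics can diverge on the same predictions. The label lists are made up for illustration and are not ALUE data, and the macro averaging for F1 is an assumption about how the local score might have been computed; only the use of matthews_corrcoef from sklearn comes from this thread.

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Hypothetical gold labels and predictions (illustration only, not ALUE data).
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Matthews correlation coefficient, the metric named above for DIAG.
mcc = matthews_corrcoef(y_true, y_pred)

# Macro-averaged F1, shown only for comparison (averaging mode is an assumption).
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"MCC:      {mcc:.4f}")
print(f"Macro F1: {macro_f1:.4f}")
```

The two scores are generally not equal for the same predictions, so a locally computed F1 would not be expected to match a leaderboard value reported as MCC.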
Thanks for your help! I have achieved agreement with the leaderboard.
Great. Thanks for reporting back. I am going to close this issue now. If you face any other issues, please let us know.
Sorry to bother you. I am a researcher at SRIBD. I would like to know why the diagnostic F1 score that I computed myself differs from the result reported by alue.org, even though the XNLI metrics I computed myself match those from alue.org. Are XNLI and DIAG evaluated in the same way?