Alue-Benchmark / alue_baselines

Repo for reproducing ALUE benchmark baselines
MIT License

About DIAG metric #9

Closed 614479467 closed 1 year ago

614479467 commented 1 year ago

Sorry to bother you. I am a researcher at SRIBD. Why is the diagnostic F1 score that I computed myself different from the result reported by alue.org? For XNLI, my own metrics match the ones from alue.org. Are XNLI and DIAG evaluated in the same way?

hseelawi commented 1 year ago

Hello, and thanks for your interest in ALUE. For the DIAG task we are using the matthews_corrcoef metric from sklearn. Would you please retry with said metric and let us know if you achieve agreement with the leaderboard?
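To illustrate why an F1 score and the Matthews correlation coefficient (MCC) can disagree on the same predictions, here is a minimal pure-Python sketch. The toy labels, the helper names `multiclass_mcc` and `macro_f1`, and the choice of macro averaging for F1 are assumptions made for illustration; they are not taken from the ALUE evaluation code. In practice, `sklearn.metrics.matthews_corrcoef(y_true, y_pred)` is the call named above, and the hand-written version here only spells out the multiclass formula.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    t_counts = Counter(y_true)                         # true occurrences per class
    p_counts = Counter(y_pred)                         # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Return 0.0 when the denominator vanishes (e.g. a single predicted class).
    return numerator / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for k in labels:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 3-class toy labels (not real DIAG data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

mcc = multiclass_mcc(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"MCC: {mcc:.4f}  macro F1: {f1:.4f}")
```

The two numbers differ even though they are computed from identical predictions, so a locally computed F1 would not be expected to match a leaderboard value reported as MCC.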

614479467 commented 1 year ago

Thanks for your help! I have achieved agreement with the leaderboard.

hseelawi commented 1 year ago

Great. Thanks for reporting back. I am going to close this issue now. If you face any other issues, please let us know.