Alue-Benchmark / alue_baselines

Repo for reproducing ALUE benchmark baselines
MIT License

Inconsistency of results on MDD task #6

Closed Jason3900 closed 1 year ago

Jason3900 commented 1 year ago

Hey, I found that the submission results for the MDD task are not the same as the ones I get locally. I'm wondering how that happens.

hseelawi commented 1 year ago

Hello,

We use the F1 score with macro averaging. The implementation we use is sklearn.metrics.f1_score. Would you please let us know which F1 score implementation you are using?
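Macro averaging computes the F1 score for each class separately and then takes their unweighted mean, so every class counts equally regardless of its frequency. A minimal pure-Python sketch of that computation (the labels below are made up for illustration; the result matches `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Per-class F1, averaged with equal weight per class (macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # Count true positives, false positives, false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Unweighted mean over classes -- the "macro" part.
    return sum(f1_scores) / len(f1_scores)

# Toy example with three classes.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.8222
```

Note that small discrepancies between scores can arise if the implementations differ in how they handle classes that appear in the predictions but not in the gold labels (or vice versa), which is one place an off-by-one-point gap can hide.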


Jason3900 commented 1 year ago

Yeah, I use the exact same metric, but it turns out there's a 1pp gap. However, other macro-F1 tasks such as FID don't seem to have this issue.

hseelawi commented 1 year ago

Would you please try using scikit-learn 1.0.1, just to rule out any dependency-related issues? I have also resent you the dataset splits.
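Pinning the exact scikit-learn release can be done with pip (the version number here is the one suggested above; the check on the second line is just a quick way to confirm which version is active):

```shell
pip install scikit-learn==1.0.1
python -c "import sklearn; print(sklearn.__version__)"
```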

hseelawi commented 1 year ago

@Jason3900 given our correspondence via email, I believe this issue is solved now. I am gonna close it, but please feel free to reopen it otherwise.

Jason3900 commented 1 year ago

Thanks a lot!