EveripediaNetwork / issues

Issues repo

Prepare the leaderboard automated submission pipeline #2998

Closed by brunneis 1 month ago

brunneis commented 1 month ago

Prepare the leaderboard to launch the solbench/NaïveJudge benchmark when models are submitted.

not-lain commented 1 month ago

done ✅

kesar commented 1 month ago

> done ✅

plz associate this with a PR / code so anyone can review it

not-lain commented 1 month ago

since we are using our own benchmark, we opted for a manual approach using the following code.

this is inspired by the failure/corner cases of the original evaluations. source:

[image]

kesar commented 1 month ago

let's make sure we have those notebooks in a company repository then, otherwise it's gonna be very messy to follow the development / share knowledge with other devs

not-lain commented 1 month ago

good idea, will clean this up a little and store it ❤️