Closed brunneis closed 1 month ago
done ✅
done ✅
plz associate it with the PR / code so anyone can review it
Since we are using our own benchmark, we opted for a manual approach using the following code.
This is inspired by the failure/corner cases from the original evaluations, source:
Let's make sure those notebooks end up in a company repository then; otherwise it's going to be very messy to follow the development and share knowledge with other devs.
Good idea, will clean this up a little and store it ❤️
Prepare the leaderboard to run the solbench/NaïveJudge benchmark when models are submitted.