clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark

documentation on benchmarking and updating leaderboard #25

Closed: sherzod-hakimov closed this 3 months ago

sherzod-hakimov commented 7 months ago

Add documentation that describes the workflow for how benchmarking results end up in the leaderboard:

1. Run the benchmark.
2. Run the evaluation and check the resulting files in to the clembench-runs repo.
3. The leaderboard automatically pulls the new benchmark results and updates its tables.
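For someone picking this up, a rough sketch of what those three steps could look like end to end. The script names (`scripts/run_benchmark.py`, `scripts/run_evaluation.py`), the `results/` path, the checkout locations, and the model identifier are placeholders for illustration, not the actual clembench CLI; the linked documentation has the real commands.

```python
"""Sketch of the three-step leaderboard update workflow (assumed names/paths)."""
import subprocess
from pathlib import Path

# Hypothetical locations; adjust to your local checkouts.
CLEMBENCH_DIR = Path("clembench")       # framework checkout (assumed)
RUNS_REPO_DIR = Path("clembench-runs")  # results repo checkout (assumed)
MODEL = "model-x-v2"                    # placeholder model identifier

# 1) Run the benchmark for the new model (hypothetical entry point).
subprocess.run(
    ["python", "scripts/run_benchmark.py", "--model", MODEL],
    cwd=CLEMBENCH_DIR, check=True,
)

# 2) Run the evaluation to produce the result files (hypothetical entry point),
#    then check the generated files in to the clembench-runs repo.
subprocess.run(
    ["python", "scripts/run_evaluation.py", "--results", "results/"],
    cwd=CLEMBENCH_DIR, check=True,
)
for cmd in (
    ["git", "add", "."],
    ["git", "commit", "-m", f"Add benchmark results for {MODEL}"],
    ["git", "push"],
):
    subprocess.run(cmd, cwd=RUNS_REPO_DIR, check=True)

# 3) Nothing to do here: the leaderboard pulls the new results from
#    clembench-runs and rebuilds its tables automatically.
```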

howto_update_leaderboard.md

davidschlangen commented 7 months ago

This should have a sufficient level of detail so that the task of adding a new model to the leaderboard can be given to someone who otherwise does not know much about the framework.

Maybe write in the form of an example? "Let's assume you have been given the task to add model X to the leaderboard. Here are the steps."

(Of course, if the model is not covered by an existing backend, that task would have to fall to someone who is able to do that. But let's assume we're just dealing with a "new version of an already covered model X" type of situation.)

sherzod-hakimov commented 3 months ago

The documentation is available here: https://github.com/clp-research/clembench/blob/main/docs/howto_benchmark_workflow.md