clembench / clembench-runs

All outputs generated by running the benchmark on different versions
MIT License


Benchmark Runs

The leaderboard covering all runs is available here: Clem Leaderboard

Versions

v0.9 - June 2023

v1.0 - November 2023

v1.5 - March 2024

v1.6 - May 2024

Supported Models

The list of supported open & closed/commercial models can be found here: model registry

Game-play files

Each model has a separate folder for the results of each game. The outputs are organised as follows: /model/game/experiment. Each episode under a given experiment includes the following files:
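As a rough illustration of the /model/game/experiment layout, the sketch below lists every experiment folder found under a directory. The function name `list_experiments` and the `root` argument are hypothetical and not part of the repository; the sketch only assumes that `root` directly contains the model folders.

```python
from pathlib import Path


def list_experiments(root):
    """Yield (model, game, experiment) for each folder three levels below root.

    Assumption: `root` directly contains the model folders, following the
    /model/game/experiment layout described in this README.
    """
    root = Path(root)
    for exp_dir in sorted(root.glob("*/*/*")):
        parts = exp_dir.relative_to(root).parts
        # Skip plain files and hidden folders such as .git
        if not exp_dir.is_dir() or any(p.startswith(".") for p in parts):
            continue
        model, game, experiment = parts
        yield model, game, experiment
```

Calling `list(list_experiments("path/to/outputs"))` on a local checkout would return one tuple per experiment folder; the path shown here is a placeholder.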

Results files

Each run of the benchmark generates one CSV file and one HTML file (results.csv and results.html) covering all tested models across all games.
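A minimal sketch for inspecting a results.csv file with Python's standard library follows. The column names are not documented here, so the code makes no assumption about them and simply returns the header together with the rows; `load_results` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_results(csv_path):
    """Return (header, rows) for a results.csv file.

    Each row is a dict keyed by whatever column names the file's
    first line provides; no particular columns are assumed.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        header = list(reader.fieldnames or [])
    return header, rows
```

Printing the returned header first is a quick way to see which columns a given run actually provides before doing any further analysis.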