clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License

documentation `howto_run_benchmark.md` is incomplete / unhelpful #6

Closed: davidschlangen closed this issue 7 months ago

davidschlangen commented 8 months ago

The subsection on running the benchmark should be more precise.

At the moment, it says "run the cli script" and then gives `python3 scripts/cli.py --help`. But that only prints the help text.

It should say something like: "Make sure that you do not get any error messages. Now, check that you can run a single game. For example, try `python scripts/cli.py -m gpt-3.5-turbo run taboo`. This verifies that your OpenAI key is working. You should see something like ... and find a new directory .... in games/taboo..."
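The suggested check could look roughly like this (a sketch based only on the commands quoted above; the exact flags, model names, and output locations depend on the clembench version and are not confirmed here):

```shell
# Step 1: verify the CLI script loads without errors.
# This only prints the help text; no game is run yet.
python3 scripts/cli.py --help

# Step 2: run a single game to check that the OpenAI key is set up.
# (Model and game name taken from this issue; newer versions of the
# framework may use different identifiers.)
python3 scripts/cli.py -m gpt-3.5-turbo run taboo
```

If step 2 succeeds, the documentation could then point the reader at the expected console output and the results directory it creates, so a new user can confirm the setup end to end.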