Closed davidschlangen closed 8 months ago
Turns out the invocation was wrong (should be model pairs). This should not fail silently. There must be a way of knowing that the above is not the expected output from a successful run.
Solution: validate command line parameters when parsing them, and fail meaningfully.
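A minimal sketch of that solution, assuming argparse: reject a model list whose length does not match the game's player count at parse time, so the run fails loudly instead of silently. The helper name, signature, and player-count lookup are illustrative, not the actual clembench API.

```python
import argparse

def validate_models(models, num_players):
    """Fail fast with a clear message when the -m argument does not
    match the game's player count. Hypothetical helper for illustration."""
    if len(models) != num_players:
        raise argparse.ArgumentTypeError(
            f"Invalid model pairing {models}: this game needs "
            f"{num_players} model(s), got {len(models)}."
        )
    return models

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--models", nargs="+", required=True)
args = parser.parse_args(["-m", "gpt-4"])

try:
    # taboo is a two-player game, so a single model must be rejected
    validate_models(args.models, num_players=2)
except argparse.ArgumentTypeError as e:
    print(f"ERROR: {e}")
```

The point is that the check runs before any experiment starts, so the user never sees a partial run that looks successful.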
The information was only given in clembench.log but not written to stdout. With commit 614215c the error message is now also written to the stdout.
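The fix described above can be sketched with the standard logging module: attach a stdout handler alongside the file handler so errors are no longer buried in clembench.log. The handler setup below is illustrative; the actual configuration in commit 614215c may differ.

```python
import logging
import sys

logger = logging.getLogger("benchmark.run")
logger.setLevel(logging.INFO)

# Existing destination: the log file.
file_handler = logging.FileHandler("clembench.log", delay=True)
# The added piece: mirror messages to stdout so the user sees them.
stdout_handler = logging.StreamHandler(sys.stdout)

fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
for handler in (file_handler, stdout_handler):
    handler.setFormatter(fmt)
    logger.addHandler(handler)

logger.error("Invalid model pairing ['asdasf'] for a multi-player game.")
```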
python3 scripts/cli.py -m asdasf run hellogame
2023-11-01 15:22:31,253 - benchmark.run - INFO - Run game 1 of 1: hellogame
2023-11-01 15:22:31,253 - benchmark.run - INFO - Run experiment 1 of 1: greet_en
2023-11-01 15:22:31,253 - benchmark.run - ERROR - Invalid model pairing ['asdasf'] for a multi-player game. For single-player expected only a single model, otherwise a pair.
Otherwise the program behaves as before and simply notifies the user via the console.
python3 scripts/cli.py -m gpt-4 run taboo
Loaded backends: anthropic,openai,alephalpha
2023-11-01 15:31:01,716 - benchmark.run - INFO - Run game 1 of 1: taboo
2023-11-01 15:31:01,716 - benchmark.run - INFO - Run experiment 1 of 3: high_en
2023-11-01 15:31:01,716 - benchmark.run - ERROR - Invalid model pairing ['gpt-4'] for a multi-player game. For single-player expected only a single model, otherwise a pair.
The question is whether this is enough for now.
yes, that should suffice for now
I'm running the following:
python scripts/cli.py -m gpt-3.5-turbo run taboo
I get the following output:
But there is no `records` folder to be found anywhere.