clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License

Separate Game Scoring from GameMaster and extract functionality into a GameScorer class #43

Closed lpfennigschmidt closed 4 months ago

lpfennigschmidt commented 5 months ago

As mentioned in #40, this is my suggested way to separate scoring from the actual running of games, since the two are also run separately. The PR also includes a "quick and dirty" integration of this breaking change into taboo. I would expect other games to work similarly, but I have not had the time to try that out. I commented out some functions and deleted some preparation calls that I think are not necessary for scoring, but please correct me if I am wrong there. The sub-functions called in the new compute_scores() function are only my personal preference. My main point is moving the attributes and logging functions needed for scoring into a separate class; how the compute_scores() method looks in the end is not important for this PR. This separation makes the interface clearer: it states that request information should be logged and defines how it should be logged, since these values are required to calculate other required scores for the bencheval.py script.
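To illustrate the idea, here is a minimal sketch of what such a separation could look like. It is not the code in this PR: only the names GameScorer and compute_scores() come from the description above. The logging helpers, the score names, the TabooScorer subclass, and the layout of the interactions dict are all assumptions made up for this example.

```python
from typing import Any, Dict


class GameScorer:
    """Hypothetical scorer that holds only what scoring needs (no game running)."""

    def __init__(self, name: str):
        self.name = name
        self.scores: Dict[str, Dict[Any, Any]] = {
            "turn scores": {},
            "episode scores": {},
        }

    def log_turn_score(self, turn_idx: int, score_name: str, value: Any) -> None:
        # Store a score for a single turn.
        self.scores["turn scores"].setdefault(turn_idx, {})[score_name] = value

    def log_episode_score(self, score_name: str, value: Any) -> None:
        # Store a score for the whole episode.
        self.scores["episode scores"][score_name] = value

    def compute_scores(self, episode_interactions: Dict[str, Any]) -> None:
        # Request statistics are always logged the same way, so the
        # evaluation script can rely on them; game logic is left to subclasses.
        self.score_requests(episode_interactions)
        self.score_game(episode_interactions)

    def score_requests(self, episode_interactions: Dict[str, Any]) -> None:
        # Assumed layout: "turns" is a list of turns, each a list of request
        # events carrying a boolean "parsed" flag.
        total = parsed = 0
        for turn_idx, turn in enumerate(episode_interactions["turns"]):
            turn_total = len(turn)
            turn_parsed = sum(1 for event in turn if event["parsed"])
            self.log_turn_score(turn_idx, "Request Count", turn_total)
            self.log_turn_score(turn_idx, "Parsed Request Count", turn_parsed)
            total += turn_total
            parsed += turn_parsed
        self.log_episode_score("Request Count", total)
        self.log_episode_score("Parsed Request Count", parsed)
        self.log_episode_score("Violated Request Count", total - parsed)
        ratio = parsed / total if total else 0.0
        self.log_episode_score("Request Success Ratio", ratio)

    def score_game(self, episode_interactions: Dict[str, Any]) -> None:
        # Game-specific scoring; the base class logs nothing here.
        pass


class TabooScorer(GameScorer):
    """Hypothetical game-specific scorer; the "outcome" key is an assumption."""

    def score_game(self, episode_interactions: Dict[str, Any]) -> None:
        success = episode_interactions.get("outcome") == "guessed"
        self.log_episode_score("Success", 1 if success else 0)


interactions = {
    "turns": [[{"parsed": True}, {"parsed": True}], [{"parsed": False}]],
    "outcome": "guessed",
}
scorer = TabooScorer("taboo")
scorer.compute_scores(interactions)
```

With a layout like this, a game would only need to override score_game(), while the request statistics stay uniform across games.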

Again, commits should be squashed when merging 😅