clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License

[games] Do another round of prompt criticism / unification. For clembench 1.6 #87

Open davidschlangen opened 5 months ago

davidschlangen commented 5 months ago

For 1.6, go over all prompts in all games and make sure that they use the same formulations for describing what counts as a well-formed reply (e.g., "Be brief. Respond only with TAG: and one word, in one line.").
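
To make the example formulation concrete, here is a minimal sketch of a check for a reply of the form "TAG: word" on a single line. It is an illustration only, not code from clembench: the function name `is_well_formed` and the tag `GUESS` are hypothetical, and each game may define well-formedness differently.

```python
import re


def is_well_formed(reply: str, tag: str) -> bool:
    """Return True if the reply is exactly '<tag>: <one word>' on one line.

    Hypothetical helper for illustration; not part of clembench.
    """
    text = reply.strip()
    # Reject multi-line replies outright.
    if "\n" in text:
        return False
    # The tag, a colon, optional whitespace, then a single whitespace-free word.
    return re.fullmatch(rf"{re.escape(tag)}:\s*\S+", text) is not None


if __name__ == "__main__":
    for candidate in ["GUESS: apple", "GUESS: apple pie", "apple"]:
        print(repr(candidate), is_well_formed(candidate, "GUESS"))
```

If all prompts share one wording for the reply format, a single check along these lines could in principle be reused across games instead of per-game parsing rules.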

AnneBeyer commented 2 months ago

FYI: Sina is starting to look into that.