clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
22 stars 31 forks source link

[games] claude models seem to have problem with taboo #76

Closed davidschlangen closed 5 months ago

davidschlangen commented 5 months ago

clembench 1.0 taboo ran with claude 1 and 2; clembench 1.5 taboo does not seem to run anymore. The good old problem of two consecutive "user" role messages (which we had with private/shared), which the API is rejecting.

Gnurro commented 5 months ago

I will likely make the cleaning function written for the HF backend a general backend utility with the llama.cpp backend, so it could be used in other backends like the one that covers Claude as well.

phisad commented 5 months ago

see PR #77