clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
22 stars 31 forks source link

[backends] Fishy "||" in interactions.json #118

Open Antonia-Schmidt opened 2 weeks ago

Antonia-Schmidt commented 2 weeks ago

When running the benchmark with a fine-tuned Llama-3.1-8B-Instruct model, there appear "||" and "||||" at the end of messages in the interactions file that are not contained in the raw messages in requests.json. This can also lead to wrongfully aborted games, e.g. in the imagegame.

Gnurro commented 2 weeks ago

This is due to a recent (July 9th) change of how the HuggingFace backend removes EOS token strings from the end of the model outputs: backends/huggingface_local_api.py This interacts with EOS token strings that contain | - this is parsed as regEx choice. I will write a hotfix now.

We do need to discuss if we want to keep the newer regEx-based culling, which would require the model registry entries to contain proper regEx expressions for the culling, or if we revert back to my earlier method, which would work with bare strings and was robust against any changes in prompting/chat templates that have now actually come up. The regEx-based culling was introduced to handle inconsistent output of newlines by Gemma-2 - doing it this way is more flexible, but requires more care in writing model entries to not result in issues like this.

davidschlangen commented 2 weeks ago

How would Gemma-2 be handled in the old system? Is the increase in convenience through the regex method worth the cost?

Gnurro commented 2 weeks ago

How would Gemma-2 be handled in the old system?

Not at all. There were a lot of EOS token strings left without the regEx, as the model produces varying numbers of newlines between its pair of EOS tokens seemingly at random. This can't be handled with a simple string, hence the change as of July 9th. And I went for regEx to not introduce more specific code in the backend than necessary, leaving the model-specific handling to be set in the model registry entry.

Is the increase in convenience through the regex method worth the cost?

It is making things less convenient, actually, as anyone adding a model entry with eos_to_cull needs to make sure that it's properly capturing the model's EOS token string(s) - and eos_to_cull with regEx special characters that are intended parts of the EOS token string need to be escaped to not lead to issues with improper culling. That's also why there's so many changes to existing eos_to_cull in my current update for this: https://github.com/clp-research/clembench/pull/120