clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
26 stars 34 forks source link

Update EOS culling to proper regEx substitution #120

Closed Gnurro closed 2 months ago

Gnurro commented 3 months ago

Due to more flexible culling by re.sub introduced in July, some model outputs were culled wrongly - this should solve this issue. As the regEx culling is more future-proof, it is kept in the HF backend and added to the llama-cpp backend. Model entries updated accordingly. Since the regEx handling changes how the eos_to_cull key needs to be written, specially important for custom models like finetunes, the documentation on it is updated.

Changes:

Gnurro commented 2 months ago

Compared v1.6 requests with ones produced with the changes here, and it works properly. Notably, Llama3.1-8b-Instruct and Llama3.1-72b-Instruct do not get immediately aborted on imagegame anymore, since there is no extraneous || at the end of their cleaned responses anymore.

Testing result files will be removed before PR is put into review once this has been double-checked.

Gnurro commented 2 months ago

Testing data removed, ready for merge.