carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.
GNU General Public License v3.0
875 stars, 64 forks

Enable support for Groq models #17

Closed by simveit 3 months ago

simveit commented 3 months ago

I added support for Groq models. I benchmarked the Llama 3 models (using gpt-3.5-turbo as the evaluator) with the following results: [image: benchmark results]

carlini commented 3 months ago

I'll try to do a full run of some Llama models with a GPT-4 evaluator later and add the results.