Enable support for Groq models

carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

GNU General Public License v3.0


Closed: simveit closed this 3 months ago

simveit commented 3 months ago

I added support for Groq models. I benchmarked the Llama 3 models (using gpt-3.5-turbo as the evaluator) with the following results.
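As a rough illustration of what adding Groq support can involve (this is not the code from this PR), here is a minimal sketch. It assumes Groq's OpenAI-compatible chat completions endpoint, an API key in a `GROQ_API_KEY` environment variable, and a conversation passed as an alternating list of user and assistant strings. The helper names `build_messages`, `build_payload`, and `make_request` are hypothetical and chosen only for this example.

```python
import json
import os
import urllib.request

# Assumption: Groq's OpenAI-compatible chat completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_messages(conversation):
    """Map an alternating [user, assistant, user, ...] list of strings to chat messages."""
    roles = ("user", "assistant")
    return [
        {"role": roles[i % 2], "content": text}
        for i, text in enumerate(conversation)
    ]


def build_payload(model, conversation, max_tokens=1024):
    """Build the JSON body for a chat completion request."""
    return {
        "model": model,
        "messages": build_messages(conversation),
        "max_tokens": max_tokens,
    }


def make_request(model, conversation, api_key=None, max_tokens=1024):
    """Send the conversation to the endpoint and return the reply text."""
    # Assumption: the key lives in GROQ_API_KEY unless passed explicitly.
    api_key = api_key or os.environ["GROQ_API_KEY"]
    body = json.dumps(build_payload(model, conversation, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Explicit User-Agent; some API gateways reject the urllib default.
            "User-Agent": "groq-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

With a key set, a call might look like `make_request("llama3-8b-8192", ["Write hello world in C"])`. The model id here is an assumption and should be checked against the provider's current model list.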

carlini commented 3 months ago

I'll try to do a full run of some Llama models with a GPT-4 evaluator later and add the results.