Sorry guys, I'm new to evals and benchmarks and was hoping someone could point me in the right direction. I'm currently trying to evaluate the baseline quality of several open-source model quants. I have them hosted via an OpenAI-compatible API (Oobabooga) at a URL with an API key. Is there an easy way to alter MixEval to test my different models against that same API endpoint? I was able to modify MMLU-Pro to do this, but I can't figure out how to do it with MixEval. Any help would be appreciated, thank you!
I am planning to test:
Llama 3.1 70b Instruct EXL2 4.0bpw
Mixtral 8x7b Instruct EXL2 6.0bpw
Gemma 2 27B IT EXL2 8.0bpw
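
For reference, here's roughly the kind of change that worked for me with MMLU-Pro: just pointing the OpenAI Python client at the local Oobabooga endpoint instead of api.openai.com. The base URL, key, and model name below are placeholders for my setup, not anything from the MixEval codebase; I'm hoping MixEval has a similar spot where this can be swapped in.

```python
from openai import OpenAI

# Point the client at the local Oobabooga OpenAI-compatible endpoint
# instead of api.openai.com. URL, key, and model name are placeholders.
client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="my-local-api-key",
)

# Quick sanity check that the endpoint responds before running a full eval.
response = client.chat.completions.create(
    model="Llama-3.1-70B-Instruct-exl2-4.0bpw",  # whichever quant is loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```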