-
It seems like this would generate bad benchmark results?
https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160
```
self.SYSTEM_MESSAGE = {…
```
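For context, `SYSTEM_MESSAGE` here presumably follows the standard chat-message convention, i.e. a dict with `role` and `content` fields. A minimal sketch of that shape (the content string below is a placeholder, not the value MixEval actually ships; see the linked line for the real one):
```python
# Standard chat-message shape (illustrative placeholder, not MixEval's exact value):
SYSTEM_MESSAGE = {
    "role": "system",
    "content": "You are a helpful assistant.",  # placeholder content
}
```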
-
The following command still evaluates on the local machine instead of through the API.
```shell
python -m mix_eval.evaluate \
--model_name local_chat --model_path "meta-llama/Meta-Llama-3-8B-Ins…
```
-
Hi @Psycoy ,
I couldn't find which models were used in the difficulty score calculation of MixEval-Hard. Would it be possible to disclose the specific models/model IDs?
Thanks,
Calvin
-
Hi,
I tried to reproduce the experiment results on an A100, using the Azure OpenAI API with GPT-35-Turbo-1106 as the judge:
- for Mistral-7B it was fine
- for Llama-3-8B it was: 0.39 (mine) vs. 0.46 (yours)…
-
Hi, I'm opening a new issue because it seems #17 was closed but not resolved, and I have a similar issue.
I tried reproducing the llama3-8b-instruct results too and got lower results both for hard an…
-
Hello :)
I'm facing an issue with inference when using local chat.
Since local chat overrides the base class's transformer build method, it does not include the default padding = left, which is needed for inference.
I …
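For reference, with HuggingFace transformers the left padding needed for decoder-only generation can be set on the tokenizer directly. A minimal sketch (the checkpoint name is just an example; this is not MixEval's actual build code):
```python
from transformers import AutoTokenizer

# Decoder-only models (e.g. Llama) need left padding for batched generation,
# otherwise the generated continuations are misaligned with the prompts.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # example checkpoint
    padding_side="left",
)
# Llama's tokenizer ships without a pad token; reusing EOS is a common workaround.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```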
-
Sorry guys, I'm new to evals and benchmarks and was hoping someone could point me in the right direction. I am currently trying to evaluate the baseline quality of several open-source model quants. I have …
-
Hi MixEval Team & @Psycoy,
thanks for your repo and your work to improve open-source LLM benchmarks!
Issue:
While testing, I discovered the following: in (also my model's) response files for mixev…
-
https://github.com/Psycoy/MixEval/blob/03ee6e606d3b5af8fdb2b1da711f5672d0c98482/mix_eval/data/mixeval-2024-06-01/mixeval/free-form.json#L672
The answer set is a bunch of buzzwords from the Harry Pot…
-
Does MixEval work with the Azure OpenAI API (as the judge model)? Or how can I modify the code to get it to work?
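For what it's worth, the `openai` Python package (v1+) provides a drop-in Azure client, so a judge call can be pointed at Azure by swapping the client construction. A minimal sketch (the endpoint, deployment name, API version, and env var are placeholders; this is not MixEval's actual judge code):
```python
import os
from openai import AzureOpenAI  # requires openai>=1.0

# Placeholder endpoint/credentials; fill in your Azure resource details.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo-1106",  # your Azure *deployment* name, not the model id
    messages=[{"role": "user", "content": "Judge prompt goes here."}],
)
print(response.choices[0].message.content)
```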