-
I am on Hugging Face Spaces and attempting to use `vLLM` for running benchmarks.
I installed `vLLM` and when I attempt to run the `mixEval` benchmarks from a local SFT model, it prompts me to in…
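A minimal invocation sketch for the local-model case, assuming the `local_chat` wrapper and `--model_path` flag shown later in this thread; the checkpoint path is a placeholder and other required arguments may differ by MixEval version:
```shell
# Hedged sketch: point --model_path at a local checkpoint directory so MixEval
# loads the SFT model from disk rather than pulling a gated model from the Hub.
python -m mix_eval.evaluate \
    --model_name local_chat \
    --model_path /path/to/local_sft_checkpoint
```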
-
Hi @Psycoy
I am using different datasets for benchmarking. I see some simplifications in the metrics computation that could lead to errors and should be fixed:
1) You divide by 2 assuming the…
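The excerpt is cut off, but the general shape of the concern can be illustrated with a purely hypothetical sketch (the function names below are not from MixEval): dividing by a hard-coded 2 only holds when there are exactly two parts, whereas dividing by the actual count generalizes.
```python
# Hypothetical illustration, not MixEval code: averaging with a hard-coded
# divisor misreports the metric whenever the number of parts is not exactly 2.
def average_hardcoded(scores):
    return sum(scores) / 2  # assumes exactly two scores

def average_general(scores):
    return sum(scores) / len(scores) if scores else 0.0  # divides by the actual count
```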
-
https://mixeval.github.io/
-
Hi,
I managed to install iPeer 3.3.2 with TeamMaker on Ubuntu 16 in the /var/www/html folder for evaluation. On the home page, how do I remove the show/hide info at the bottom? Is there a debug mode on…
-
It seems like this would generate bad benchmark results?
https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160
```
self.SYSTEM_MESSAGE = {…
```
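For readers without the repository open, the line in question pins a fixed system message on the model wrapper. A minimal, hypothetical sketch of that pattern (the actual message text in the repository differs and is omitted here) looks like:
```python
# Hypothetical sketch, not the repository's code: a fixed system message that
# gets prepended to every benchmark prompt for the chat model. If its wording
# nudges the model toward a particular answer style, it can shift scores.
SYSTEM_MESSAGE = {"role": "system", "content": "..."}  # placeholder text

def build_messages(user_prompt: str) -> list[dict]:
    return [SYSTEM_MESSAGE, {"role": "user", "content": user_prompt}]
```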
-
The following command still evaluates on the local machine, instead of through API.
```shell
python -m mix_eval.evaluate \
--model_name local_chat --model_path "meta-llama/Meta-Llama-3-8B-Ins…
```
-
Hi @Psycoy,
I couldn't find which models were used in the difficulty score calculation of MixEval-Hard. Would it be possible to disclose the specific models/model ids?
Thanks,
Calvin
-
Hi, I'm opening a new issue because #17 seems to have been closed without being resolved, and I'm running into a similar problem.
I tried reproducing the llama3-8b-instruct results too and got lower results both for hard an…