-
I am on Hugging Face Spaces and attempting to use `vLLM` for running benchmarks.
I installed `vLLM` and when I attempt to run the `mixEval` benchmarks from a local SFT model, it prompts me to in…
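A minimal invocation sketch for the local-model case, assuming the `local_chat` wrapper and `--model_path` flag shown later in this thread; the checkpoint path is a placeholder and other required arguments may differ by MixEval version:
```shell
# Hedged sketch: point --model_path at a local checkpoint directory so MixEval
# loads the SFT model from disk rather than pulling a gated model from the Hub.
python -m mix_eval.evaluate \
    --model_name local_chat \
    --model_path /path/to/local_sft_checkpoint
```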
-
Hi @Psycoy
I am using different datasets for benchmarking. I see some simplifications in the metrics computation that could lead to errors and should be fixed:
1) You divide by 2 assuming the…
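The excerpt is cut off, but the general shape of the concern can be illustrated with a purely hypothetical sketch (the function names below are not from MixEval): dividing by a hard-coded 2 only holds when there are exactly two parts, whereas dividing by the actual count generalizes.
```python
# Hypothetical illustration, not MixEval code: averaging with a hard-coded
# divisor misreports the metric whenever the number of parts is not exactly 2.
def average_hardcoded(scores):
    return sum(scores) / 2  # assumes exactly two scores

def average_general(scores):
    return sum(scores) / len(scores) if scores else 0.0  # divides by the actual count
```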
-
https://mixeval.github.io/
-
Hi,
I managed to install iPeer 3.3.2 with TeamMaker on Ubuntu 16 in the /var/www/html folder for evaluation. On the home page, how do I remove the show/hide info at the bottom? Is there a debug mode on…
-
It seems like this would generate bad benchmark results?
https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160
```
self.SYSTEM_MESSAGE = {…
```
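For readers without the repository open, the line in question pins a fixed system message on the model wrapper. A minimal, hypothetical sketch of that pattern (the actual message text in the repository differs and is omitted here) looks like:
```python
# Hypothetical sketch, not the repository's code: a fixed system message that
# gets prepended to every benchmark prompt for the chat model. If its wording
# nudges the model toward a particular answer style, it can shift scores.
SYSTEM_MESSAGE = {"role": "system", "content": "..."}  # placeholder text

def build_messages(user_prompt: str) -> list[dict]:
    return [SYSTEM_MESSAGE, {"role": "user", "content": user_prompt}]
```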
-
The following command still evaluates on the local machine, instead of through API.
```shell
python -m mix_eval.evaluate \
--model_name local_chat --model_path "meta-llama/Meta-Llama-3-8B-Ins…
```
-
Hi @Psycoy,
I couldn't find which models were used in the difficulty score calculation of MixEval-Hard. Would it be possible to disclose the specific models/model ids?
Thanks,
Calvin
-
Hi, I'm opening a new issue because #17 seems to have been closed without being resolved, and I'm running into a similar problem.
I tried reproducing the llama3-8b-instruct results too and got lower results both for hard an…