-
It seems like this would generate bad benchmark results?
https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160
```
self.SYSTEM_MESSAGE = {…
```
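For context, `SYSTEM_MESSAGE` here presumably follows the standard chat-message convention, i.e. a dict with `role` and `content` fields. A minimal sketch of that shape (the content string below is a placeholder, not the value MixEval actually ships; see the linked line for the real one):
```python
# Standard chat-message shape (illustrative placeholder, not MixEval's exact value):
SYSTEM_MESSAGE = {
    "role": "system",
    "content": "You are a helpful assistant.",  # placeholder content
}
```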
-
The following command still evaluates on the local machine instead of through the API.
```shell
python -m mix_eval.evaluate \
--model_name local_chat --model_path "meta-llama/Meta-Llama-3-8B-Ins…
```
-
Hi @Psycoy ,
I couldn't find which models were used in the difficulty score calculation of MixEval-Hard. Would it be possible to disclose the specific models/model IDs?
Thanks,
Calvin
-
Hi,
I tried to reproduce the experiment results on an A100, using the Azure OpenAI API with GPT-35-Turbo-1106 as the judge:
- for Mistral-7B it was fine
- for Llama-3-8B it was: 0.39 (mine) vs. 0.46 (yours)…
-
Hi, I'm opening a new issue because it seems #17 was closed but not resolved, and I have a similar issue.
I tried reproducing the llama3-8b-instruct results too and got lower results both for hard an…
-
Hello :)
I'm facing an issue with inference when using local chat.
Since local chat overrides the base class's transformer build method, it does not include the default padding = left, which is needed for inference.
I …
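For reference, with HuggingFace transformers the left padding needed for decoder-only generation can be set on the tokenizer directly. A minimal sketch (the checkpoint name is just an example; this is not MixEval's actual build code):
```python
from transformers import AutoTokenizer

# Decoder-only models (e.g. Llama) need left padding for batched generation,
# otherwise the generated continuations are misaligned with the prompts.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # example checkpoint
    padding_side="left",
)
# Llama's tokenizer ships without a pad token; reusing EOS is a common workaround.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```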
-
Sorry guys, I'm new to evals and benchmarks and was hoping someone could point me in the right direction. I am currently trying to evaluate the baseline quality of several open-source model quants. I have …
-
Hi MixEval Team & @Psycoy,
thanks for your repo and your work to improve open-source LLM benchmarks!
Issue:
While testing, I discovered the following: in (also my model's) response files for mixev…
-
https://github.com/Psycoy/MixEval/blob/03ee6e606d3b5af8fdb2b1da711f5672d0c98482/mix_eval/data/mixeval-2024-06-01/mixeval/free-form.json#L672
The answer set is a bunch of buzzwords from the Harry Pot…
-
Does MixEval work with the Azure OpenAI API (as the judge model)? Or how can I modify the code to get it to work?
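For what it's worth, the `openai` Python package (v1+) provides a drop-in Azure client, so a judge call can be pointed at Azure by swapping the client construction. A minimal sketch (the endpoint, deployment name, API version, and env var are placeholders; this is not MixEval's actual judge code):
```python
import os
from openai import AzureOpenAI  # requires openai>=1.0

# Placeholder endpoint/credentials; fill in your Azure resource details.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo-1106",  # your Azure *deployment* name, not the model id
    messages=[{"role": "user", "content": "Judge prompt goes here."}],
)
print(response.choices[0].message.content)
```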