mmlu Search Results - Githubissues

1000+ results for mmlu

EleutherAI/lm-evaluation-harness #2122

ValueError when task name collide with local directory names

# Steps to reproduce ```bash $ cd lm-eval-harness $ pip install -e .[vllm] $ mkdir hellaswag $ lm-eval --tasks hellaswag --model vllm --model_args pretrained=deepseek-ai/deepseek-coder-1.3b-instr…

alat-rights updated 3 months ago
1
ollama/ollama #5641

Ollama Puts out Gibberish After a While.

### What is the issue? When I run the MMLU Pro benchmark on phi3 or deepseek-coder-v2 with [this script](https://github.com/chigkim/Ollama-MMLU-Pro/) that uses OpenAI compatible API, it runs for a …

chigkim updated 2 weeks ago
2
open-compass/opencompass #1008

[Bug] Error when evaluate using LightLLM api

### Prerequisites - [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the expected help. - [X] The bug is in the [latest version](https://github.com/open-com…

andakai updated 6 months ago
1
pan-x-c/EE-LLM #18

[QUESTION] Questions on the performance of EE-LLM on HELM be…

Hi, i'm trying to recreate figure 8 in the EE-LLM paper using the 7B checkpoint. Here are some of the problems i encountered during experiment. 1. HELM framework needs the tokenizer used by the mod…

Marmot-C updated 1 week ago
11
trigaten/The_Prompt_Report #177

General Comments

Below are the comments I have compiled from reading through the paper. Note that I have very limited knowledge of the subject so many of the conceptual concerns may be due to a lack of technical knowl…

MasonMarchetti updated 4 months ago
1
stanford-crfm/helm #2541

helm-summarize gives extra weight to runs in multiple groups

Suppose you have scenario `foo` with a parameter `param` and it has two runs like so: ``` {description: "foo:model=text,param=a", priority: 2} {description: "foo:model=text,param=b", groups: ["x"…

yifanmai updated 7 months ago
1
salesforce/xgen #17

Which evaluation infra was used for benchmarking?

A previous GH issue ([here](https://github.com/salesforce/xgen/issues/8)) mentions that a modified version of this script ([here](https://github.com/hendrycks/test/pull/13/files)) was used to collect …

woffett updated 1 year ago
1
EleutherAI/lm-evaluation-harness #2402

How to evaluate local model with local-completions?

I installed lm_eval on my laptop and would like to evaluate local model running on another server. Could anyone help me on how to run the command? Any parameters I set in wrong? Or any other informati…

liuzhuotao-teresa updated 2 weeks ago
3
stanford-crfm/helm #2467

Multiple choice joint adapter should not strip trailing whit…

This leads to prompts like:

```
Question: What is 1 + 2?
A. 3
B. 4
Answer:
```

where there is no space after `Answer:`. This causes most models to generate a space before the actual answer, …

yifanmai updated 1 month ago
2
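The HELM issue above says that stripping trailing whitespace from a joint multiple-choice prompt drops the space after `Answer:`. The sketch below shows that effect in isolation. It uses a hypothetical `build_prompt` helper that formats the question and options the way the snippet shows. It is not HELM's adapter code, and the `strip_trailing` flag is an assumption made only for illustration.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_prompt(question: str, choices: list[str], strip_trailing: bool) -> str:
    """Hypothetical formatter for a joint multiple-choice prompt (not HELM's code)."""
    lines = [f"Question: {question}"]
    # One line per option, labelled A., B., ...
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    prompt = "\n".join(lines) + "\nAnswer: "
    # rstrip() removes the space after "Answer:"; per the issue, most models then
    # emit that space themselves before the answer letter.
    return prompt.rstrip() if strip_trailing else prompt


stripped = build_prompt("What is 1 + 2?", ["3", "4"], strip_trailing=True)
kept = build_prompt("What is 1 + 2?", ["3", "4"], strip_trailing=False)
print(repr(stripped))  # 'Question: What is 1 + 2?\nA. 3\nB. 4\nAnswer:'
print(repr(kept))      # 'Question: What is 1 + 2?\nA. 3\nB. 4\nAnswer: '
```

With the stripped prompt, a continuation such as ` A` carries a leading space. An exact-match scorer would then need to normalize that space, or the prompt should keep its trailing whitespace.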
Felixgithub2017/MMCU #2

How should multiple-answer questions be handled?

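While using the MMCU dataset, I found that many questions are multiple-answer questions. In that case, does choosing one correct option count as correct, or must all correct options be chosen? The current code uses `if label in pred:` to decide whether an answer is correct. Could this misjudge multiple-answer questions? For reference, HELM's handling of MMLU only requires choosing one correct option. Thanks!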

dongZheX updated 1 year ago
4
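The MMCU issue above asks whether the check `if label in pred` scores multiple-answer questions correctly. The sketch below compares three candidate rules side by side. The function names are hypothetical, and it assumes that both the gold label and the prediction are bare strings of option letters such as `"AC"`. The `any_correct` rule corresponds to the lenient scoring the poster attributes to HELM, which is not verified here.

```python
def substring_match(label: str, pred: str) -> bool:
    # Mirrors the check quoted in the issue: correct if the gold string occurs in the prediction.
    return label in pred


def any_correct(label: str, pred: str) -> bool:
    # Lenient rule: correct if at least one predicted option is a gold option.
    return bool(set(label) & set(pred))


def exact_set(label: str, pred: str) -> bool:
    # Strict rule: the predicted options must equal the gold options, in any order.
    return set(label) == set(pred)


# (gold, prediction) pairs; both are assumed to be bare option letters.
cases = [("AC", "CA"), ("AC", "A"), ("A", "ABCD"), ("AC", "ACD")]
for label, pred in cases:
    print(label, pred, substring_match(label, pred), any_correct(label, pred), exact_set(label, pred))
```

The first case shows that a substring check can reject a correct answer written in a different order. The third and fourth cases show that it accepts predictions that list extra, wrong options. Which rule fits depends on how the benchmark intends multiple-answer items to be scored, and the issue leaves that question open.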
