-
# Steps to reproduce
```bash
$ cd lm-eval-harness
$ pip install -e .[vllm]
$ mkdir hellaswag
$ lm-eval --tasks hellaswag --model vllm --model_args pretrained=deepseek-ai/deepseek-coder-1.3b-instr…
-
### What is the issue?
When I run the MMLU Pro benchmark on phi3 or deepseek-coder-v2 with [this script](https://github.com/chigkim/Ollama-MMLU-Pro/), which uses an OpenAI-compatible API, it runs for a …
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected.
- [X] The bug in the [latest version](https://github.com/open-com…
-
Hi, I'm trying to recreate Figure 8 in the EE-LLM paper using the 7B checkpoint. Here are some of the problems I encountered during the experiment.
1. HELM framework needs the tokenizer used by the mod…
-
Below are the comments I have compiled from reading through the paper. Note that I have very limited knowledge of the subject, so many of the conceptual concerns may be due to a lack of technical knowl…
-
Suppose you have scenario `foo` with a parameter `param` and it has two runs like so:
```
{description: "foo:model=text,param=a", priority: 2}
{description: "foo:model=text,param=b", groups: ["x"…
-
A previous GH issue ([here](https://github.com/salesforce/xgen/issues/8)) mentions that a modified version of this script ([here](https://github.com/hendrycks/test/pull/13/files)) was used to collect …
-
I installed lm_eval on my laptop and would like to evaluate a local model running on another server. Could anyone help me with how to run the command? Did I set any parameters wrong? Or any other informati…
-
This leads to prompts like:
```
Question: What is 1 + 2?
A. 3
B. 4
Answer:
```
where there is no space after `Answer:`. This causes most models to generate a space before the actual answer, …
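
A minimal sketch of the effect, assuming the scorer compares the raw generation against the gold letter. `format_prompt` and `extract_choice` are hypothetical helpers written for illustration, not functions from any evaluation library, and the generation `" A"` is an assumed model output.

```python
def format_prompt(question: str, choices: list[str]) -> str:
    """Build a prompt that ends in 'Answer:' with no trailing space."""
    letters = "ABCDEFGHIJ"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(generation: str) -> str:
    """Drop leading whitespace, then take the first character as the choice."""
    return generation.lstrip()[:1]


prompt = format_prompt("What is 1 + 2?", ["3", "4"])
generation = " A"  # assumed model output: a space, then the letter

naive_match = generation == "A"                      # raw comparison
tolerant_match = extract_choice(generation) == "A"   # whitespace-tolerant
print(naive_match, tolerant_match)
```

Stripping leading whitespace before comparison is one way to make the scorer indifferent to whether the prompt ends in a space; changing the prompt template is the other.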
-
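While using the MMCU dataset, I noticed that many of the questions have multiple correct answers. In that case, does selecting one correct option count as correct, or do all correct options have to be selected?
The current code uses
```
if label in pred:
```
to decide whether an answer is correct. Could this misjudge multiple-answer questions?
For reference, HELM's handling of MMLU only requires one correct option to be selected.
Thanks!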
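
A minimal sketch of how the substring check behaves on multiple-answer questions, assuming `label` and `pred` are both strings of option letters such as `"AC"` (the actual formats in the MMCU evaluation code may differ). The function names are hypothetical and only contrast three possible scoring rules.

```python
def substring_match(label: str, pred: str) -> bool:
    """The quoted check: the label must appear as a substring of the prediction."""
    return label in pred


def exact_set_match(label: str, pred: str) -> bool:
    """Strict rule: the predicted options must equal the gold options exactly."""
    return set(label) == set(pred)


def any_correct_match(label: str, pred: str) -> bool:
    """Lenient rule: at least one predicted option is a gold option."""
    return bool(set(label) & set(pred))


cases = [("AC", "AC"), ("AC", "ABC"), ("A", "ABCD"), ("AC", "A")]
for label, pred in cases:
    print(
        f"label={label} pred={pred}",
        substring_match(label, pred),
        exact_set_match(label, pred),
        any_correct_match(label, pred),
    )
```

Under these assumptions the substring check matches neither rule: it accepts `pred="ABCD"` for `label="A"` (extra options go unpunished) but rejects `pred="ABC"` for `label="AC"` because the letters are not contiguous.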