-
I'm using lm-eval v0.4.2 to evaluate Llama 7B on the Open LLM Leaderboard benchmarks.
I found accuracy gaps between single-GPU and multi-GPU runs, as shown below (I used data parallelism).
| |…
-
Since https://github.com/vllm-project/vllm/pull/3065 landed, the eval suite https://github.com/EleutherAI/lm-evaluation-harness has been broken.
Repro (this should be run on 2 A100s or H100s to make sure the Mi…
-
I have a question about evaluating LLMs on multiple-choice questions using token log-likelihoods. Based on existing implementations (e.g. for MMLU), the code snippet would look like this:
```
# Create the model
…
```
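For context, the usual multiple-choice scoring sums the model's log-probabilities over each answer choice's tokens and picks the choice with the highest total (harnesses also report a length-normalized variant to reduce bias toward short answers). A minimal sketch with made-up per-token log-probs standing in for real model outputs:

```python
# Hypothetical per-token log-probabilities for each answer choice,
# as a harness would obtain them by running model(prompt + choice).
# The numbers below are invented for illustration only.
choice_token_logprobs = {
    "A": [-0.2, -1.1],
    "B": [-2.3, -0.9, -1.5],
    "C": [-1.8, -2.2],
    "D": [-3.0, -0.4],
}

def score_choices(token_logprobs):
    """Return summed and length-normalized log-likelihood per choice."""
    scores = {}
    for label, lps in token_logprobs.items():
        total = sum(lps)
        scores[label] = {"sum": total, "per_token": total / len(lps)}
    return scores

scores = score_choices(choice_token_logprobs)
# Raw-sum argmax corresponds to `acc`; the per-token (length-normalized)
# argmax corresponds to `acc_norm` in lm-evaluation-harness-style reporting.
pred = max(scores, key=lambda k: scores[k]["sum"])
pred_norm = max(scores, key=lambda k: scores[k]["per_token"])
print(pred, pred_norm)
```

The two argmaxes can disagree when choices have very different token counts, which is why both metrics are commonly reported.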
-
Thank you to the team for the great work. I have a question: can you please help me use lighteval to evaluate a model on a single sample?
For example, if I have an input from mmlu I, my model gene…
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected.
- [X] The bug is present in the [latest version](https://github.com/open-…
-
Hi. I realized that accelerate launch works perfectly when I set batch_size = "auto" but gets stuck at the very end when I use batch_size = "auto:2". The problem persists whether I use evaluator.simpl…
-
Hi folks, thanks for creating the dataset.
In your paper and the dataset card, you claim that MMLU-Pro has 10 choices for each question, which appears to be incorrect.
By opening the Viewer tab, and select…
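One way to check this claim is to count the length of the options list across the split. A minimal sketch: the `rows` below are hypothetical records standing in for the real data, which one would normally load with `datasets.load_dataset("TIGER-Lab/MMLU-Pro", split="test")`:

```python
from collections import Counter

# Hypothetical records standing in for the real dataset rows; in practice:
#   from datasets import load_dataset
#   rows = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
rows = [
    {"question": "q1", "options": ["opt"] * 10},
    {"question": "q2", "options": ["opt"] * 4},   # some questions have fewer options
    {"question": "q3", "options": ["opt"] * 10},
]

# Distribution of answer-choice counts across questions
dist = Counter(len(r["options"]) for r in rows)
print(dict(dist))  # {10: 2, 4: 1} -> not every question has 10 choices
```

If the real split yields anything other than a single key of 10, the "10 choices per question" claim does not hold uniformly.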
-
**Describe the bug**
I ran `ilab model train`, but `ilab model test` failed with
```
NOTE: Adapter file does not exist. Testing behavior before training only. - /Users/ahmedazraq/Library/Application…
```
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected.
- [X] The bug is present in the [latest version](https://github.com/open-com…
-
Hello,
I've been trying with different LLMs, but I haven't been able to make it work. Could you shed some light on this?
```shell
luispoveda93@LUIS-PC:~/mlmm-evaluation$ bash scripts/run.sh es micro…