-
Hello,
I want to express my gratitude for your outstanding work. The powerful lm-evaluation-harness and your continuous maintenance have made LLM evaluation much more convenient.
However, I hav…
-
Currently, it seems that to run MMLU with the `lighteval` suite, one needs to specify all the subsets individually, as is done for the leaderboard task set [here](https://github.com/huggingface/lighteval/bl…
-
Hi guys! It would be nice to add support for Octopus LLMs, or are there any alternatives? The MMLU score of Octopus v4 is 74.8% under 5-shot, which is very impressive for such a small model!
Octopus is based on P…
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the expected help.
- [X] The bug in the [latest version](https://github.com/open-com…
-
I am experiencing a memory leak while running my application, which runs an MMLU accuracy test on my Radeon 780M iGPU via DirectML.
Each inference adds tens to hundreds of megabytes to the total …
-
We need to set up the test script for our training pipeline.
- Data generation: @hungphongtrn
  - [ ] Check the generated audio (the audio matches the prompt)
- [ ] Check the integrity of audio files wi…
-
When I run the code "CUDA_VISIBLE_DEVICES=3 TRANSFORMERS_OFFLINE=1 lm_eval --model hf --model_args pretrained=/public/MountData/yaolu/LLM_pretrained/LLAMA2_7B/,trust_remote_code=True --tasks mmlu,cm…
-
Hi, reading the [QLoRA paper](https://arxiv.org/pdf/2305.14314.pdf), I see that you folks are reporting the results on the MMLU test set in Table 5:
![image](https://github.com/artidoro/qlora/assets/44957968/cffd7c…
-
What are `mmlu_continuation` and `mmlu_generative`? Where can I find their descriptions?
I want to evaluate `mmlu` in the `cloze` style, as in the following illustration:
![image](https://github.com/…
-
Hi, I'm trying to evaluate `gemma-it` models from Hugging Face on MMLU. When I set `--apply_chat_template --fewshot_as_multiturn`, the tokenizer raises the error below. This is because Gemma does n…