-
[The MMLU results](https://crfm.stanford.edu/helm/v0.2.0/?group=mmlu) page appears to cover only 5 of the 57 MMLU subjects.
This does not match what I understand to be the [core benchmarking run_specs.conf](ht…
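One quick way to check the discrepancy is to count the distinct MMLU subjects referenced in a local copy of the run spec file. A minimal sketch, assuming HELM-style entries of the form `mmlu:subject=<name>` and a local file named `run_specs.conf` (both are assumptions, since the link above is truncated):
```python
import re

# Count distinct MMLU subjects referenced in a local copy of the run spec
# file. Assumes HELM-style entries of the form "mmlu:subject=<name>,...".
with open("run_specs.conf") as f:
    text = f.read()

subjects = set(re.findall(r"mmlu:subject=([a-z_]+)", text))
print(f"{len(subjects)} MMLU subjects referenced:")
for s in sorted(subjects):
    print(" -", s)
```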
-
This assessment data also appears to take the form of multiple-choice tasks similar to MMLU, but there are many small differences from standard MMLU practice, and these differences have a sig…
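For context, the widely copied MMLU prompt layout (as I understand it from the original hendrycks/test evaluation code, so treat it as approximate) is sketched below; the header sentence, choice lettering, and trailing "Answer:" cue are exactly the kinds of details that differ between implementations. The example question is made up:
```python
# Sketch of the canonical MMLU few-shot prompt layout; wording and whitespace
# follow my reading of the original hendrycks/test repo, so treat as approximate.
def mmlu_header(subject):
    return (f"The following are multiple choice questions (with answers) "
            f"about {subject}.\n\n")

def mmlu_question(question, choices, answer=None):
    block = question + "\n"
    for letter, choice in zip("ABCD", choices):
        block += f"{letter}. {choice}\n"
    block += "Answer:"
    if answer is not None:  # few-shot demonstrations include the gold letter
        block += f" {answer}\n\n"
    return block

prompt = mmlu_header("college physics") + mmlu_question(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Watt", "Pascal"],
)
print(prompt)
```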
-
I'm running post-training on a pruned model. After post-training I get degraded performance, e.g. MMLU drops to 24%, which is roughly random chance on a four-choice benchmark. Is this expected?
```
MODEL=meta-llama/Llama-2-7b-hf
prune_ckpt_path=…
```
-
Models that are open-source and/or used via `local-completions`, as well as Claude, allow one to "prefill" the start of the assistant's response to a given input: https://docs.anthropic.com/en/docs/bu…
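With the Anthropic API, prefilling is done by ending the `messages` list with a partial assistant turn, which the model then continues from. A minimal sketch using the official `anthropic` Python client; the model name and prompts are placeholders:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ending the conversation with an assistant turn makes the model continue
# from that exact prefix, e.g. forcing a JSON response to start with "[".
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder; use any available model
    max_tokens=256,
    messages=[
        {"role": "user", "content": "List three MMLU subjects as a JSON array."},
        {"role": "assistant", "content": "["},  # the prefill
    ],
)
print("[" + message.content[0].text)
```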
-
### System Info
- GPU: RTX 3090
- TensorRT-LLM: 0.7.1
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially suppo…
-
Hi, on a single 4090 GPU with 24 GB of memory, the following command causes an out-of-memory error.
```bash
python main.py mmlu --model_name llama --model_path huggyllama/llama-7b
```
After that, I try…
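For what it's worth, if `main.py` loads the checkpoint in fp32, a 7B model needs roughly 28 GB for the weights alone, which does not fit a 24 GB card; in fp16 it needs roughly 14 GB, which does. Below is a minimal sketch of loading the same checkpoint in fp16 with Hugging Face Transformers (how `main.py` actually loads the model is not shown above, so this is an assumption):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# In fp16 a 7B model needs roughly 14 GB of weights, which fits a 24 GB 4090;
# in fp32 it needs roughly 28 GB, which does not.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
```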
-
Hi, I was wondering whether the Python API supports running multiple eval benchmarks on multiple models (by passing in a list of models and their respective arguments in the `model_args` argu…
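Assuming this refers to the lm-evaluation-harness Python API (the `model_args` name suggests so, but that's an assumption), `simple_evaluate` takes one model at a time, so the straightforward approach is to loop over (model_args, tasks) pairs. A minimal sketch with placeholder model names and tasks:
```python
import lm_eval

# Placeholder (model_args, tasks) pairs; each model is evaluated on its tasks.
runs = {
    "pretrained=huggyllama/llama-7b": ["mmlu"],
    "pretrained=meta-llama/Llama-2-7b-hf": ["mmlu", "gsm8k"],
}

all_results = {}
for model_args, tasks in runs.items():
    results = lm_eval.simple_evaluate(
        model="hf",             # Hugging Face backend
        model_args=model_args,  # e.g. "pretrained=<repo>,dtype=float16"
        tasks=tasks,
        num_fewshot=5,
        batch_size=8,
    )
    all_results[model_args] = results["results"]

for name, res in all_results.items():
    print(name, res)
```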
-
Could I use llava_hf.py and LLaVA (a multimodal model) to run language-only tasks?
I tried gsm8k and ifeval, and both fail. For example, for gsm8k it says that the "sampler" for fewshot was not …
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected.
- [X] The bug is present in the [latest version](https://github.com/open-…
-
```
(gpt5) [jfreeman@mir-81 promptbase]$ python -m promptbase mmlu
usage: __main__.py [-h] [--subject SUBJECT] [--list_subjects] {gsm8k,humaneval,math,drop,bigbench}
__main__.py: error: argument dataset…
```