-
[The MMLU results](https://crfm.stanford.edu/helm/v0.2.0/?group=mmlu) page appears to cover only 5 of the 57 MMLU subjects.
This does not match what I understand to be the [core benchmarking run_specs.conf](ht…
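One quick way to check the discrepancy is to count the distinct MMLU subjects referenced in a local copy of the run spec file. A minimal sketch, assuming HELM-style entries of the form `mmlu:subject=<name>` and a local file named `run_specs.conf` (both are assumptions, since the link above is truncated):
```python
import re

# Count distinct MMLU subjects referenced in a local copy of the run spec
# file. Assumes HELM-style entries of the form "mmlu:subject=<name>,...".
with open("run_specs.conf") as f:
    text = f.read()

subjects = set(re.findall(r"mmlu:subject=([a-z_]+)", text))
print(f"{len(subjects)} MMLU subjects referenced:")
for s in sorted(subjects):
    print(" -", s)
```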
-
This assessment data also appears to take the form of multiple-choice tasks similar to MMLU, but there are many small differences from standard MMLU practice, and these differences have a sig…
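For context, the widely copied MMLU prompt layout (as I understand it from the original hendrycks/test evaluation code, so treat it as approximate) is sketched below; the header sentence, choice lettering, and trailing "Answer:" cue are exactly the kinds of details that differ between implementations. The example question is made up:
```python
# Sketch of the canonical MMLU few-shot prompt layout; wording and whitespace
# follow my reading of the original hendrycks/test repo, so treat as approximate.
def mmlu_header(subject):
    return (f"The following are multiple choice questions (with answers) "
            f"about {subject}.\n\n")

def mmlu_question(question, choices, answer=None):
    block = question + "\n"
    for letter, choice in zip("ABCD", choices):
        block += f"{letter}. {choice}\n"
    block += "Answer:"
    if answer is not None:  # few-shot demonstrations include the gold letter
        block += f" {answer}\n\n"
    return block

prompt = mmlu_header("college physics") + mmlu_question(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Watt", "Pascal"],
)
print(prompt)
```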
-
I'm running post-training on a pruned model. After post-training I get degraded performance, e.g. MMLU drops to 24%, which is roughly random chance on a four-choice benchmark. Is this expected?
```
MODEL=meta-llama/Llama-2-7b-hf
prune_ckpt_path=…
```
-
Models that are open-source and/or used via `local-completions`, as well as Claude, allow one to "prefill" the start of the assistant's response to a given input: https://docs.anthropic.com/en/docs/bu…
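With the Anthropic API, prefilling is done by ending the `messages` list with a partial assistant turn, which the model then continues from. A minimal sketch using the official `anthropic` Python client; the model name and prompts are placeholders:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ending the conversation with an assistant turn makes the model continue
# from that exact prefix, e.g. forcing a JSON response to start with "[".
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder; use any available model
    max_tokens=256,
    messages=[
        {"role": "user", "content": "List three MMLU subjects as a JSON array."},
        {"role": "assistant", "content": "["},  # the prefill
    ],
)
print("[" + message.content[0].text)
```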
-
### System Info
- GPU: RTX 3090
- TensorRT-LLM: 0.7.1
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially suppo…
-
Hi, on a single 4090 GPU with 24 GB of memory, the following command causes an out-of-memory error.
```bash
python main.py mmlu --model_name llama --model_path huggyllama/llama-7b
```
After that, I try…
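For what it's worth, if `main.py` loads the checkpoint in fp32, a 7B model needs roughly 28 GB for the weights alone, which does not fit a 24 GB card; in fp16 it needs roughly 14 GB, which does. Below is a minimal sketch of loading the same checkpoint in fp16 with Hugging Face Transformers (how `main.py` actually loads the model is not shown above, so this is an assumption):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# In fp16 a 7B model needs roughly 14 GB of weights, which fits a 24 GB 4090;
# in fp32 it needs roughly 28 GB, which does not.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
```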
-
Hi, I was wondering whether the Python API supports running multiple eval benchmarks on multiple models (by passing in a list of models and their respective arguments in the `model_args` argu…
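Assuming this refers to the lm-evaluation-harness Python API (the `model_args` name suggests so, but that's an assumption), `simple_evaluate` takes one model at a time, so the straightforward approach is to loop over (model_args, tasks) pairs. A minimal sketch with placeholder model names and tasks:
```python
import lm_eval

# Placeholder (model_args, tasks) pairs; each model is evaluated on its tasks.
runs = {
    "pretrained=huggyllama/llama-7b": ["mmlu"],
    "pretrained=meta-llama/Llama-2-7b-hf": ["mmlu", "gsm8k"],
}

all_results = {}
for model_args, tasks in runs.items():
    results = lm_eval.simple_evaluate(
        model="hf",             # Hugging Face backend
        model_args=model_args,  # e.g. "pretrained=<repo>,dtype=float16"
        tasks=tasks,
        num_fewshot=5,
        batch_size=8,
    )
    all_results[model_args] = results["results"]

for name, res in all_results.items():
    print(name, res)
```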
-
Could I use llava_hf.py and LLaVA (a multimodal model) to run language-only tasks?
I tried gsm8k and ifeval, and both fail. For example, for gsm8k it says that the "sampler" for fewshot was not …
-
### Prerequisites
- [X] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected.
- [X] The bug is present in the [latest version](https://github.com/open-…
-
```
(gpt5) [jfreeman@mir-81 promptbase]$ python -m promptbase mmlu
usage: __main__.py [-h] [--subject SUBJECT] [--list_subjects] {gsm8k,humaneval,math,drop,bigbench}
__main__.py: error: argument dataset…
```