Closed: chunniunai220ml closed this issue 4 months ago.
When running the MMLU eval on a single GPU, this succeeds:

```bash
lm_eval --model hf \
    --model_args pretrained="Qwen/Qwen1.5-7B-Chat",dtype=bfloat16 \
    --tasks mmlu \
    --device cuda \
    --batch_size 32 \
    --trust_remote_code \
    --cache_requests true
```
But with `--num_fewshot 5`, the batch size must be <= 4, otherwise it OOMs.
It seems the few-shot setting adds GPU memory overhead. Any suggestions to reduce GPU memory usage and speed up the evaluation?
This is expected: few-shot examples result in larger model inputs. I'd recommend checking out vLLM for faster inference!
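For reference, a minimal sketch of the same eval through the harness's vLLM backend; the `gpu_memory_utilization` value and `--batch_size auto` are assumptions to tune for your GPU, not settings from this issue:

```bash
# Sketch: MMLU 5-shot via the vLLM backend instead of hf.
# tensor_parallel_size=1 assumes a single GPU; lower
# gpu_memory_utilization if you still hit OOM.
lm_eval --model vllm \
    --model_args pretrained="Qwen/Qwen1.5-7B-Chat",dtype=bfloat16,tensor_parallel_size=1,gpu_memory_utilization=0.8 \
    --tasks mmlu \
    --num_fewshot 5 \
    --batch_size auto
```

`--batch_size auto` lets the harness probe for the largest batch that fits, which avoids hand-tuning the batch size as prompt length grows with the number of few-shot examples.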