Slow evaluation for 7B model using A100 multi-gpu

cnut1648 commented 1 year ago

Hello, I want to evaluate some 7B models on a set of tasks using multiple GPUs. Right now I am on the latest commit of the master branch and run the following commands:

MODELS=(
  'EleutherAI/gpt-j-6b'
  'yahma/llama-7b-hf'
  "gpt2-xl"
  "togethercomputer/RedPajama-INCITE-7B-Base"
  'facebook/opt-6.7b'
  "stabilityai/stablelm-base-alpha-7b"
)
for model in "${MODELS[@]}"; do
    for shot in 0 1 5; do
        echo "$model" "$shot"
        python main.py \
            --model hf-causal-experimental \
            --model_args "pretrained=$model,use_accelerate=True,dtype=bfloat16" \
            --tasks anli_r1,anli_r2,anli_r3,arc_challenge,arc_easy,boolq,cb,copa,headqa,hellaswag,multirc,record,rte,wic,wsc,lambada_openai,lambada_standard,logiqa,winogrande,sciq,openbookqa,piqa \
            --batch_size 64 \
            --device auto \
            --output_path "results/$model/$shot.json" \
            --num_fewshot "$shot"
    done
done

Basically, this runs every model in MODELS with 0, 1, and 5 shots on several tasks. Running gpt-j-6b with 1 shot gives me 231k total examples to evaluate, and the 0-shot run takes about 8 hours to finish on 8 A100-40G GPUs (each GPU uses about 34G of memory, at around 8.7 instances per second). I wonder whether this is the expected speed, since it seems slow (e.g. running everything might take 8x3x6 = 144 hours). I tried bf16 (via dtype="bfloat16"), but I am not yet sure whether bf16 is actually used. I also set use_accelerate=True. Is there any way to speed up this process? Thank you!
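The figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the helper name `estimated_hours` is made up for illustration, and it assumes the 231k example count and the 8.7 instances/second rate apply to the same run and that throughput stays constant.

```python
def estimated_hours(num_examples: int, examples_per_second: float) -> float:
    """Wall-clock hours needed to process num_examples at a constant rate."""
    return num_examples / examples_per_second / 3600


# Numbers taken from the post: 231k examples at about 8.7 instances per second.
hours_per_run = estimated_hours(231_000, 8.7)

# Rough total from the post: about 8 hours per run, 3 shot settings, 6 models.
total_hours = 8 * 3 * 6

print(f"one run: {hours_per_run:.1f} h, all runs: {total_hours} h")
```

At the reported rate, one pass over 231k examples comes out a little under 7.5 hours, which is consistent with the "about 8 hours" observed.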

StellaAthena commented 1 year ago

Yeah that sounds about right. You have a huge list of tasks, after all.

Are there other contexts where you get substantially more than 8.7 forward passes through a 6B model per second?

cnut1648 commented 1 year ago

Hi @StellaAthena, thanks. I am still new to the field, so you are probably right that this is the expected speed. I just wanted to check whether I had enabled all of the acceleration options. Thanks. I am closing this issue.
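On the open question of whether bf16 is actually in use, one way to check is to count a loaded model's parameters by dtype. This is a minimal sketch, not part of the harness: `dtype_summary` is a hypothetical helper, and it assumes you can get hold of the underlying PyTorch module (how to reach it from inside the harness is not shown here). It relies only on `parameters()`, `dtype`, and `numel()`, which PyTorch modules and tensors provide.

```python
from collections import Counter


def dtype_summary(model) -> dict:
    """Return {dtype name: number of parameter elements} for a loaded model.

    `model` is assumed to expose .parameters(), as torch.nn.Module does,
    yielding tensors that have a .dtype attribute and a .numel() method.
    """
    counts = Counter()
    for param in model.parameters():
        counts[str(param.dtype)] += param.numel()
    return dict(counts)
```

If the model was loaded in bf16, nearly all parameter elements should be reported under `torch.bfloat16`; a large `torch.float32` count would suggest the dtype setting did not take effect.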

EleutherAI / lm-evaluation-harness

Slow evaluation for 7B model using A100 multi-gpu #604