artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

[Bug] large CUDA memory usage in the evaluation phase #284

Open ChenMnZ opened 6 months ago

ChenMnZ commented 6 months ago

I trained llama-7b with the following batch-size settings:

    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \

During training it consumes about 9 GB of GPU memory, but during evaluation (the MMLU evaluation) memory consumption increases to 27 GB. Is there a bug in the evaluation process?

tianshu-zhu commented 5 months ago

Set `--eval_accumulation_steps`. By default the Hugging Face Trainer keeps all prediction tensors on the GPU until evaluation finishes; with this argument set, the accumulated outputs are moved to the CPU every N steps, which bounds GPU memory during the MMLU eval.
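
A minimal sketch of the adjusted launch flags, assuming qlora.py forwards standard Hugging Face `TrainingArguments` (it parses them via `HfArgumentParser`); the value `1` is an assumption and trades a little eval speed for the lowest GPU memory:

    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --eval_accumulation_steps 1 \

Larger values (e.g. 4 or 8) move outputs to the CPU less often and are usually a reasonable middle ground.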