ChenMnZ opened this issue 6 months ago
I train llama-7b with the following batch size settings:
```
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
```
During training it consumes about 9 GB of GPU memory, but during evaluation (the MMLU evaluation) memory consumption increases to 27 GB. Is there a bug in the evaluation process?
Set `--eval_accumulation_steps`. If it is left unset, the Trainer accumulates all prediction tensors on the GPU for the entire evaluation loop before moving them to the CPU, which is why eval memory grows far beyond training memory; with the flag set, the accumulated outputs are offloaded to the CPU every N steps.
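A minimal sketch of how this could look with the standard Hugging Face `TrainingArguments` interface; the output directory and the value `4` for `eval_accumulation_steps` are just placeholders, not values from the original command:

```python
from transformers import TrainingArguments

# Same batch-size settings as in the issue, plus eval_accumulation_steps.
# Smaller values offload prediction tensors to the CPU more often and
# reduce peak GPU memory further, at the cost of slower evaluation.
args = TrainingArguments(
    output_dir="./output",               # placeholder path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    eval_accumulation_steps=4,           # move accumulated eval outputs to CPU every 4 steps
)
```

The equivalent command-line change is simply adding `--eval_accumulation_steps 4` to the flags shown above.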