By the way, since "iters_per_epoch" is set to 1000, each training epoch only uses batch_size × num_gpus × iters_per_epoch samples of the total training set. So per epoch, the model sees 32,000 (4 × 8 × 1000) random samples. Is that correct?
```yaml
max_epoch: 3
iters_per_epoch: 1000
batch_size_train: 4
batch_size_eval: 4
num_workers: 4
```
Thanks!
Q1: To reduce GPU memory consumption, you can try the following (see the sketch after this list):

- Reduce the batch size.
- Enable BF16 (bfloat16) mixed-precision training.
- Shorten the maximum input sequence length for the language model.
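As a rough illustration of the BF16 suggestion, here is a minimal sketch of a mixed-precision training step in plain PyTorch; `model`, `dataloader`, and `optimizer` are placeholders, not this repo's actual training loop, and your framework may already expose an equivalent option (e.g. an AMP flag) in its run config.

```python
import torch

def train_one_epoch(model, dataloader, optimizer):
    """Sketch of a BF16 mixed-precision training step (assumed objects, not repo code)."""
    model.train()
    for batch in dataloader:
        optimizer.zero_grad(set_to_none=True)
        # Run the forward pass in bfloat16; unlike fp16, bf16 usually needs no GradScaler.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(batch)["loss"]
        loss.backward()
        optimizer.step()
```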
Q2: No. The total number of samples seen during training is 4 (batch_size) × 8 (GPUs) × 1000 (iters_per_epoch) × 3 (max_epoch) = 96,000. The iters_per_epoch parameter is mainly used to determine how often a checkpoint is saved.
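For reference, a back-of-the-envelope check of that count; the variable names below are illustrative, not config keys from the repo:

```python
# Sample-count arithmetic under the config posted above.
batch_size_per_gpu = 4
num_gpus = 8
iters_per_epoch = 1000
max_epoch = 3

samples_per_epoch = batch_size_per_gpu * num_gpus * iters_per_epoch  # 32,000
total_samples = samples_per_epoch * max_epoch                        # 96,000
print(samples_per_epoch, total_samples)
```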
Hi guys,
Thanks for your great work.
When using the 'visionbranch_stage2_finetune.yaml' configuration to fine-tune 'VL_LLaMA_2_13B_Pretrained.pth' on an A100 80GB, I found that the training program runs out of GPU memory.
I could only start training successfully after reducing the batch size to 3.
Do you have any suggestions? Thanks!