I'm training on my custom training set with 9k+ data points using the following hyperparameters on 3 GPUs:
--per_device_train_batch_size 3 \
--gradient_accumulation_steps 1 \
--max_steps 1600 \
That should process 1600 * 3 * 3 = 14,400 samples, i.e. roughly 1.6 epochs when finished. However, the log only shows 'epoch': 0.52 at the end, so it looks like the epoch counter does not take the 3 GPUs into account. Is this information accurate?
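For reference, here is a minimal sketch of the arithmetic I'm expecting (the 9,000 dataset size is approximate, since my set is "9k+"):

```python
# Sanity check: expected epoch count if all 3 GPUs are accounted for,
# vs. the per-device-only count the log seems to be reporting.
dataset_size = 9000       # approximate size of my custom training set
per_device_batch = 3      # --per_device_train_batch_size
grad_accum = 1            # --gradient_accumulation_steps
num_gpus = 3
max_steps = 1600          # --max_steps

effective_batch = per_device_batch * grad_accum * num_gpus  # 9 samples/step
samples_seen = max_steps * effective_batch                  # 14,400

print(samples_seen / dataset_size)                          # ~1.6 epochs (expected)
print(max_steps * per_device_batch * grad_accum / dataset_size)  # ~0.53 (what the log shows)
```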