Currently the calculation of `num_update_steps_per_epoch` is incorrect when `args.gradient_accumulation_steps > 1`, making the computed `num_train_epochs` too large by a factor of `gradient_accumulation_steps`. Because gradient accumulation is already handled inside the accumulator, the fix is to ignore gradient accumulation and count dataloader steps normally, aside from rounding the number of steps per epoch down to a multiple of the accumulation step count.
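For illustration, here is a minimal sketch of the before/after calculation with hypothetical values (`dataloader_len`, `max_train_steps`, and the concrete numbers are assumptions for this example, not taken from the script):

```python
import math

# Hypothetical example values (assumptions for illustration only);
# in the real script these come from the dataloader and argparse.
dataloader_len = 1000              # batches per epoch
gradient_accumulation_steps = 4
max_train_steps = 5000             # total dataloader steps to train for

# Buggy calculation: divides by the accumulation steps even though the
# accumulator already handles accumulation, so num_train_epochs comes
# out inflated by a factor of gradient_accumulation_steps.
buggy_steps_per_epoch = math.ceil(dataloader_len / gradient_accumulation_steps)
buggy_num_train_epochs = math.ceil(max_train_steps / buggy_steps_per_epoch)
print(buggy_num_train_epochs)  # 20 -- 4x too many epochs

# Fixed calculation: count dataloader steps normally, only rounding the
# per-epoch count down to a multiple of the accumulation step count.
num_update_steps_per_epoch = (
    dataloader_len // gradient_accumulation_steps * gradient_accumulation_steps
)
num_train_epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
print(num_train_epochs)  # 5
```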