Currently the calculation of `num_update_steps_per_epoch` is incorrect when `args.gradient_accumulation_steps > 1`, making the computed `num_train_epochs` too large by a factor of `gradient_accumulation_steps`. Because gradient accumulation is already handled inside the accumulator, the fix is to ignore gradient accumulation and count dataloader steps normally, aside from rounding the number of steps per epoch down to a multiple of the accumulation step count.
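For illustration, here is a minimal sketch of the before/after calculation with hypothetical values (`dataloader_len`, `max_train_steps`, and the concrete numbers are assumptions for this example, not taken from the script):

```python
import math

# Hypothetical example values (assumptions for illustration only);
# in the real script these come from the dataloader and argparse.
dataloader_len = 1000              # batches per epoch
gradient_accumulation_steps = 4
max_train_steps = 5000             # total dataloader steps to train for

# Buggy calculation: divides by the accumulation steps even though the
# accumulator already handles accumulation, so num_train_epochs comes
# out inflated by a factor of gradient_accumulation_steps.
buggy_steps_per_epoch = math.ceil(dataloader_len / gradient_accumulation_steps)
buggy_num_train_epochs = math.ceil(max_train_steps / buggy_steps_per_epoch)
print(buggy_num_train_epochs)  # 20 -- 4x too many epochs

# Fixed calculation: count dataloader steps normally, only rounding the
# per-epoch count down to a multiple of the accumulation step count.
num_update_steps_per_epoch = (
    dataloader_len // gradient_accumulation_steps * gradient_accumulation_steps
)
num_train_epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
print(num_train_epochs)  # 5
```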