I have skimmed through the papers but didn't find a detailed explanation of gradient accumulation. Please help me understand. The general, simplified training flow is:
predicted_output = model(input)                       # forward pass
loss = loss_function(predicted_output, ground_truth)  # compute loss
optimizer.zero_grad()                                 # clear old gradients
loss.backward()                                       # backpropagate
optimizer.step()                                      # update weights
However, in the code, gradients are accumulated for 10 iterations and then reset (sketched after the list below). I am wondering what positive or negative impacts it would have if I were to:
1: reset on each iteration, i.e., follow the general algorithm flow above
2: increase or decrease self.iter_size
3: add support for multi-batching and multi-GPU training
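
For context, here is a minimal sketch of the accumulate-then-step pattern in PyTorch, as I understand it. The model, data, and iter_size value are illustrative assumptions (iter_size stands in for self.iter_size); only the accumulation pattern itself is the point:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)                 # toy model (assumption)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
iter_size = 10                           # accumulate over this many iterations

inputs = torch.randn(100, 10)            # synthetic data (assumption)
ground_truths = torch.randn(100, 1)

optimizer.zero_grad()
for i in range(100):
    predicted_output = model(inputs[i:i + 1])
    loss = loss_function(predicted_output, ground_truths[i:i + 1])
    # Scale by iter_size so the accumulated gradient approximates the
    # average gradient over one effective batch of iter_size samples.
    (loss / iter_size).backward()        # backward() adds into .grad buffers
    if (i + 1) % iter_size == 0:
        optimizer.step()                 # apply the accumulated gradients
        optimizer.zero_grad()            # reset for the next window

If I read this right, setting iter_size = 1 recovers the general flow above, and raising it trades more iterations per update for a larger effective batch size without extra memory.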
Many thanks.