Closed GuoQuanhao closed 3 years ago
From my understanding, it accumulates the gradients over multiple backward passes before applying one update, so when your GPU can only afford a batch size of 4, you can simulate an effective batch size of 16 (4*4). Of course, this trick will not make your training faster; it may only help convergence.
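To make the idea concrete, here is a minimal framework-free sketch (plain Python, hypothetical 1-D least-squares model `y = w * x`, names invented for illustration). It checks that averaging the gradients of 4 micro-batches of size 4 gives the same gradient as one full batch of 16, which is why accumulation simulates the larger batch:

```python
def grad(w, xs, ys):
    # Gradient of mean squared error 0.5*(w*x - y)^2, averaged over the batch.
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(16)]
ys = [2.0 * x for x in xs]
w = 0.0
accum_steps = 4
micro = 4  # micro-batch size the GPU can "afford"

# One full-batch gradient (batch size 16).
g_full = grad(w, xs, ys)

# Accumulate over 4 micro-batches of 4, scaling each by 1/accum_steps.
g_accum = 0.0
for i in range(accum_steps):
    chunk_x = xs[i * micro:(i + 1) * micro]
    chunk_y = ys[i * micro:(i + 1) * micro]
    g_accum += grad(w, chunk_x, chunk_y) / accum_steps

print(abs(g_full - g_accum) < 1e-9)  # the two gradients match
```

In a deep-learning framework the same pattern is usually written as: divide each micro-batch loss by `accum_steps`, call backward on it, and only step the optimizer (and zero the gradients) every `accum_steps` iterations.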
Oh, thanks, I think you are right. It's amazing; I had never seen this trick before.