Closed Bluefog-Lib closed 4 years ago
Previously, we triggered the communication in each layer after its backward gradient computation finished. Actually, we can trigger it at the end of the forward computation instead:
w_{i+1} = neighborallreduce(w_{i}) - lr * gradient(w_{i})
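To make the update rule concrete, here is a minimal single-process sketch in NumPy. It is only an illustration under stated assumptions, not the library's implementation: the neighbor allreduce is modeled as multiplication by a ring mixing matrix, each simulated node has a toy quadratic loss, and the names `ring_mixing_matrix` and `simulate` are hypothetical.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Assumed topology: each node averages itself and its two ring neighbors."""
    W = np.zeros((n, n))
    for k in range(n):
        for j in (k - 1, k, k + 1):
            W[k, j % n] += 1.0 / 3.0
    return W

def simulate(n=8, steps=200, lr=0.05, seed=0):
    """Run w_{i+1} = neighborallreduce(w_i) - lr * gradient(w_i) on n simulated nodes."""
    rng = np.random.default_rng(seed)
    # Toy problem (an assumption): node k minimizes 0.5 * (w - targets[k])**2.
    targets = rng.normal(size=n)
    w = rng.normal(size=n)            # entry k is the scalar weight held by node k
    W = ring_mixing_matrix(n)
    for _ in range(steps):
        grad = w - targets            # gradient evaluated at w_i, before averaging
        w = W @ w - lr * grad         # averaged weights minus the local gradient step
    return w, targets

if __name__ == "__main__":
    w, targets = simulate()
    print("final weights:", w)
    print("mean of targets:", targets.mean())
```

Note that both terms on the right-hand side depend only on w_{i}, so in this form the averaging does not need the gradient result, which appears to be what allows the communication to start before the backward pass.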