Tilps closed 3 years ago
The grads were unused, so forcing AutoGraph to set up everything needed to compute each individual mean across the replicas was pointless and very expensive. This change saves 80 seconds on the first train step on RAF's training machine.
I believe we used to include the grads in TensorBoard, but we dropped them as part of rationalizing our TensorBoard data.