Open carterbox opened 3 years ago
According to the manuscript, the least-squares grad common gradients denominator should be the norm across all batches not just the current batch. This is probably implemented incorrectly and needs to be fixed.
According to the manuscript, the least-squares grad common gradients denominator should be the norm across all batches not just the current batch. This is probably implemented incorrectly and needs to be fixed.