MathGaron closed this issue 6 years ago.
I don't get why `max_norm` is set to 1 here (L139): `torch.nn.utils.clip_grad_norm(self.model.parameters(), 1)`
People usually use 1 (if the gradient norm gets much larger than that, you likely have an exploding-gradient problem!). Hmm, maybe in some unusual case people would need to clip it lower than 1, but for now I guess it should be fine?
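For context, here is a minimal sketch of where that call sits in a typical training loop. The LSTM, optimizer, and dummy data below are illustrative placeholders, not this repo's code, and newer PyTorch versions rename the function to the in-place `clip_grad_norm_`:

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer, just to show where clipping goes.
model = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(4, 5, 10)        # dummy batch: (batch, seq_len, input_size)
target = torch.randn(4, 5, 20)   # dummy targets matching the LSTM output shape

optimizer.zero_grad()
output, _ = model(x)
loss = criterion(output, target)
loss.backward()
# Rescale gradients so their total L2 norm is at most 1 -- the max_norm=1
# asked about above. Gradients already below the threshold are left untouched.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```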
Bump!?
Might be useful, especially for those of you who use LSTMs...