Closed blengerich closed 2 years ago
torch.rand samples parameters uniformly from the interval [0, 1). These values are not centered at zero (so values may explode in lower layers), and their magnitude may be too large for stable training.
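A minimal sketch of the problem, not taken from the project's code. The layer sizes, the depth, the ReLU stack, and the helper `forward_magnitude` are all hypothetical. The centered alternative shown (uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in))) is one common choice and only an assumption here, not necessarily the fix adopted for this issue.

```python
import math
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
fan_in, depth, batch = 256, 5, 32


def forward_magnitude(weights, x):
    """Push x through a stack of linear + ReLU layers; return mean |activation|."""
    h = x
    for w in weights:
        h = torch.relu(h @ w)
    return h.abs().mean().item()


# Uncentered init as described in this issue: uniform in [0, 1), mean near 0.5.
w_rand = [torch.rand(fan_in, fan_in) for _ in range(depth)]

# Assumed alternative: rescale torch.rand to be zero-centered with a small bound.
bound = 1.0 / math.sqrt(fan_in)
w_centered = [(torch.rand(fan_in, fan_in) * 2 - 1) * bound for _ in range(depth)]

x = torch.randn(batch, fan_in)
mag_rand = forward_magnitude(w_rand, x)
mag_centered = forward_magnitude(w_centered, x)

print(f"mean weight, torch.rand: {w_rand[0].mean().item():.3f}")
print(f"mean weight, centered:   {w_centered[0].mean().item():.5f}")
print(f"mean |activation| after {depth} layers, torch.rand: {mag_rand:.3e}")
print(f"mean |activation| after {depth} layers, centered:   {mag_centered:.3e}")
```

With all-positive weights, every layer sums hundreds of positive terms, so the activation scale grows by a large factor per layer. The zero-centered, rescaled weights keep activations at a much smaller scale.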