ChristopherBrix closed this 2 months ago.
Currently, our Glorot initialization uses a scaling factor of 4 for both sigmoid and softmax. That's probably not right; other sources mention different values (which I don't have at hand right now).
You're right: the scaling factor of 4 applies only to the sigmoid.
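For reference, here is a minimal sketch of what the fix could look like. It is not our actual code, and the names `ACTIVATION_GAIN`, `glorot_bound` and `glorot_uniform` are hypothetical. Glorot uniform init draws weights from U[-r, r] with r = sqrt(6 / (fan_in + fan_out)). The usual heuristic multiplies r by 4 for sigmoid, because sigmoid'(0) = 1/4 while tanh'(0) = 1. Using a gain of 1 for everything else, softmax included, is an assumption of this sketch and not a settled answer to the softmax question above.

```python
import math
import random

# Gain applied to the Glorot bound per activation. The factor 4 for sigmoid
# follows the common heuristic (sigmoid'(0) = 1/4 vs. tanh'(0) = 1).
# Falling back to 1.0 for every other activation, including softmax,
# is an assumption of this sketch.
ACTIVATION_GAIN = {"sigmoid": 4.0}


def glorot_bound(fan_in: int, fan_out: int, activation: str = "tanh") -> float:
    """Half-width r of the uniform range U[-r, r] for Glorot (Xavier) init."""
    gain = ACTIVATION_GAIN.get(activation, 1.0)
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, activation="tanh", rng=None):
    """Return a fan_in x fan_out weight matrix as nested lists."""
    rng = rng or random.Random()
    r = glorot_bound(fan_in, fan_out, activation)
    return [[rng.uniform(-r, r) for _ in range(fan_out)] for _ in range(fan_in)]


if __name__ == "__main__":
    print(glorot_bound(100, 50, "tanh"))     # approx. 0.2
    print(glorot_bound(100, 50, "sigmoid"))  # approx. 0.8 (4x larger)
    print(glorot_bound(100, 50, "softmax"))  # same as tanh in this sketch
```

Keeping the gains in a lookup table means the softmax value can be changed in one place once we've settled on the right number.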
In fact, a student asked about the weirdness of our Glorot init this year.