pnmartinez opened this issue 3 years ago
Thanks for the comments. As you mentioned, the scaled exponential linear units paper (https://arxiv.org/abs/1706.02515, page 6) recommends not using dropout, since the extra variance hinders convergence when relying on self-normalization. We observed some convergence issues while exploring the hyperparameter space, although with the optimal model configurations the training procedure was stable.
One thing to keep in mind is that the two regularization techniques that worked best in our experiments are early stopping and, secondly, ensembling. Since ensembling improves accuracy by exploiting the diversity and variance across models, the interaction of AlphaDropout with the ensemble might be interesting to explore. Still, we will try AlphaDropout regularization to test the SELU paper's recommendation in this regression setting.
Hi,
My name is Pablo Navarro. Your team and I have already exchanged a few emails about the wonderful paper you've written. Thanks again for the contribution.
Now that the code is released, I have a couple of questions about the implementation of the SELU activation function.
**Weight init**

For SELU, you force `lecun_normal`, which is in turn a `pass` in the `init_weights()` function.

How come the weights are initialized as `lecun_normal` simply by passing? On my machine, default PyTorch initializes weights uniformly, not normally.
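For context, this is roughly what I would expect an explicit LeCun-normal init to look like. PyTorch has no built-in `lecun_normal_` helper and `nn.Linear` defaults to a Kaiming-uniform scheme, so the sketch below is my own illustration (layer sizes are placeholders), not your `init_weights()`:

```python
import math
import torch.nn as nn

def lecun_normal_init(module):
    # Illustrative only: weights ~ N(0, 1/fan_in), biases zero,
    # which is the initialization the SELU paper assumes for self-normalization.
    if isinstance(module, nn.Linear):
        fan_in = module.weight.size(1)
        nn.init.normal_(module.weight, mean=0.0, std=1.0 / math.sqrt(fan_in))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# PyTorch's default nn.Linear init is Kaiming-uniform, so an explicit
# .apply(...) call is needed to actually get LeCun-normal weights.
model = nn.Sequential(nn.Linear(64, 128), nn.SELU(), nn.Linear(128, 1))
model.apply(lecun_normal_init)
```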
**DropOut on SELU**

I believe that in order to make SELU useful, you need to use `AlphaDropout()` instead of regular `Dropout()` layers (PyTorch docs). I can't find anything wrapping `AlphaDropout()` in your code. Can you point me in the right direction or give the rationale behind it?

Cheers and keep up the good work!
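P.S. To make the second point concrete, here is a minimal sketch of the pairing I have in mind (layer sizes and the dropout rate are placeholders, not your architecture): `AlphaDropout` is designed to preserve the zero-mean / unit-variance statistics that SELU relies on, whereas plain `Dropout` breaks them.

```python
import torch.nn as nn

# Self-normalizing block: SELU paired with AlphaDropout, which keeps
# the zero-mean / unit-variance property that SELU depends on.
selu_block = nn.Sequential(
    nn.Linear(128, 128),
    nn.SELU(),
    nn.AlphaDropout(p=0.1),
)

# For comparison: the usual pairing of ReLU with standard Dropout.
relu_block = nn.Sequential(
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Dropout(p=0.1),
)
```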