HolyBayes / pytorch_ard

PyTorch implementation of "Variational Dropout Sparsifies Deep Neural Networks"
MIT License

Sigma value increases when training with no KL term. #6

Closed N2606 closed 4 years ago

N2606 commented 4 years ago

Hi, intuitively sigma should get smaller as training progresses. However, I found that sigma keeps increasing, so the loss term grows after some epochs (see the attached figures). Is there a bug in the code, or does it just behave like that? [two attached figures: sigma and loss curves increasing over epochs]

martinferianc commented 4 years ago

Hey, maybe too late, but I have observed a similar problem, and it is caused by weight decay. The original implementation does not use weight decay (which of course makes sense), and here I think the author forgot to exclude it for the ARD-specific parameters. One way to handle this is shown in the sketch below.
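
A minimal sketch of one way to exclude the variational parameters from weight decay using PyTorch optimizer parameter groups. The parameter name `log_sigma2`, the `make_optimizer` helper, and the `ToyARDLayer` module are illustrative assumptions, not the repo's actual code (the repo's eventual fix was simply to drop weight decay):

```python
import torch
import torch.nn as nn

def make_optimizer(model: nn.Module, lr: float = 1e-3, weight_decay: float = 1e-4):
    """Apply weight decay to ordinary weights only, not to variational params.

    Assumes the ARD layers register their variance parameter under a name
    containing "log_sigma2"; adjust the filter to match your layer names.
    """
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        (no_decay if "log_sigma2" in name else decay).append(param)
    return torch.optim.Adam(
        [
            {"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0},  # no L2 on variational params
        ],
        lr=lr,
    )

# Toy usage: a module with a hypothetical "log_sigma2" parameter alongside a weight.
class ToyARDLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))

    def forward(self, x):
        return x @ self.weight.t()

optimizer = make_optimizer(ToyARDLayer(4, 2))
```

With weight decay applied to `log_sigma2`, the optimizer pulls the log-variance toward zero (i.e. sigma toward 1), which can make sigma grow during training even without the KL term, matching the behaviour reported above.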

HolyBayes commented 4 years ago

Thank you for your comment! Weight decay has been removed.