Closed N2606 closed 4 years ago
Hey, maybe too late, but I observed a similar problem, and it is caused by weight decay. The original implementation uses no weight decay (which of course makes sense for the variational parameters), and here I think the author forgot to exclude it for the ARD-specific parameters.
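To make the fix concrete, here is a minimal torch-free sketch of how one could build optimizer parameter groups so that weight decay is applied only to ordinary weights and skipped for the ARD-specific parameters. The `"log_sigma"` name filter is a hypothetical convention for illustration, not necessarily what this repository uses:

```python
def split_param_groups(named_params, weight_decay=1e-4, ard_keyword="log_sigma"):
    """Return PyTorch-style optimizer param groups.

    Parameters whose name contains `ard_keyword` (assumed to mark the
    ARD-specific variational parameters) get weight_decay=0; the KL/ARD
    regularizer already penalizes them, so decaying them as well distorts
    the learned sigmas.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if ard_keyword in name:
            no_decay.append(param)  # ARD parameter: no weight decay
        else:
            decay.append(param)     # ordinary weight: usual weight decay
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

One would then pass these groups to the optimizer, e.g. `torch.optim.SGD(split_param_groups(model.named_parameters()), lr=0.01)`, instead of setting a global `weight_decay`.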
Thank you for your comment! Weight decay has been removed.
Hi, my intuition is that sigma should get smaller as training goes on. However, I found that sigma keeps increasing, so the loss term increases after some epochs (see the figure below). Is there a bug in the code, or does it just behave like that?