Issue 529261027 · Closed 2 years ago
Yes, it should be lambda * (gp_loss - 1)^2, but in my implementation I used just gp_loss.
To the best of my understanding, the gradient penalty is used to mitigate exploding and vanishing gradients. When I was training the model, I ran into exploding gradients because the loss value was very high. To keep the loss value down, I used just gp_loss, since gp_loss was smaller than lambda * (gp_loss - 1)^2 and so the magnitude of the loss stays low. This solved the exploding-gradient problem to a certain extent. Let me know if you achieve stability using the loss as written in the paper, and we can update the repo accordingly.
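For reference, here is a minimal PyTorch sketch contrasting the penalty from the WGAN-GP paper with the simplified variant described above. The critic, tensor shapes, and lambda value are illustrative placeholders, not taken from this repo:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Return (paper-style penalty, simplified penalty) for a batch."""
    # Random interpolation between real and fake samples
    eps = torch.rand(real.size(0), 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_hat = critic(x_hat)
    # Gradient of critic output w.r.t. the interpolated inputs
    grads = torch.autograd.grad(
        outputs=d_hat,
        inputs=x_hat,
        grad_outputs=torch.ones_like(d_hat),
        create_graph=True,
    )[0]
    grad_norm = grads.norm(2, dim=1)                  # per-sample gradient norm
    paper_gp = lam * ((grad_norm - 1) ** 2).mean()    # penalty as in the WGAN-GP paper
    simplified_gp = grad_norm.mean()                  # simplified variant discussed here
    return paper_gp, simplified_gp

# Illustrative usage with a toy linear critic
critic = torch.nn.Linear(4, 1)
real = torch.randn(8, 4)
fake = torch.randn(8, 4)
paper_gp, simplified_gp = gradient_penalty(critic, real, fake)
```

Both terms are non-negative, but they push the critic toward different targets: the paper's penalty drives the gradient norm toward 1 (the 1-Lipschitz constraint), while the simplified variant drives it toward 0.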
Understood, thanks.
Hi, thanks for your code. While reading it, I had a small question about how you calculate gp_loss. The original paper writes the 1-Lipschitz penalty as (gp_loss - 1)^2, but your code uses only gp_loss. I understand that this keeps the loss small and satisfies the requirement. What I want to know is: was this choice validated by experiment? Looking forward to your reply.