Closed: Doch88 closed this issue 3 years ago
@ouyangzhibo I have a theoretical question about the choice of the loss function for the generative part of the framework. Why did you use a standard minimax loss rather than a Wasserstein loss (like the GAIL variant in this paper)? Have you considered using this loss to improve training performance?
Thanks for the question! I actually tried the WGAN loss with GAIL, but I couldn't get it to converge for some reason. You could definitely try it; it may require some tuning of the hyper-parameters.
Yeah, I noticed that with WGAN-GP the losses explode, and I don't know why. A standard WGAN with weight clipping seems to work well for now.
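For reference, weight clipping in a standard WGAN just means clamping every critic parameter into a small interval after each optimizer step. A minimal sketch in PyTorch, where the critic is a stand-in module (not the one from this repository) and `0.01` is the clip value from the original WGAN paper:

```python
import torch

critic = torch.nn.Linear(4, 1)  # hypothetical stand-in critic for illustration
clip_value = 0.01  # default from the original WGAN paper

# After each critic optimizer step, clamp every weight into [-c, c]
# to enforce the Lipschitz constraint the Wasserstein loss relies on.
for p in critic.parameters():
    p.data.clamp_(-clip_value, clip_value)
```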
That's great! How does the WGAN loss perform in the GAIL framework? Is it better than the standard GAN loss?
I'm working in a different domain with a different dataset, so I haven't yet done a full training run on COCO-Search18 with the WGAN loss (one is running as I write this comment). However, with three small changes it now seems to converge even with the WGAN loss and gradient penalty. Here's what I did:
```python
grad = grad.view(mixup_states.size(0), -1)
```
This way the gradient norm is computed per sample in the batch, rather than per dimension of the input states as before.
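For context, here is a minimal sketch of how that reshape fits into a standard WGAN-GP gradient penalty. The critic, tensor shapes, and the `lambda_gp` default are hypothetical stand-ins, not the code from this repository; only the `grad.view(mixup_states.size(0), -1)` line is from the comment above:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=100.0):
    # Interpolate between real and generated states (the "mixup" step).
    eps = torch.rand(real.size(0), 1, device=real.device)
    mixup_states = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(mixup_states)
    grad = torch.autograd.grad(
        outputs=scores, inputs=mixup_states,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    # Flatten so the norm is taken per sample, not per input dimension.
    grad = grad.view(mixup_states.size(0), -1)
    return lambda_gp * ((grad.norm(2, dim=1) - 1) ** 2).mean()
```

Without the flatten, `grad.norm(2, dim=1)` on a multi-dimensional state tensor would mix per-dimension norms into the penalty, which is one plausible reason the loss could blow up.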
I also changed the mixed (interpolation) part of the gradient-penalty calculation, because it originally did not work. As I said, I haven't done a full training run on COCO-Search18, but I've been training for about 15 epochs, and these are the validation results so far:
Compared with the run using the original loss it doesn't seem much better so far, but I haven't yet tuned the parameters. I'm using the same parameters as the JSON in the repository, except for the gradient-penalty lambda, which I set to 100; with 10 or 5 the gradient explodes, and other values might give better results.