Closed RahulBhalley closed 5 years ago
Answer 1: Spectral normalization (SN) can enforce the 1-Lipschitz constraint. About question 2: I have the same question as you. I know of a technique called TTUR, which argues for a G-to-D iteration ratio of 1:1 with a G-to-D learning-rate ratio of 1:3. That also differs from this T-GAN implementation.
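For readers unfamiliar with how spectral normalization bounds the Lipschitz constant, here is a minimal pure-Python sketch. It is not the code used in this repository; the helper names (`spectral_norm`, `spectral_normalize`) and the iteration count are assumptions made for illustration. It estimates the largest singular value of a weight matrix by power iteration and divides the weights by it. With 1-Lipschitz activations such as ReLU or LeakyReLU, doing this for every layer keeps the whole critic 1-Lipschitz.

```python
import math
import random


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Compute W^T @ u."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(x, eps=1e-12):
    """Scale x to unit Euclidean norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    for _ in range(n_iter):
        v = l2_normalize(matvec_t(W, u))
        u = l2_normalize(matvec(W, v))
    # sigma = u^T W v
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectral_normalize(W, n_iter=50):
    """Return W / sigma(W), whose largest singular value is about 1."""
    sigma = spectral_norm(W, n_iter)
    return [[w / sigma for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectral_normalize(W)
```

In practice a framework utility is usually used instead, for example `torch.nn.utils.spectral_norm` in PyTorch, which runs a single power-iteration step per forward pass rather than iterating to convergence as this sketch does.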
@rahulbhalley
Question 1: @TLMichael's answer is right.
Question 2: There are many tricks for adjusting the learning rate or the update schedule, and I have not evaluated all of them. As far as I know, 5:1 (D:G) is the choice in the original WGAN-GP paper, so I kept it for WGAN. I also tried 1:2 in WGAN, but it did not seem to work well. As far as I know, 1:2 (used in SGAN and T-SGAN) is the common choice for SGAN. I know my experiments are not very rigorous, but they simply demonstrate that T-GANs give better results than the corresponding GANs.
The actual goal of the paper is to develop a more general theoretical framework for GANs. I have recently developed it further: it seems T-GANs can be turned into a method for graph network mapping, something like an adversarial version of a Conditional Random Field. I hope you will be interested in it.
Thank you @TLMichael, I was unaware that SN can enforce the 1-Lipschitz constraint.
@bojone have you tried 5:1 (D:G) iterations? If so, how do those results compare to the 1:2 setting with SN? A paper called CT-GAN also came out, improving on WGAN-GP. Have you run experiments with it? Does it perform better than WGAN with SN? If you haven't, do you think it would perform better?
Observation: Apart from the change in loss terms from T-SGAN to T-WGAN for celebrity image generation at 128x128 scale, I couldn't find any weight clipping in the critic (discriminator) network of T-WGAN. Question: Doesn't this violate the constraint that the critic be a 1-Lipschitz function?
Observation: Moreover, the ratio of generator iterations to discriminator iterations is 2:1 in both the T-SGAN and T-WGAN implementations. Question: The WGAN authors train the critic for 5 iterations per generator iteration, so why is the ratio different here?
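To make the two settings under discussion concrete, here is a small sketch of the original WGAN recipe: clip every critic weight to `[-c, c]` after each critic update, and run 5 critic updates per generator update. The function names `clip_weights` and `update_schedule` are hypothetical, and the actual gradient steps are left out. The defaults `c=0.01` and `d_steps=5` follow the WGAN paper, while the 1:2 (D:G) schedule is the one reported for this repository.

```python
def clip_weights(weights, c=0.01):
    """WGAN-style weight clipping: force every weight into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def update_schedule(n_rounds, d_steps=5, g_steps=1):
    """Yield 'D' or 'G' for each update, d_steps critic steps then g_steps generator steps."""
    for _ in range(n_rounds):
        for _ in range(d_steps):
            yield "D"
        for _ in range(g_steps):
            yield "G"


# Original WGAN schedule: 5 critic updates per generator update.
wgan_round = list(update_schedule(1))

# Schedule discussed in this thread: 1 discriminator update per 2 generator updates.
thread_rounds = list(update_schedule(2, d_steps=1, g_steps=2))

clipped = clip_weights([-0.5, 0.005, 0.3])
```

In a real training loop each `"D"` would trigger one critic update (followed by `clip_weights` if clipping is the chosen constraint), and each `"G"` one generator update.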