bojone / T-GANs

Training Generative Adversarial Networks Via Turing Test

Critic is not implemented as 1-Lipschitz differentiable function #2

Closed: RahulBhalley closed this issue 5 years ago

RahulBhalley commented 5 years ago

Observation: Apart from the change in loss terms from T-SGAN to T-WGAN for celebrity image generation at the 128x128 scale, I couldn't find any weight clipping in the critic (discriminator) network of T-WGAN. Question: Doesn't this violate the constraint that the critic lie in the set of 1-Lipschitz functions?
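For reference, the weight clipping I expected to find is what the original WGAN paper applies after each critic update, roughly like this (a minimal PyTorch sketch, not code from this repo; the 0.01 clip value is the WGAN paper's default):

```python
import torch

def clip_critic_weights(critic: torch.nn.Module, clip_value: float = 0.01) -> None:
    """Hard weight clipping from the original WGAN paper (Arjovsky et al., 2017).

    Clamping every parameter to [-clip_value, clip_value] crudely bounds the
    critic's Lipschitz constant; it is applied after each critic update.
    """
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip_value, clip_value)
```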

Observation: Moreover, the ratio of generator iterations to discriminator iterations is 2:1 in both the T-SGAN and T-WGAN implementations. Question: The WGAN authors proposed training the critic five times per generator iteration, so why is the criterion different here?

TLMichael commented 5 years ago

Answer 1: Spectral normalization can enforce the 1-Lipschitz constraint. About question 2: I have the same question as you. I know of a technique called TTUR, which argues that the ratio of G iterations to D iterations should be 1:1, with a 1:3 ratio between the learning rates of G and D. That also differs from this T-GANs implementation.
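For illustration, this is roughly how spectral normalization is applied in practice (a minimal PyTorch sketch; the architecture is invented for the example and is not this repo's model):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# spectral_norm divides each layer's weight by a power-iteration estimate of
# its largest singular value, so each linear map has spectral norm ~1. Since
# LeakyReLU is itself 1-Lipschitz, the composed critic is approximately
# 1-Lipschitz, with no weight clipping needed.
critic = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),   # 128x128 -> 64x64
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)), # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 32 * 32, 1)),                # scalar critic score
)
```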

bojone commented 5 years ago

@RahulBhalley

Question 1: @TLMichael's answer is right.

Question 2: Actually, there are many tricks for adjusting the learning rate or the update schedule, and I haven't evaluated all of them. As far as I know, 5:1 in WGAN is the choice made in the original WGAN-GP paper, so I kept it. I also tried 1:2 in WGAN, but it did not seem to work well. And 1:2 (in SGAN and T-SGAN) is, as far as I know, the common choice for SGAN. I know my experiments are not very rigorous, but they simply demonstrate that T-GANs give better results than the corresponding GANs.
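To make the schedules concrete, here is a generic training-loop skeleton with a configurable update ratio (a PyTorch sketch, not this repo's code; `d_steps=5, g_steps=1` is the WGAN-GP schedule, `d_steps=1, g_steps=2` is the 1:2 setting above, and TTUR would keep both at 1 but give D a larger learning rate):

```python
import itertools
import torch

def train_with_ratio(generator, critic, loader, d_steps=5, g_steps=1,
                     z_dim=128, g_lr=1e-4, d_lr=1e-4,
                     n_iters=10_000, device="cpu"):
    """WGAN-style loop with d_steps critic updates per g_steps generator updates.

    Assumes `loader` yields batches of real image tensors. The critic's
    Lipschitz constraint (clipping, gradient penalty, or SN) is assumed to be
    handled inside `critic` itself and is omitted here.
    """
    g_opt = torch.optim.Adam(generator.parameters(), lr=g_lr, betas=(0.0, 0.9))
    d_opt = torch.optim.Adam(critic.parameters(), lr=d_lr, betas=(0.0, 0.9))
    batches = iter(itertools.cycle(loader))  # endless stream of real batches (sketch only)

    for _ in range(n_iters):
        for _ in range(d_steps):  # critic updates
            real = next(batches).to(device)
            z = torch.randn(real.size(0), z_dim, device=device)
            fake = generator(z).detach()
            # Minimize -(E[D(real)] - E[D(fake)]), i.e. maximize the critic gap.
            d_loss = critic(fake).mean() - critic(real).mean()
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()

        for _ in range(g_steps):  # generator updates
            z = torch.randn(real.size(0), z_dim, device=device)
            g_loss = -critic(generator(z)).mean()
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
```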

Actually, the goal of the paper is to develop a more general theoretical framework for GANs. Recently, I have made some further developments of it: it seems T-GANs can be developed into a method for graph network mapping, something like an adversarial version of a Conditional Random Field. I hope you will be interested in it.

RahulBhalley commented 5 years ago

Thank you @TLMichael, I was unaware that SN can enforce the 1-Lipschitz constraint.

@bojone have you tried a 5:1 (D:G) iteration ratio? If so, how do those results compare to the 1:2 setting with SN? A paper called CT-GAN also came out that improves on WGAN-GP. Have you run experiments with it? Does it perform better than WGAN with SN? If you haven't, do you think it would?