HighCWu / SelfGAN

My PyTorch implementation of 'SelfGAN--Not A GAN But Punch Itself'.
MIT License

Questions about the architecture #1

Open MichaelZhouwang opened 5 years ago

MichaelZhouwang commented 5 years ago

It seems that during the joint update, the generator update just takes the gradient of its own score, i.e., it maximizes the score from the discriminator, whereas the discriminator actually receives a false signal, because it should not simply treat generated samples as real samples. Personally, I think the model behaves like a standard GAN with G-step = 1 and D-step = 1 in which the discriminator receives a wrong loss, which makes performance worse. From your code I see the weight of gen_loss is low; I think this only works because it degrades the performance relative to a standard GAN...
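For reference, the one-G-step / one-D-step alternating update the comment above compares against can be sketched in PyTorch as follows (toy models and shapes, purely illustrative; nothing here is taken from this repo):

```python
import torch
import torch.nn as nn

# Toy generator and discriminator over 2-D data, for illustration only.
G = nn.Sequential(nn.Linear(8, 2))
D = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(16, 2)   # stand-in for a batch of real data
z = torch.randn(16, 8)      # latent noise
valid = torch.ones(16, 1)
fake = torch.zeros(16, 1)

# D-step: real batch labelled 1, generated batch labelled 0
# (detach so the generator gets no gradient from this step).
opt_d.zero_grad()
d_loss = bce(D(real), valid) + bce(D(G(z).detach()), fake)
d_loss.backward()
opt_d.step()

# G-step: push D's score on generated samples toward 1.
opt_g.zero_grad()
g_loss = bce(D(G(z)), valid)
g_loss.backward()
opt_g.step()
```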

HighCWu commented 5 years ago

@MichaelZhouwang In fact, I have noticed this. However, you may not have seen the code in my self_gan_keras_tpu.ipynb, where the real data is added so that the three losses are combined by ratio; with this method there is no need to hand-tune the gen_loss weight.

# Per-branch losses (adversarial_loss is a Keras layer defined elsewhere in the notebook):
gen_loss = adversarial_loss([validity_gen, valid])    # generated samples vs. label 1 (generator's objective)
real_loss = adversarial_loss([validity_real, valid])  # real samples vs. label 1
fake_loss = adversarial_loss([validity_fake, fake])   # generated samples vs. label 0
# Identity Lambdas only attach names so each loss shows up in training logs:
gen_loss = Lambda(lambda x: x*1.0, name='gen_loss')(gen_loss)
real_loss = Lambda(lambda x: x*1.0, name='real_loss')(real_loss)
fake_loss = Lambda(lambda x: x*1.0, name='fake_loss')(fake_loss)
# Dynamic weights: each measures how far a branch is from its target score.
v_g = Lambda(lambda x: 1 - K.mean(x))(validity_gen)   # generator wants validity_gen -> 1
v_r = Lambda(lambda x: 1 - K.mean(x))(validity_real)  # discriminator wants validity_real -> 1
v_f = Lambda(lambda x: K.mean(x))(validity_fake)      # discriminator wants validity_fake -> 0
v_sum = Lambda(lambda x: x[0] + x[1] + x[2])([v_g, v_r, v_f])
# Total loss: each branch's loss weighted by its normalized distance from target.
s_loss = Lambda(lambda x: x[2]*x[1]/x[0]
                        + x[4]*x[3]/x[0]
                        + x[6]*x[5]/x[0])([v_sum, v_r, real_loss, v_g, gen_loss, v_f, fake_loss])

When the discriminator tends to classify all input data as "1", the new weighting makes it gradually start scoring the currently generated fake data as fake. When it tends to classify all input data as "0", the weighting makes it start scoring the currently generated fake data as real. This gradually reaches a dynamic balance. Later I tested this scheme in PyTorch and it did work, but I was too lazy to update the code in my repo. I have also made a rough estimate: my method needs about the same number of steps as a normal GAN to converge, and is sometimes even faster. I have not made a detailed comparison because I don't have enough time. There are still some questions I may not be able to answer because my level is not high enough; I just tried some of my whims.
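The dynamically weighted loss from the Keras snippet above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repo's code; the function name and argument names are mine, and the validity tensors are assumed to be sigmoid outputs in (0, 1):

```python
import torch
import torch.nn.functional as F

def self_gan_loss(validity_gen, validity_real, validity_fake):
    # validity_* are discriminator sigmoid outputs in (0, 1):
    #   validity_gen  - scores of generated samples (generator's branch)
    #   validity_real - scores of real samples
    #   validity_fake - scores of generated samples (discriminator's branch)
    valid = torch.ones_like(validity_real)
    fake = torch.zeros_like(validity_fake)

    gen_loss = F.binary_cross_entropy(validity_gen, valid)
    real_loss = F.binary_cross_entropy(validity_real, valid)
    fake_loss = F.binary_cross_entropy(validity_fake, fake)

    # Each weight measures how far a branch is from its target score;
    # as in the Keras graph, gradients also flow through the weights.
    v_g = 1 - validity_gen.mean()
    v_r = 1 - validity_real.mean()
    v_f = validity_fake.mean()
    v_sum = v_g + v_r + v_f

    # Branches that are further from their target get more weight,
    # which is what pulls an over- or under-confident D back into balance.
    return (v_r * real_loss + v_g * gen_loss + v_f * fake_loss) / v_sum
```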

MichaelZhouwang commented 5 years ago


Thanks. I think it kind of makes sense~