belemaire opened this issue 1 year ago
I haven't run the code, but my guess is that the label noise is wrong.
In train_step:

real_labels = tf.ones_like(real_predictions)
real_noisy_labels = real_labels + 0.1 * tf.random.uniform(
    tf.shape(real_predictions))  # it goes to [1.0, 1.1]
fake_labels = tf.zeros_like(fake_predictions)
fake_noisy_labels = fake_labels - 0.1 * tf.random.uniform(
    tf.shape(fake_predictions))  # it goes to [-0.1, 0.0]
I think it should be:
real_labels = tf.ones_like(real_predictions)
real_noisy_labels = real_labels - 0.1 * tf.random.uniform(
    tf.shape(real_predictions))  # it goes to [0.9, 1.0]
fake_labels = tf.zeros_like(fake_predictions)
fake_noisy_labels = fake_labels + 0.1 * tf.random.uniform(
    tf.shape(fake_predictions))  # it goes to [0.0, 0.1]
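For what it's worth, here is a minimal standalone sketch (using a dummy predictions tensor rather than the notebook's actual train_step) that prints the resulting ranges, so the direction of the noise is easy to check:

import tensorflow as tf

# Dummy stand-in for the discriminator output, just to inspect the label ranges
predictions = tf.zeros((4, 1))

real_noisy_labels = tf.ones_like(predictions) - 0.1 * tf.random.uniform(
    tf.shape(predictions))
fake_noisy_labels = tf.zeros_like(predictions) + 0.1 * tf.random.uniform(
    tf.shape(predictions))

# tf.random.uniform samples from [0, 1), so with the signs flipped as proposed
# above the real labels land in (0.9, 1.0] and the fake labels in [0.0, 0.1)
print(tf.reduce_min(real_noisy_labels).numpy(), tf.reduce_max(real_noisy_labels).numpy())
print(tf.reduce_min(fake_noisy_labels).numpy(), tf.reduce_max(fake_noisy_labels).numpy())

With the signs as they currently stand, the same check shows the labels drifting outside [0, 1] instead of being smoothed towards each other.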
If I'm wrong, please correct me.
Hi,
When running the first notebook from chapter 4 as-is, I am getting very odd results.
During training I noticed huge swings in the accuracy/loss of both the generator and the discriminator, and after about 240 epochs all hell breaks loose, with the discriminator apparently starting to predict all images as fake (its accuracy on fake images stays at 1, while its accuracy on real images stays at 0).
I am trying to wrap my head around what may be happening here. Given that I have not altered the code, the only difference should be the seeds for the random noise used as input to the generator, but I doubt that this alone could cause such wide differences from the results presented in the chapter (whose graphs are quite smooth compared to my run).
It doesn't seem like a case of the discriminator overpowering the generator, even though that appears to happen in some earlier epochs where the discriminator accuracy peaks at 1 (or maybe it is such a case after all, given that the discriminator ends up predicting every image produced by the generator as fake). I don't really understand why the discriminator would suddenly go from roughly equal accuracy on real and fake images to this "perfect" rejection of everything, and stay stuck there after epoch 240, nor why it swung so widely during training or plateaued at 1 for some epochs before suddenly dropping.
Could a "bug" have slipped into the code? Do you have any intuition as to what may have gone wrong? And more generally, as a novice practitioner, what should the thought process be for troubleshooting a training run gone bad like this?
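In case it helps narrow things down, below is roughly the kind of monitoring I am thinking of adding on my side (just a sketch: the "d_acc" metric name is my assumption about what the notebook's compiled metrics are called, and the generator/latent_dim arguments are placeholders for the notebook's own objects):

import numpy as np
import tensorflow as tf

class GANMonitor(tf.keras.callbacks.Callback):
    """Snapshot generator output each epoch and flag a saturated discriminator."""

    def __init__(self, generator, latent_dim, threshold=0.99, patience=10):
        super().__init__()
        self.generator = generator    # the notebook's generator model (placeholder here)
        self.latent_dim = latent_dim  # size of the generator's noise input
        self.threshold = threshold    # accuracy considered "saturated"
        self.patience = patience      # consecutive epochs before flagging
        self.saturated = 0

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Save a handful of generated images so each epoch can be inspected later
        noise = tf.random.normal((4, self.latent_dim))
        samples = self.generator(noise, training=False).numpy()
        np.save(f"samples_epoch_{epoch:03d}.npy", samples)
        # "d_acc" is an assumed metric name; adjust to whatever the notebook actually logs
        if logs.get("d_acc", 0.0) >= self.threshold:
            self.saturated += 1
        else:
            self.saturated = 0
        if self.saturated >= self.patience:
            print(f"Discriminator accuracy >= {self.threshold} for {self.patience} "
                  f"consecutive epochs, starting around epoch {epoch - self.patience + 1}")

The idea is simply to keep per-epoch samples on disk and to flag the epoch where the discriminator saturates, so the collapse can be inspected directly rather than inferred from the curves afterwards.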
Thanks!