Closed inchulnim123 closed 9 months ago
Hi, are you saying this only happens when you change from 1e-10 to 1e-8? Also, if you changed anything, can you share the training scripts with me? I can take a look and let you know what the problem is.
The loss is most probably jumping to NaN because something is being divided by zero somewhere. Can you also use `tf.debugging.enable_check_numerics()` to help figure out where?
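As an illustration of the failure mode (plain Python floats here, not the repo's training code): a NaN usually first appears when an overflow to infinity is combined with another operation, and it then poisons every later value, including the loss.

```python
import math

# Overflow: multiplying near the float64 limit silently produces inf.
big = 1e308 * 10.0
print(big)                    # inf

# inf - inf (like 0 * inf or inf / inf) is undefined and yields NaN.
nan = big - big
print(math.isnan(nan))        # True

# NaN propagates through every subsequent operation,
# which is why the whole loss turns NaN at once.
print(math.isnan(nan + 1.0))  # True
```

`tf.debugging.enable_check_numerics()` instruments the graph so the run stops at the first op that emits an Inf/NaN, instead of letting it propagate silently like this.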
The NaN loss happened only on the CelebA dataset, when I trained with D + 1e-10 and D + 1e-8 respectively. I only changed the GPU options (gpu_ids -> '0,1'), random_mask -> 1, random_mask_type -> irregular_mask, and incremental_training -> 1 in train_options.py; batch_size is also 1. On the Facades and Paris StreetView datasets it works really well, but on CelebA alone it doesn't.
Can you please try using `tf.debugging.enable_check_numerics()`?
Thanks, I'll try it. Can I use `tf.debugging.enable_check_numerics()` to check where the NaN is appearing and then ask again?
sure
Hi Gourav! I trained Facades and CelebA using 2× RTX 3060. It works well on the Facades dataset. On CelebA, however, I randomly selected 28,000 images and trained; epochs 0–10 go well, but after that the loss jumps around or becomes NaN. I used:

```python
D_H = tf.multiply(tf.expand_dims(tf.math.pow(D + 1e-8, -0.5), axis=-1), H)
B = tf.linalg.diag(tf.math.pow(B + 1e-8, -1))
```
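For reference, a plain-Python sketch of what the `+ 1e-8` term is doing (the scalar function below is illustrative, not code from the repo): adding eps before the negative power keeps the base strictly positive, so a zero entry in D no longer produces a division by zero, and the result is bounded by `eps ** -0.5`.

```python
import math

def inv_sqrt(d, eps=1e-8):
    # Scalar equivalent of tf.math.pow(D + eps, -0.5):
    # eps keeps the base strictly positive, capping the
    # result at eps ** -0.5 (about 1e4 for eps = 1e-8).
    return (d + eps) ** -0.5

# Without eps, d == 0.0 would raise ZeroDivisionError here
# (and yield inf/NaN in tensor arithmetic).
for d in (0.0, 1e-12, 1.0):
    print(d, inv_sqrt(d))
```

Note that with eps = 1e-10 the cap is about 1e5, ten times larger than with 1e-8, so near-zero entries in D get multiplied by a much larger factor, which may be why the smaller eps is less stable in your runs.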
I want to know why this happens and how I can fix it.