I found that the TensorFlow implementation produces NaNs when I use it to train my deep CNN. This turns out to be caused by `tf.sqrt(conv)`. Replacing it with `return tf.sqrt(tf.maximum(conv, 1e-5))` fixes the problem! I saw that the PyTorch version does `(out + 1e-12).sqrt()`, so maybe that would be the proper way?
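A likely reason, sketched below: the derivative of `sqrt(x)` is `0.5 / sqrt(x)`. This is infinite when `conv` is exactly zero, and it becomes NaN as soon as it meets `0 * inf` or `inf - inf` later in backprop. The snippet is a minimal sketch that applies the chain rule by hand in NumPy rather than calling TensorFlow. It assumes `conv` is non-negative and can contain exact zeros (e.g. from all-zero input patches). The `grad_*` helper names and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for `conv`: non-negative values, one tiny and one exactly zero.
conv = np.array([4.0, 1e-8, 0.0], dtype=np.float32)


def grad_raw(x, dy):
    """Backprop through sqrt(x): dy * 0.5 / sqrt(x), which is inf (or NaN) at x == 0."""
    y = np.sqrt(x)
    with np.errstate(divide="ignore", invalid="ignore"):
        return dy * 0.5 / y


def grad_clamped(x, dy, eps=1e-5):
    """Backprop through sqrt(max(x, eps)): no gradient reaches x where x < eps."""
    y = np.sqrt(np.maximum(x, eps))  # y >= sqrt(eps) > 0, so the division is safe
    mask = (x >= eps).astype(x.dtype)  # max() only passes gradient to x where x wins
    return dy * 0.5 / y * mask


def grad_shifted(x, dy, eps=1e-12):
    """Backprop through sqrt(x + eps): finite everywhere, bounded by 0.5 / sqrt(eps)."""
    y = np.sqrt(x + eps)
    return dy * 0.5 / y


if __name__ == "__main__":
    dy = np.ones_like(conv)  # assume an upstream gradient of 1 everywhere
    for name, fn in [("raw", grad_raw), ("clamped", grad_clamped), ("shifted", grad_shifted)]:
        print(name, fn(conv, dy))
```

Both workarounds keep the gradient finite, but they behave differently. The `tf.maximum` version also changes the forward value for entries below `1e-5` and passes them no gradient at all. The additive version, which in TensorFlow would be `tf.sqrt(conv + 1e-12)`, keeps a gradient everywhere, but that gradient can reach roughly `0.5 / sqrt(1e-12) = 5e5` near zero.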