Open fredrikorn opened 3 years ago
If someone else runs into this issue: I traced the NaN loss to the gradient of tf.sqrt diverging as its input approaches zero (see this post). I worked around it by adding a small epsilon value (1e-7) in dummy_loss in yolo.py.
As for eager execution, I haven't solved that.
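For anyone wondering why a sqrt near zero turns into NaN, here is a minimal pure-Python sketch of the arithmetic. It is not the code from yolo.py: the helper names `sqrt_grad` and `backprop_through_sqrt` are made up for illustration, and the assumption that the upstream gradient is exactly zero (for example, from a masked-out box) is mine, not something confirmed in this thread.

```python
import math

def sqrt_grad(x, eps=0.0):
    """Analytic derivative of sqrt(x + eps) with respect to x (hypothetical helper)."""
    shifted = x + eps
    if shifted == 0.0:
        return math.inf  # 1 / (2 * sqrt(0)) diverges
    return 0.5 / math.sqrt(shifted)

def backprop_through_sqrt(x, upstream, eps=0.0):
    """Chain rule: upstream gradient times the local sqrt gradient."""
    return upstream * sqrt_grad(x, eps)

# Assumed scenario: a box width of exactly zero whose loss term is masked out,
# so the upstream gradient is zero.
unsafe = backprop_through_sqrt(0.0, upstream=0.0)          # 0 * inf -> nan
safe = backprop_through_sqrt(0.0, upstream=0.0, eps=1e-7)  # 0 * finite -> 0.0

print(math.isnan(unsafe), safe)
```

With the epsilon the local gradient stays finite (roughly 1.6e3 for 1e-7), so a zero upstream gradient stays zero instead of poisoning the weights.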
Hi! I've been using this repo on my own dataset and have run into the loss suddenly becoming NaN, even though it was converging nicely before (as in #198). After printing some values from the TensorFlow graph, I'm fairly sure the error comes from odd values for box width and height, but I haven't managed to pinpoint it.
To check this, I thought I'd try running the program eagerly with
tf.compat.v1.enable_eager_execution()
but it results in the error: 'get_session' is not available when TensorFlow is executing eagerly.
Is it possible to run it eagerly in some way, or has anyone figured out the reason for the sudden NaN loss?