nan gradient for ComplexDense with angle-based loss

NEGU93 / cvnn

Library to help implement a complex-valued neural network (cvnn) using tensorflow as back-end

https://complex-valued-neural-networks.readthedocs.io/

MIT License


nan gradient for ComplexDense with angle-based loss #12

Open jonasdaugalas opened 3 years ago

jonasdaugalas commented 3 years ago

@NEGU93 Thanks for this great piece of work!

I am having an issue with angle-based loss functions. See here a small example: https://colab.research.google.com/drive/10y2eBxHMq5HCbHqOsrfKzvrJ7AV_RegZ?usp=sharing

1. Instantiate a model with a single complex dense layer with one unit and no activation.
2. Make an input tensor with a single "zero" sample.
3. Define a reconstruction loss comparing the angles of the input and the output.
4. Pass the input through the model: the loss is 0, but the gradients are NaN.

The gradients look OK (not nan) when the input is not zero. Maybe I am doing something wrong?

Thanks.
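The steps above can be sketched in plain TensorFlow, without cvnn. This is only a minimal stand-in under stated assumptions: a single trainable complex weight `w` (no bias, no activation) plays the role of the one-unit `ComplexDense` layer. The weight value and the names `angle_loss` and `loss_and_grad` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a one-unit ComplexDense layer without bias or activation:
# a single trainable complex weight.
w = tf.Variable(tf.complex(0.8, 0.3))

def angle_loss(x, y):
    # Reconstruction loss comparing the phases of the input and the output.
    return tf.reduce_mean(tf.square(tf.math.angle(x) - tf.math.angle(y)))

def loss_and_grad(x):
    with tf.GradientTape() as tape:
        y = w * x  # the "layer" output
        loss = angle_loss(x, y)
    return loss.numpy(), tape.gradient(loss, w).numpy()

zero_input = tf.constant([0.0 + 0.0j], dtype=tf.complex64)
nonzero_input = tf.constant([1.0 + 2.0j], dtype=tf.complex64)

loss_zero, grad_zero = loss_and_grad(zero_input)
loss_nonzero, grad_nonzero = loss_and_grad(nonzero_input)
print("zero input:   ", loss_zero, grad_zero)
print("nonzero input:", loss_nonzero, grad_nonzero)
```

With the zero input, both angles are 0, so the loss is 0, yet the gradient with respect to `w` comes out as NaN. With a nonzero input, the gradient is finite, matching the behaviour described above.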

NEGU93 commented 3 years ago

Dear @jonasdaugalas ,

Thank you for your interest in my library. What you are experiencing is indeed very puzzling.

I tried many things to figure out what was happening. One interesting observation is that when I replace tf.math.angle in your loss with, for example, tf.math.abs, the NaN gradients disappear. This suggests that the problem is actually an incompatibility between TensorFlow's angle function and its gradient. Could it be a TensorFlow bug?

Here are the changes I made.
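A minimal sketch of the angle-versus-abs comparison follows. It again assumes that a single complex weight `w` stands in for the layer, and the helper `grad_with` is hypothetical. The only difference between the two runs is the function applied to the input and output inside the loss.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(tf.complex(0.8, 0.3))               # hypothetical stand-in for the layer weight
x = tf.constant([0.0 + 0.0j], dtype=tf.complex64)   # the problematic all-zero input

def grad_with(feature_fn):
    # feature_fn maps a complex tensor to a real one, e.g. tf.math.angle or tf.math.abs.
    with tf.GradientTape() as tape:
        y = w * x
        loss = tf.reduce_mean(tf.square(feature_fn(x) - feature_fn(y)))
    return tape.gradient(loss, w).numpy()

angle_grad = grad_with(tf.math.angle)
abs_grad = grad_with(tf.math.abs)
print("angle-based loss gradient:", angle_grad)
print("abs-based loss gradient:  ", abs_grad)
```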

NEGU93 commented 3 years ago

I have now reproduced your problem using only TensorFlow here, and I posted it here to see what happens.

NEGU93 commented 3 years ago

I don't fully understand it yet, but here is what I have found:

The problem is only with 0+i0 values and no other from the complex domain.
To quote a response on the TensorFlow forum:

What I can gather is that nan numpy values for gradients are acceptable when those gradients are differentials of zero values, and this shouldn't cause an issue when actual complex numbers are inserted.

Although I do not yet understand why.
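One way to see why only 0+i0 is affected: for z = a + ib, tf.math.angle returns atan2(b, a). Its partial derivatives are -b/(a²+b²) with respect to a and a/(a²+b²) with respect to b. Both denominators vanish only at the origin, where the angle is not differentiable.

A plausible explanation for "0 loss but NaN gradients" (not confirmed in this thread) is that backpropagation multiplies the zero upstream gradient by this infinite or undefined local derivative. In IEEE-754 arithmetic, 0 × ∞ is NaN. The stdlib-only sketch below illustrates both points. `angle_partials` is a hypothetical helper, not part of TensorFlow or cvnn.

```python
import math

def angle_partials(a, b):
    """Partial derivatives of atan2(b, a) w.r.t. a and b (hypothetical helper).

    Raises ZeroDivisionError at the origin, where the angle is not differentiable.
    """
    r2 = a * a + b * b
    return -b / r2, a / r2

print(angle_partials(1.0, 0.0))  # on the positive real axis
print(angle_partials(0.0, 1.0))  # on the positive imaginary axis

try:
    angle_partials(0.0, 0.0)
except ZeroDivisionError:
    print("derivative undefined at 0+0j")

# A zero upstream gradient does not rescue an infinite local derivative:
print(0.0 * math.inf)  # IEEE-754: zero times infinity is NaN
```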

NEGU93 commented 3 years ago

I will leave this issue open, but based on the TensorFlow discussion, I believe it is an issue TensorFlow has with tf.math.angle and its gradient. I encourage you to participate in that discussion or to create a new issue. For my part, I am still trying to understand the problem.

NEGU93 commented 2 years ago

This might help solve this issue.

ooshyun commented 1 year ago

I have the same problem in my model. Thanks for sharing this discussion.

However, what if, whenever we get the complex number 0+0j, we add a tiny value to the imaginary part so that the derivative goes to 0? Following the earlier discussion, if the complex number is a+bj, then the derivative of atan(b/a) goes to 0. This seems to avoid the log/division blowing up to inf. Alternatively, could anyone suggest another way to keep the gradient of tf.math.angle from becoming NaN?
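The workaround proposed above can be sketched as follows, under these assumptions:

- `nudged_angle` is a hypothetical helper, not a TensorFlow or cvnn API.
- `eps = 1e-6` is an arbitrary choice.
- The angle is computed with tf.math.atan2 on the real and imaginary parts, so that only exact zeros are nudged to 0 + i·eps.

```python
import numpy as np
import tensorflow as tf

def nudged_angle(z, eps=1e-6):
    """Angle of z, with exact zeros nudged to 0 + i*eps (hypothetical helper, assumed eps)."""
    re = tf.math.real(z)
    im = tf.math.imag(z)
    at_origin = tf.logical_and(tf.equal(re, 0.0), tf.equal(im, 0.0))
    # Replace the imaginary part only where z is exactly 0+0j.
    im_safe = tf.where(at_origin, tf.ones_like(im) * eps, im)
    # atan2(Im, Re) is the same quantity tf.math.angle computes away from the origin.
    return tf.math.atan2(im_safe, re)

w = tf.Variable(tf.complex(0.8, 0.3))               # hypothetical stand-in for the layer weight
x = tf.constant([0.0 + 0.0j], dtype=tf.complex64)   # all-zero input

with tf.GradientTape() as tape:
    y = w * x
    loss = tf.reduce_mean(tf.square(nudged_angle(x) - nudged_angle(y)))

grad = tape.gradient(loss, w).numpy()
print(loss.numpy(), grad)  # the gradient is finite here instead of NaN
```

There are two caveats, based on the partial derivatives of atan2:

- At 0 + i·eps, the derivative with respect to the imaginary part is indeed 0, but the derivative with respect to the real part is -1/eps. The gradient stays finite here only because the upstream gradient is 0; otherwise it can be very large.
- The nudge changes the forward value at the origin from 0 to π/2. That is harmless in this loss because the input and output are nudged the same way, but it may not be harmless elsewhere.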