adavoudi / spdnet

Implementation of Deep SPDNet in pytorch
MIT License

UnTangentSpace behavior different from scipy.linalg.expm #5

Closed alekseic closed 5 years ago

alekseic commented 5 years ago

I trained a neural network that produces a 50x50 SPD matrix X, which I compared against another SPD matrix Y (the ground truth) by mapping both into the tangent space and computing a standard MSELoss between them. After a number of successful iterations during which the loss decreased, I decided to transform X back to "normal space" using untangent_space = SPDUnTangentSpace(unvectorize=False). I found that the result of untangent_space(X) is very different from the true Y. To double-check, I then performed the back conversion with scipy.linalg.expm(X) instead of untangent_space, and that result was very close to the true Y, as expected.

I looked at spd.py, and the forward function of SPDUnTangentSpace seems ok... nothing suspicious. Another interesting thing: when I applied scipy.linalg.logm to the result of untangent_space(X), I obtained a version of X with all the signs flipped, as if scipy.linalg.logm(untangent_space(X)) = -X. Very puzzling, as if it landed on a different branch of the logarithm or something.

I cannot share the exact matrices for experimenting right now, but I'll try to come up with a simpler example (say, a 5x5 matrix). Maybe it has something to do with the numerical stability of the SVD, or with tangent_space in utils.py? I'll look into it, but in the meantime I will refrain from using SPDUnTangentSpace.

UPD: Basically, X has nothing to do with the problem. It boils down to two observations:

untangent_space(tangent_space(Y)) != Y
scipy.linalg.logm(untangent_space(tangent_space(Y))) = -scipy.linalg.logm(Y)
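For concreteness, here is a rough sketch of the check I mean. The import path and the batched-double-tensor calling convention are my assumptions about how the module is invoked, and I substitute scipy.linalg.logm for the tangent-space step (that step itself looks fine, see below):

import numpy as np
import scipy.linalg
import torch
from spdnet.spd import SPDUnTangentSpace  # assumed import path

# A small SPD matrix standing in for the ground truth Y.
rng = np.random.RandomState(0)
A = rng.randn(5, 5)
Y = A @ A.T + 5 * np.eye(5)

# Tangent-space image of Y (stands in for tangent_space(Y)).
T = scipy.linalg.logm(Y)

# Back to the manifold with the module; assuming it takes a batch of matrices.
untangent_space = SPDUnTangentSpace(unvectorize=False)
Y_back = untangent_space(torch.from_numpy(T).unsqueeze(0)).squeeze(0).numpy()

# Observation 1: the round trip should reproduce Y.
print(np.abs(Y_back - Y).max())

# Observation 2: the log of the reconstruction should be T, not -T.
L = scipy.linalg.logm(Y_back)
print(np.abs(L - T).max(), np.abs(L + T).max())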

By the way, you are using the SVD of the matrix and apply the log/exp to its singular values, while the original paper by Huang and Van Gool used the eigendecomposition. Are the two equivalent?

alekseic commented 5 years ago

Ok. Here's what I found. Suppose we have an SPD matrix called img. Now, follow me.

Method 1: take the matrix logarithm through EIG (not SVD)
w, h = np.linalg.eig(img)
h @ np.diag(np.log(w)) @ h.transpose()

Method 2: take the matrix logarithm through SVD
u2, s2, v2 = np.linalg.svd(img)
u2 @ np.diag(np.log(s2)) @ u2.transpose()

These results are equivalent, and they are both equal (up to some rounding) to the result of scipy.linalg.logm(img). So nothing wrong with SPDTangentSpace here.
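A quick self-contained check of this (the matrix is just an arbitrary SPD example):

import numpy as np
import scipy.linalg

# Arbitrary SPD test matrix.
rng = np.random.RandomState(1)
A = rng.randn(4, 4)
img = A @ A.T + 4 * np.eye(4)

# Method 1: log via eigendecomposition.
w, h = np.linalg.eig(img)
log_eig = h @ np.diag(np.log(w)) @ h.transpose()

# Method 2: log via SVD (valid here because img is SPD,
# so its singular values coincide with its eigenvalues).
u2, s2, v2 = np.linalg.svd(img)
log_svd = u2 @ np.diag(np.log(s2)) @ u2.transpose()

# Both agree with scipy's reference implementation up to rounding.
ref = scipy.linalg.logm(img)
print(np.abs(log_eig - ref).max(), np.abs(log_svd - ref).max())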

Now suppose we took the logarithm with either method above and stored the result in logimg. Let's try to go back to img by taking the matrix exponential, again in two ways.

Method 1: take the matrix exponential through EIG
w, h = np.linalg.eig(logimg)
h @ np.diag(np.exp(w)) @ h.transpose()
Note that I replaced the log operator with the exp operator.

Method 2: take the matrix exponential through SVD
u2, s2, v2 = np.linalg.svd(logimg)
u2 @ np.diag(np.exp(s2)) @ u2.transpose()

And here we find that the results differ! While the first way (with EIG) is equal to scipy.linalg.expm(logimg), the second one, which relies on SVD, gives a different (and incorrect) result!

This may be related to the fact that, for a symmetric matrix A (A = A'), the singular values of A equal the absolute values of its eigenvalues, and exp(lambda) is not equal to exp(abs(lambda)) when lambda is negative. Unlike img, logimg is not positive definite, so it generally does have negative eigenvalues, and the SVD-based reconstruction flips their sign.
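A small self-contained illustration of this point; the SPD matrix is built with eigenvalues on both sides of 1, so its log is symmetric but indefinite:

import numpy as np
import scipy.linalg

# SPD matrix with eigenvalues both below and above 1.
rng = np.random.RandomState(2)
Q, _ = np.linalg.qr(rng.randn(4, 4))
img = Q @ np.diag([0.2, 0.5, 2.0, 5.0]) @ Q.T

# Its matrix log is symmetric but has negative eigenvalues.
logimg = scipy.linalg.logm(img)
logimg = (logimg + logimg.T) / 2  # symmetrize away rounding noise

# For a symmetric matrix, singular values = |eigenvalues|.
w, h = np.linalg.eig(logimg)
u2, s2, v2 = np.linalg.svd(logimg)
print(np.sort(np.abs(w)), np.sort(s2))

# Exponential via EIG matches scipy.linalg.expm ...
exp_eig = h @ np.diag(np.exp(w)) @ h.transpose()
print(np.abs(exp_eig - scipy.linalg.expm(logimg)).max())

# ... while the SVD route exponentiates |eigenvalues| and is wrong
# as soon as logimg has a negative eigenvalue.
exp_svd = u2 @ np.diag(np.exp(s2)) @ u2.transpose()
print(np.abs(exp_svd - scipy.linalg.expm(logimg)).max())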

So I suspect that if we want to continue using SVD in the code of SPDUnTangentSpace, some things should be changed...
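One possible way around it (just a suggestion, not the repo's current code): since the tangent-space matrix is symmetric, use a symmetric eigendecomposition instead of the SVD when exponentiating. A minimal numpy sketch:

import numpy as np

def symmetric_expm(logimg):
    # Matrix exponential of a symmetric matrix via its eigendecomposition.
    # Unlike the SVD route, this preserves the sign of negative eigenvalues.
    w, v = np.linalg.eigh(logimg)
    return (v * np.exp(w)) @ v.T

In torch the same thing can be written with torch.symeig (torch.linalg.eigh in newer versions), which also supports backpropagation.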

adavoudi commented 5 years ago

I just tested untangent_space(tangent_space(Y)). The output is equal to Y, of course within a tolerance of 1e-4, which is negligible. To see that, please run python tests/test_modules.py.

alekseic commented 5 years ago

Thank you. The source of my confusion was that some papers talk about the EIG decomposition while in practice the code uses the SVD, which is not exactly the same thing. Even Huang and Van Gool write about EIG, while their Matlab code on GitHub relies on SVD :) There are some minor differences in how the forward and backward passes are computed; they are described in the paper by Ionescu et al. on matrix backpropagation.