aesara-devs / aesara

Aesara is a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.
https://aesara.readthedocs.io

Implement a gradient for `Eig` #1096

Open · brandonwillard opened 2 years ago

brandonwillard commented 2 years ago

We still don't have a gradient implementation for `Eig` (and a few other `aesara.linalg` `Op`s; see, e.g., https://github.com/aesara-devs/aesara/issues/836). Here's an example implementation that could work.
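For concreteness, here is a minimal NumPy sketch of the standard reverse-mode (VJP) formula for a general eigendecomposition $A = V \operatorname{diag}(w) V^{-1}$ (see Giles 2008, "An extended collection of matrix derivative results"; PyTorch's `linalg_eig_backward` implements the same algebra with extra gauge handling): $\bar{A} = V^{-H}\big(\operatorname{diag}(\bar{w}) + \overline{F} \odot (V^{H}\bar{V})\big)V^{H}$, where $F_{ij} = 1/(\lambda_j - \lambda_i)$ for $i \neq j$ and $F_{ii} = 0$. The function name `eig_vjp` is illustrative, not Aesara API; an actual `Eig.L_op` would wrap the same algebra in Aesara `Op`s:

```python
import numpy as np


def eig_vjp(A, w_bar, v_bar):
    """Reverse-mode gradient (VJP) for ``w, v = np.linalg.eig(A)``.

    Computes A_bar = V^{-H} (diag(w_bar) + conj(F) * (V^H @ V_bar)) V^H
    with F_ij = 1 / (w_j - w_i) for i != j and F_ii = 0.
    """
    # In an actual Op the forward outputs would be reused here.
    w, v = np.linalg.eig(A)

    # F_ij = 1 / (w_j - w_i); undefined (and the eigenvector gradient
    # with it) when eigenvalues are repeated or nearly so.
    diff = w[np.newaxis, :] - w[:, np.newaxis]
    np.fill_diagonal(diff, 1.0)  # placeholder to avoid division by zero
    F = 1.0 / diff
    np.fill_diagonal(F, 0.0)  # zero diagonal fixes the eigenvector scale gauge

    vh = v.conj().T
    inner = np.diag(w_bar) + np.conj(F) * (vh @ v_bar)
    # V^{-H} @ inner @ V^H, via a solve instead of an explicit inverse
    return np.linalg.solve(vh, inner @ vh)
```

In exact arithmetic this matches finite differences as long as the eigenvalues are simple and the downstream loss is invariant to the scale (and phase) of each eigenvector; for real `A` and a real-valued loss, the imaginary part of the result cancels, so a caller would typically take the real part.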

andrejmuhic commented 1 year ago

I just wanted to add that the Schur form is numerically computed with the QR step algorithm (I prefer this name over "QR algorithm"), which is related to power iteration in its subspace flavor. This is done efficiently with shifts and implicit bulge chasing, which is usually out of scope for an implementation from scratch; at the least it does not make sense to write one when the LAPACK version is available, and it is hard to do better than that. An additional complication arises if one wants the real Schur form: an upper quasi-triangular matrix with 1-by-1 and 2-by-2 blocks, where the 2-by-2 blocks correspond to complex-conjugate eigenvalue pairs (a short illustration follows below).

A C++ implementation that computes these gradients directly from the decompositions and their properties, in a projected-gradient-like way, is available for example in: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/FunctionsManual.cpp

The other option would be to backprop through the steps of the algorithm directly, but that requires a numerically stable reimplementation with autodiff support, plus ensuring that the resulting gradient is itself numerically stable, which sounds hard. I think the Schur decomposition is not in that file, but following the documented code and the resources cited in its comments, it should be possible to derive a formula for the eigenvalue gradient if one really wanted the more stable version.
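For readers who have not met it, the real Schur form is easy to inspect with SciPy (`scipy.linalg.schur` with `output="real"` is an existing API; the matrix below is an arbitrary example with one real eigenvalue and one complex-conjugate pair):

```python
import numpy as np
from scipy.linalg import schur

# One complex-conjugate eigenvalue pair (+/- 1j) and one real eigenvalue (2).
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])

# Real Schur form: A = Z @ T @ Z.T with Z orthogonal and T upper
# quasi-triangular -- 1x1 blocks for real eigenvalues, 2x2 blocks for
# complex-conjugate pairs.
T, Z = schur(A, output="real")
print(T)                            # contains a 2x2 block for the +/- 1j pair
print(np.allclose(Z @ T @ Z.T, A))  # True: the factors reconstruct A
```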

andrejmuhic commented 1 year ago

Another issue is that you are at the mercy of the algorithm, which will usually sort the eigenvalues. The eigenvalue gradient (sensitivity) does not depend directly on the distance to the closest eigenvalue, as the gradient (sensitivity) of the eigenvectors does (when it exists); but there is still the problem that when two eigencurves intersect, the sorted output loses track of which eigenvalue is which. Take $\lambda_1(t) = t$ and $\lambda_2(t) = \exp(t^2) - 1$: after crossing $t = 0$, the sorted order suddenly becomes $\lambda_{1'} = \exp(t^2) - 1$ and $\lambda_{2'} = t$. This may or may not be a problem, but I guess the user needs to be aware of it.
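To make the crossing concrete, here is a small NumPy demonstration (the diagonal matrix is just an illustration; `np.linalg.eigvalsh` documents that it returns eigenvalues in ascending order):

```python
import numpy as np

# Two eigencurves: lambda_1(t) = t and lambda_2(t) = exp(t^2) - 1.
# They cross at t = 0, where the sorted output silently relabels them.
def sorted_eigvals(t):
    A = np.diag([t, np.exp(t**2) - 1.0])
    return np.linalg.eigvalsh(A)  # eigenvalues in ascending order

for t in (-0.1, -0.01, 0.01, 0.1):
    print(f"t = {t:+.2f}: {sorted_eigvals(t)}")

# For t < 0:       t < exp(t^2) - 1, so index 0 follows lambda_1(t) = t.
# For small t > 0: exp(t^2) - 1 ~= t^2 < t, so index 0 now follows
# lambda_2(t): the index-to-curve assignment swapped at the crossing.
```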