Jutho / KrylovKit.jl

Krylov methods for linear problems, eigenvalues, singular values and matrix functions
Other
284 stars 37 forks source link

Ad extension [WIP] #85

Closed Jutho closed 4 months ago

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 91.84290% with 54 lines in your changes missing coverage. Please review.

Project coverage is 84.64%. Comparing base (da91706) to head (f32f4f0). Report is 1 commits behind head on master.

:exclamation: Current head f32f4f0 differs from pull request most recent head b60adc5

Please upload reports for the commit b60adc5 to get more accurate results.

Files Patch % Lines
ext/KrylovKitChainRulesCoreExt/eigsolve.jl 92.16% 21 Missing :warning:
ext/KrylovKitChainRulesCoreExt/svdsolve.jl 92.62% 16 Missing :warning:
ext/KrylovKitChainRulesCoreExt/linsolve.jl 89.79% 5 Missing :warning:
src/eigsolve/arnoldi.jl 82.75% 5 Missing :warning:
src/linsolve/linsolve.jl 58.33% 5 Missing :warning:
src/eigsolve/eigsolve.jl 85.71% 1 Missing :warning:
src/factorizations/lanczos.jl 91.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #85 +/- ## ========================================== + Coverage 82.05% 84.64% +2.59% ========================================== Files 27 31 +4 Lines 2753 3322 +569 ========================================== + Hits 2259 2812 +553 - Misses 494 510 +16 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jutho commented 4 months ago

I think this is now mostly ready, up to some cleanup and streamlining of the interface. Maybe @lkdvos wants to review?

Jutho commented 4 months ago

There is one more significant TODO:

The eigenvalue approach for solving the linear problem/Sylvester problem in both the rrule of eigsolve and svdsolve is nonhermitian, which means that the results are always obtained in complex arithmetic, even when the forward calculation can be completely real (namely for a real symmetric eigenvalue problem or a real singular value problem). This so far does not cause problems, as apparently the imaginary parts of the computed quantities is exactly zero and therefore it is implicitly converted back to real vectors. However, this might break with custom types, so it would be better to explicitly restrict to real arithmetic by using schursolve and a custom routine to extract the real eigenvectors associated with the results coming out of schursolve.

lkdvos commented 4 months ago

There is one more significant TODO:

The eigenvalue approach for solving the linear problem/Sylvester problem in both the rrule of eigsolve and svdsolve is nonhermitian, which means that the results are always obtained in complex arithmetic, even when the forward calculation can be completely real (namely for a real symmetric eigenvalue problem or a real singular value problem). This so far does not cause problems, as apparently the imaginary parts of the computed quantities is exactly zero and therefore it is implicitly converted back to real vectors. However, this might break with custom types, so it would be better to explicitly restrict to real arithmetic by using schursolve and a custom routine to extract the real eigenvectors associated with the results coming out of schursolve.

I haven't looked into this in detail, but this sounds like it could also be solved with appropriate calls to ProjectTo, which should work even for custom types as there it gives a hook to add the correct projection methods.

Jutho commented 4 months ago

Ok, I think this is mostly ready. Maybe I can add a few more tests to improve coverage (e.g. print warnings and test for them). The TODO for not having to go via complex values in case of Arnoldi rrule for svdsolve or hermitian eigsolve is also still open.

About the interface; I did not really study the comment of @lkdvos above. What do you dislike about the keyword argument ; alg_rrule = ... in the methods? Whether the actual values to this keyword need to have the values they currently have (recycling existing structures) or some new values is certainly open for debate. And a sensible default definitely needs to be in place after benchmarking. This also reminds me, documentation about this needs to be added.

lkdvos commented 4 months ago

I think my main argument against the alg_rrule kwarg is that it "pollutes" the method definition with information that is only relevant to AD. In other words, if I were to for example write an implementation of minres, I now need to remember to add that keyword argument, even though this has nothing to do with the primal computation.

I would much rather have a different implementation that keeps the primal computations clean. One such way is to simply add a wrapper function with the alg_rrule kwarg added, which can then correctly distribute the args and kwargs to their relevant places.

I am a bit more in favour of something like:

vals, vecs, info = eigsolve(A, x, num, which, alg) # can infer default AD algorithm

# option 1:
vals, vecs, info = hook_pullback(eigsolve, A, x, num, which, alg; alg_rrule) # expert mode -- specifies rrule algorithm

# option 2:
@alg_rrule vals, vecs, info = eigsolve(A, x, num, which, alg; alg_rrule) # looks like current implementation, but expands to option 1

This being said, in principle this is mostly a conceptual issue, and this does not really change all that much. I like keeping AD separated, but this is obviously strictly necessary.


Concerning the test coverage, it might be a good idea to add a sparse matrix to the set of tests. This seems like a good candidate for something that checks if our assumptions about what we can/cannot do with AbstractMatrix types are fair

Jutho commented 4 months ago

Ok I see; I agree that it ads a keyword that is irrelevant to the forward computation. As that is mostly a "burden" on the developer side, I don't mind too much. If there would be some centralised infrastructure to do the hook or macro solution, I would use it, but for now, I think there is less overhead in simply adding those keywords rather than developing it within the scope of this package.

From the user side, I think the current approach is fine right? If they don't need AD, they don't need to care about this keyword, and if they do, it is easy to try out the different choices without any significant change to the code.

lkdvos commented 4 months ago

yes, I think so

Jutho commented 4 months ago

Ok, I think this is more or less ready. I started modifying the tests by using @test_logs to test verbosity, but only in certain parts of the tests so far (because it conflicts with @constinferred and because I don't know how to test for a variable number of @info outputs). Making the tests more uniform can happen later.

The other remaining questions are interface, i.e. the keyword argument rrule_alg, and also the fact that the warning for the gauge-dependence of the adjoint is only printed for verbosity >= 1 (which then also implies that other @info statements are printed). Hence, there is currently no way to have the warning (which I think is important and relevant) without having additional @info output. However, if the warning is on by default (i.e. also for verbosity==0), then the tests would produce plenty of those, as computing the full Jacobian with respect to all variables in the eigenvectors is not a gauge-invariant operation.

Jutho commented 4 months ago

Thanks for the logging solution; that is indeed useful. However, another reason why I decided to hide the warning behind the verbosity >= 1 flag is that I also want to make it easy to switch it off for users that know what they are doing.

Another solution could thus be to reduce it to verbosity >= 0, so that the gauge dependence warning is on by default, but can be switched off by setting verbosity=-1. At some point we should also switch to using the levels of the Logging system, but I do not want to invest the time in this right now.

lkdvos commented 4 months ago

:partying_face: