SciML / SciMLOperators.jl

SciMLOperators.jl: Matrix-Free Operators for the SciML Scientific Machine Learning Common Interface in Julia
https://docs.sciml.ai/SciMLOperators/stable
MIT License

need lazy adjoint fallback for `AbstractDiffEqLinearOperator` #12

Closed · vpuri3 closed this 2 years ago

vpuri3 commented 2 years ago
```julia
Base.adjoint(A::AbstractDiffEqLinearOperator) = Adjoint(A)
```

and write an interface for `Adjoint(::AbstractDiffEqLinearOperator)`.

danielwe commented 2 years ago

Highly recommend taking a look at LinearMaps.jl's handling of adjoints and transposes: https://julialinearalgebra.github.io/LinearMaps.jl/stable/generated/custom/#Adjoints-and-transposes. (It's a very well-designed library, just lacking the updating facilities needed in SciML.)

The bottom line is that `Adjoint <: AbstractMatrix`, which I don't think is what you want. Instead, you probably need your own `SciMLAdjointOperator <: AbstractSciMLLinearOperator` type.
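A rough sketch of what such a lazy wrapper could look like; the names (`SciMLAdjointOperator`, and the stand-in supertype) are hypothetical, not the package's actual API:

```julia
# Hypothetical sketch, not the real API.
abstract type AbstractSciMLLinearOperator{T} end  # stand-in for the package's supertype

struct SciMLAdjointOperator{T,P<:AbstractSciMLLinearOperator{T}} <: AbstractSciMLLinearOperator{T}
    parent::P
end

# Lazy fallback: adjoint wraps, and adjoint of an adjoint unwraps.
Base.adjoint(L::AbstractSciMLLinearOperator) = SciMLAdjointOperator(L)
Base.adjoint(L::SciMLAdjointOperator) = L.parent

Base.size(L::SciMLAdjointOperator) = reverse(size(L.parent))
```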

And don't forget transposes :upside_down_face:

ChrisRackauckas commented 2 years ago

> Highly recommend taking a look at LinearMaps.jl's handling of adjoints and transposes: https://julialinearalgebra.github.io/LinearMaps.jl/stable/generated/custom/#Adjoints-and-transposes. (It's a very well-designed library, just lacking the updating facilities needed in SciML.)

Agreed. If it didn't do its interface via subtyping I would say we should just extend it, but its design makes it hard to extend while also making things have a subtype we can dispatch on. Specifically, we'd have to do a lot of type piracy to do things like add calls to their types 😅. So there's probably a lot that we want to copy from there, including the adjoint and transpose ideas.

vpuri3 commented 2 years ago

@danielwe is there any other functionality you'd like us to copy from LinearMaps?

danielwe commented 2 years ago

Thanks so much for asking! As far as my use cases are concerned, the important points are:

I've just skimmed the code briefly, but it looks like it's all there :tada:

You also have a lazy tensor product, which is awesome! I'm not using that right now, but I would have given a lot to have it a couple of years ago, and it may come back around in the future. (Tensor sums, block operators, and block diagonal operators would also be nice down the road I suppose, but they're not urgent for me at least.)

A nice-to-have would be a variant of the in-place `FunctionOperator` that can take an accumulating "5-arg" function, i.e., a function with signature `f!(y, x, α, β)` representing `y = α * L * x + β * y`. This is not something LinearMaps has, but the more I think about it, the more this seems like the right way to specify an in-place matrix-free linear operator.
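For comparison, this is exactly the contract of LinearAlgebra's 5-arg `mul!`; a matrix-free `f!` with that signature computes the same thing without materializing `L`. A small sketch with a diagonal operator:

```julia
using LinearAlgebra

L = Diagonal([2.0, 3.0])
x = [1.0, 1.0]
α, β = 2.0, 0.5

# 5-arg mul! computes y = α * L * x + β * y in place:
y = [10.0, 10.0]
mul!(y, L, x, α, β)                 # y == [9.0, 11.0]

# Matrix-free equivalent with the suggested signature f!(y, x, α, β):
d = [2.0, 3.0]                      # the diagonal of L, never stored as a matrix
f!(y, x, α, β) = (y .= α .* d .* x .+ β .* y)

y2 = [10.0, 10.0]
f!(y2, x, α, β)                     # y2 == [9.0, 11.0], same result
```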

Huge disclaimer: I don't think I'm a very typical user, so don't count on my use cases covering the most important functionality.


Some final thoughts/questions, having skimmed the code:

That was probably more code review than you bargained for, but the main point is that I'm hugely appreciative of where this is going. The timing is also impeccable: I was in the middle of making my own pirated hodgepodge of operator packages to get the functionality I want, but this library will basically solve my problem.

vpuri3 commented 2 years ago

Thanks @danielwe. I tried to fuse operations via `*` wherever possible. This would lead to faster evaluations (`(A * B) * u` as compared to `A * (B * u)`, where `A * B` has already been evaluated). The user always has the option to compose rather than fuse operators.
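For square operators of equal size, the intended saving looks like this (a plain-matrix sketch; the comments below qualify when fusing actually pays off):

```julia
N = 100
A, B = rand(N, N), rand(N, N)
u = rand(N)

AB = A * B      # fuse once up front: an O(N^3) matrix-matrix product

A * (B * u)     # composed: two matvecs, ~2N^2 flops per application
AB * u          # fused: one matvec, ~N^2 flops per application
```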

It is true that we don't have consistent treatment of `update_func` when it comes to operator fusion. @ChrisRackauckas I'd need your input on that - could you take a look? Just search for TODO around the `ScalarOperator` and `MatrixOperator` definitions.

For the 5-arg `mul!` for `FunctionOperator` it should be a piece of cake. The signature should be `op(du, u, p, t, α, β)` to stay consistent with the 3-arg `mul!` form, `op(du, u, p, t)`.
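A sketch of the two in-place signatures side by side (hypothetical user-supplied functions, not the final `FunctionOperator` API):

```julia
d = [2.0, 3.0]  # stands in for a matrix-free diagonal operator L(p, t)

# Existing in-place form, analogous to 3-arg mul!: du = L(p, t) * u
op!(du, u, p, t) = (du .= d .* u)

# Proposed accumulating form, analogous to 5-arg mul!:
# du = α * L(p, t) * u + β * du
op!(du, u, p, t, α, β) = (du .= α .* d .* u .+ β .* du)
```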

`InvertibleOperator` is an operator which has the method `\` or `ldiv!` or both. I don't mind the name changes you recommended.

The tensor product is very useful to me. I'd also like 3D and multidimensional tensor products, ⊗(A, B, C) = ⊗(A, ⊗(B, C)), which is not possible right now. The reason is that the nested `TensorProductOperator` would have to act on a 2D array, and SciMLOperators only act on `AbstractVector`s. <-- this was a conscious decision on my part to only allow left-multiplication/division of SciMLOperators on `AbstractVector`s. Maybe we can get away with some reshape trick?
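The reshape trick would presumably be the standard Kronecker identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets a nested tensor product act on a flat vector by temporarily viewing it as a 2D array:

```julia
using LinearAlgebra

A, B = rand(3, 3), rand(4, 4)
X = rand(4, 3)
u = vec(X)                  # the AbstractVector the operator actually sees

# Dense Kronecker product applied to the flattened vector:
y1 = kron(A, B) * u

# Same result via reshape, without ever forming the 12 x 12 matrix:
y2 = vec(B * reshape(u, 4, 3) * transpose(A))

y1 ≈ y2                     # true
```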

vpuri3 commented 2 years ago

Block matrices and Schur complement type stuff would be interesting to me too, but it's not a necessity for me right now. Let me know when that becomes imperative and we can finish it in a day or two.

danielwe commented 2 years ago

> This would lead to faster evaluations (`(A * B) * u` as compared to `A * (B * u)`, where `A * B` has already been evaluated).

That's not generally true. Extreme example: if A and B are N x 1 and 1 x N, `A * (B * u)` is 2N multiplications, while `(A * B) * u` is N^2, even after precomputing `A * B` (which itself is N^2).
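Concretely, with plain arrays:

```julia
N = 1_000
A = rand(N, 1)     # N x 1
B = rand(1, N)     # 1 x N
u = rand(N)

A * (B * u)        # B * u is a length-1 vector: ~2N multiplications total
(A * B) * u        # A * B is N x N: ~N^2 to form and ~N^2 per application
```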

More generally I'd find it very surprising if A * B returns an AbstractSciMLOperator but doesn't respect the update functions of A and B. I think it would be a friendlier interface if one could trust that operations that map AbstractSciMLOperators to AbstractSciMLOperators are faithful to the interface, while eager methods return a plain array so that it's clear that they represent a frozen snapshot.

That way, it's totally fine if you decide that, for example, `*` should be eager and `∘` should be lazy: just return a plain array in the former case.


> `InvertibleOperator` is an operator which has the method `\` or `ldiv!` or both

I think this statement clarifies the issue: InvertibleOperator sounds more like a trait than a type. Then again this is a type that just wraps anything that has the trait, so maybe that's the right name for it after all.

vpuri3 commented 2 years ago

I was thinking more of square matrices, but yes, it does depend on your operator. The user should be able to choose between eager and lazy, and I don't know how to preserve update behaviour with eager computation. I don't think we can.

What we can do is return an AbstractSciMLOperator with a no-op update function upon eager evaluation. @ChrisRackauckas ^ are we good with eager => snapshot, lazy => maintain `update_func`?

danielwe commented 2 years ago

> I don't know how to preserve update behaviour with eager computation. I don't think we can

Nope, it's not possible. For simplicity just consider polynomials f(A) and g(B). In general, f(A)g(B) is a multivariate polynomial with monomials A^k B^l of all combinations of orders k and l, not just "diagonal" monomials (AB)^k, so you can't write f(A)g(B) as a function of AB.
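A minimal concrete case: take f(A) = A^2 and g(B) = B^2; then f(A)g(B) = A^2 B^2, which differs from (AB)^2 whenever A and B don't commute:

```julia
A = [1 1; 0 1]
B = [1 0; 1 1]

A^2 * B^2                # f(A) * g(B)      == [5 2; 2 1]
(A * B)^2                # a function of AB == [5 3; 3 2]
A^2 * B^2 == (A * B)^2   # false
```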

vpuri3 commented 2 years ago

OK, I see. So I'll change `Base.:*` to default to lazy. But we should be able to give the user the ability to fuse operators when they know their computation may benefit and when update behavior is not necessary, e.g., when one passes the operator to a Krylov solver.

It would be interesting to be able to do that in place.

ChrisRackauckas commented 2 years ago

> I think this statement clarifies the issue: InvertibleOperator sounds more like a trait than a type. Then again this is a type that just wraps anything that has the trait, so maybe that's the right name for it after all.

Agreed, though that's basically the `has_ldiv!` trait.

> What we can do is return an AbstractSciMLOperator with a no-op update function upon eager evaluation. @ChrisRackauckas ^ are we good with eager => snapshot, lazy => maintain `update_func`?

No, that's the kind of behavior that leads to silent bugs.

> More generally I'd find it very surprising if A * B returns an AbstractSciMLOperator but doesn't respect the update functions of A and B. I think it would be a friendlier interface if one could trust that operations that map AbstractSciMLOperators to AbstractSciMLOperators are faithful to the interface, while eager methods return a plain array so that it's clear that they represent a frozen snapshot.

Exactly. If someone says that their operator is A(t), then it better always update like A(t) or it's just computing the wrong answer. If A * B takes two operators and converts them into matrices which don't update, that would just be incorrect because then it would not have the actions that the user defined. I think it has to be lazy for that reason.

danielwe commented 2 years ago

> e.g., when one passes the operator to a Krylov solver

Oh, I see where you're coming from now, you're thinking about the optimal way to "freeze" the operator at a particular timestep before passing it to an iterative solver. To me, that doesn't sound like something that's best accomplished bottom-up through Base.:* methods, but rather something that calls for a top-down freeze function that returns a single-use fused and frozen operator and leaves its argument unchanged.
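A minimal sketch of that top-down idea, assuming a hypothetical `freeze` (not an actual SciMLOperators function) that materializes the operator's current state via `convert(AbstractMatrix, ...)`:

```julia
using SciMLOperators  # assumes MatrixOperator and convert(AbstractMatrix, L), as in the package

# Hypothetical: materialize the operator's current coefficients into a
# plain matrix (fusing any compositions along the way) and wrap it as a
# constant, update-free operator. The original operator L is untouched.
freeze(L) = MatrixOperator(convert(AbstractMatrix, L))

# Usage idea: take a single-use snapshot before an iterative solve.
# Lfrozen = freeze(L)
# x = some_krylov_solver(Lfrozen, b)
```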

However, it would be sad if implicit ODE solvers automatically made use of an overly eager freeze that would fuse structures for which the disaggregated matvecmul is orders of magnitude faster, as in the example above. Though, actually, it might not be so hard to come up with good heuristics for which nodes to fuse and which to leave unchanged; the computational complexity of a matvecmul is a straightforward function of the shapes of the arrays involved. Still, for my current use cases, I'd be happier if I could simply know that the operator passed through to the iterative solver is identical to the one I instantiate. (I put a lot of thought into that structure in the first place.)

One last thought: if you decide that some binary operations should absolutely fuse, I suppose the consistent choice would be to only accept constant operators and have them throw an error when given operators with nontrivial update functions (with a helpful error message pointing to the equivalent lazy operation). Another alternative, which I don't think is a good idea, is to let a nominally eager operation act lazily as a fallback; that would make it too hard to reason about expressions involving operators.

ChrisRackauckas commented 2 years ago

An operator is frozen as long as the solver does not call update_coefficients, so it's easy to freeze without doing any of this. Again, it's just unsafe behavior; there should be no way for an operator to silently change into something whose update_coefficients is no longer correct. That's just wrong behavior.
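In other words, freezing is just a matter of not calling `update_coefficients!` again. A rough sketch (the `update_func` keyword and its exact signature are assumed from this thread's discussion and may differ in the final API):

```julia
using SciMLOperators

A0 = rand(2, 2)
# Assumed convention: update_func rebuilds the matrix from (u, p, t).
L = MatrixOperator(copy(A0); update_func = (A, u, p, t) -> t * A0)

u, p, t = rand(2), nothing, 2.0
update_coefficients!(L, u, p, t)   # L now represents 2.0 * A0

L * u   # every application uses the t = 2.0 snapshot...
L * u   # ...until someone calls update_coefficients! again
```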

> I'd be happier if I could simply know that the operator passed through to the iterative solver is identical to the one I instantiate. (I put a lot of thought into that structure in the first place.)

That should always be the case.

danielwe commented 2 years ago

> An operator is frozen as long as the solver does not call update_coefficients, so it's easy to freeze without doing any of this.

Just to clarify: the idea was that `Afrozen = freeze(A)` would perform any fusions/transformations that are deemed beneficial for performance but break updating. The point of calling the function would be to perform these transformations, not the "freezing" itself, which, as you say, is trivial. Nothing is silently changed: A is not touched; it's `freeze`, not `freeze!`.

Maybe instead of trying to devise intelligent heuristics for this, each composite operator could carry a flag `fuse_on_freeze` defaulting to `false`. To set it to `true` you'd have to use the constructor, e.g., replace `A * B` with `ComposedOperator((A, B); fuse_on_freeze=true)`. The behavior of `freeze` would be fully determined by these flags.
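A rough sketch of that flag-driven design (all names and fields are hypothetical; `ComposedOperator` here is a stand-in, not the package's actual type):

```julia
# Hypothetical composite carrying an opt-in fusion flag.
struct ComposedOperator{O<:Tuple}
    ops::O
    fuse_on_freeze::Bool
end

ComposedOperator(ops::Tuple; fuse_on_freeze::Bool = false) =
    ComposedOperator(ops, fuse_on_freeze)

# freeze fuses the chain into a single matrix only where the user opted
# in; otherwise the composition structure is preserved as-is.
function freeze(C::ComposedOperator)
    C.fuse_on_freeze || return C
    reduce(*, map(Matrix, C.ops))
end
```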

This isn't important to me and I won't keep beating this drum, I just thought @vpuri3 had a valid concern about passing less-than-optimal operators to iterative solvers and that there may be ways to mitigate it without causing trouble.

ChrisRackauckas commented 2 years ago

If someone split their operator, I think we should assume it was on purpose. Maybe down the line we could do matrix-chain-multiplication optimizations on MatrixOperators when they are constant, but that's a super special case. I wouldn't worry about it for now. Open an issue, call it low priority, and I think we should go safety first and wait for someone to actually have that case before optimizing it (which will likely be never).