Closed · vpuri3 closed this 2 years ago
Highly recommend taking a look at LinearMaps.jl's handling of adjoints and transposes: https://julialinearalgebra.github.io/LinearMaps.jl/stable/generated/custom/#Adjoints-and-transposes. (It's a very well-designed library, just lacking the updating facilities needed in SciML.)
The bottom line is that `Adjoint <: AbstractMatrix`, which I don't think is what you want. Instead, you probably need your own `SciMLAdjointOperator <: AbstractSciMLLinearOperator` type.
And don't forget transposes :upside_down_face:
> Highly recommend taking a look at LinearMaps.jl's handling of adjoints and transposes: https://julialinearalgebra.github.io/LinearMaps.jl/stable/generated/custom/#Adjoints-and-transposes. (It's a very well-designed library, just lacking the updating facilities needed in SciML.)
Agreed, if it didn't do its interface via subtyping I would say we should just extend it, but its design makes it hard to extend while also making things have a subtype we can dispatch on. Specifically, we'd have to do a lot of type piracy to do things like add calls to their types 😅 . So there's probably a lot that we want to copy from there, including the adjoint and transpose ideas.
@danielwe is there any other functionality you'd like us to copy from LinearMaps?
Thanks so much for asking! As far as my use cases are concerned, the important points are:
- `mul!`: just skimming the code briefly, but it looks like it's all there :tada:
You also have a lazy tensor product, which is awesome! I'm not using that right now, but I would have given a lot to have it a couple of years ago, and it may come back around in the future. (Tensor sums, block operators, and block diagonal operators would also be nice down the road I suppose, but they're not urgent for me at least.)
A nice-to-have would be a variant of the in-place `FunctionOperator` that can take an accumulating "5-arg" function, i.e., a function with signature `f!(y, x, α, β)` representing `y = α * L * x + β * y`. This is not something LinearMaps has, but the more I think about it, the more this seems like the right way to specify an in-place matrix-free linear operator.
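For concreteness, the proposed accumulating signature matches what 5-arg `mul!` already does for plain matrices; a minimal sketch, where the operator `L` and kernel `f!` are illustrative names rather than part of any package:

```julia
using LinearAlgebra

# Accumulating kernel f!(y, x, α, β) computing y = α * L * x + β * y
# for a fixed example operator L (here a dense 2×2 matrix).
L = [1.0 2.0; 3.0 4.0]
f!(y, x, α, β) = mul!(y, L, x, α, β)   # 5-arg mul! already has these semantics

x = [1.0, 1.0]
y = [10.0, 10.0]
f!(y, x, 2.0, 0.5)   # y = 2*L*x + 0.5*y = [11.0, 19.0]
```

The plain `y = L * x` case falls out as `f!(y, x, 1, 0)`, which is why the 5-arg form is the more general primitive.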
Huge disclaimer: I don't think I'm a very typical user, so don't count on my use cases covering the most important functionality.
Some final thoughts/questions, having skimmed the code:
- `*(::MatrixOperator, ::MatrixOperator)` returns a `MatrixOperator` rather than a `ComposedOperator`. For one, I actually don't think it's possible to treat `update_func` correctly this way---there's no way to fix this line, which incorrectly assumes that `f(A) * g(B) == f(AB) * g(AB)`: https://github.com/SciML/SciMLOperators.jl/blob/53ebd2cf17446ae9ff6c691fabfcac7ef8c841b1/src/sciml.jl#L83 A consistent treatment of `update_func` probably enforces laziness for every operation that returns an `AbstractSciMLOperator`. Seems like the only eager operations should be ones that return an `AbstractArray`.
- `update_func` on `FunctionOperator`. This could be a very useful addition: you could use a callable struct to represent a non-constant matrix-free operator. (One use case that comes to mind is JVP/VJP operators, although a better solution for those is probably dedicated `{JVP,VJP}Operator` types in a downstream library.)
- I'm not sure what `InvertibleOperator` should represent. Might `FactorizedOperator` be a clearer name?
- `AddedOperator` doesn't quite sit the way it should somehow, but I suppose the old `CombinedOperator` lacks clarity. How about `SummedOperator`? It seems quite universal to use the word "sum" about a reduction by addition.
- `AdjointOperator` is fine (even though it sticks out for being the only participle that doesn't end in "ed").

That was probably more code review than you bargained for, but the main point is I'm hugely appreciative of where this is going. The timing is also impeccable---I was in the middle of making my own pirated hodgepodge of operator packages to get the functionality I want, but this library will basically solve my problem.
Thanks @danielwe. I tried to fuse operations via `*` wherever possible. This would lead to faster evaluations (`(A * B) * u` as compared to `A * (B * u)` where `A * B` has already been evaluated). The user always has the option to compose rather than fuse operators.
It is true we don't have consistent treatment of `update_func` when it comes to operator fusion. @ChrisRackauckas I'd need your input on that - could you take a look? Just search `TODO` around the `ScalarOperator` and `MatrixOperator` definitions.
For the 5-arg `mul!` for `FunctionOperator` it should be a piece of cake. The signature should be `op(du, u, p, t, α, β)` to stay consistent with the 3-arg `mul!`, `op(du, u, p, t)`.
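A sketch of what such a user-supplied callable could look like under the proposed signature; `MyOp` here is a hypothetical stand-in, not the SciMLOperators API:

```julia
using LinearAlgebra

# Callable struct exposing both the 4-arg and the proposed 6-arg form.
struct MyOp
    A::Matrix{Float64}   # stand-in for whatever state the kernel closes over
end

# 4-arg form: du = A * u (A taken constant in p, t for simplicity)
(op::MyOp)(du, u, p, t) = mul!(du, op.A, u)

# 6-arg accumulating form: du = α * A * u + β * du
(op::MyOp)(du, u, p, t, α, β) = mul!(du, op.A, u, α, β)

op = MyOp([2.0 0.0; 0.0 3.0])
u = [1.0, 1.0]
du = [1.0, 1.0]
op(du, u, nothing, 0.0, 1.0, 1.0)   # du = A*u + du = [3.0, 4.0]
```

A real `FunctionOperator` would presumably dispatch to the 6-arg method when the user provides it and fall back to the 4-arg method plus an axpy otherwise.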
`InvertibleOperator` is an operator which has the method `\` or `ldiv!`, or both. I don't mind the name changes you recommended.
The tensor product is very useful to me. I'd also like 3D and multidimensional tensor products, `⊗(A, B, C) = ⊗(A, ⊗(B, C))`, which is not possible right now. The reason is that the nested `TensorProductOperator` would have to act on a 2D array, and `SciMLOperator`s only act on `AbstractVector`s. <-- this was a conscious decision on my part to only allow left-multiplication/division of `SciMLOperator`s on `AbstractVector`s. Maybe we can get away with some `reshape` trick?
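The `reshape` trick presumably rests on the standard Kronecker identity `(B ⊗ A) * vec(X) == vec(A * X * transpose(B))`, which lets an operator that only accepts `AbstractVector`s act on a vec'd 2D array; a quick check with dense matrices:

```julia
using LinearAlgebra

# (B ⊗ A) acting on vec(X) equals vec(A * X * transpose(B)):
# the 2D action can be recovered from the 1D one via reshape/vec.
A = rand(3, 3)
B = rand(4, 4)
X = rand(3, 4)

lhs = kron(B, A) * vec(X)
rhs = vec(A * X * transpose(B))
@assert lhs ≈ rhs
```

Applying the identity recursively is what would let `⊗(A, ⊗(B, C))` act on a flat vector without ever materializing the full Kronecker matrix.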
Block matrices and Schur-complement-type stuff would be interesting to me too, but it's not a necessity for me right now. LMK when that becomes imperative and we can finish it in a day or two.
> This would lead to faster evaluations (`(A * B) * u` as compared to `A * (B * u)` where `A * B` has already been evaluated).
That's not generally true. Extreme example: if `A` and `B` are N x 1 and 1 x N, `A * (B * u)` is `2N` multiplications, while `(A * B) * u` is `N^2`, even after precomputing `A * B` (which itself is `N^2`).
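The rank-1 example is easy to check concretely (sizes chosen arbitrarily):

```julia
N = 1000
A = rand(N, 1)
B = rand(1, N)
u = rand(N)

# A * (B * u): one dot product (N mults) then one scaling (N mults) ≈ 2N flops.
lazy = A * (B * u)

# (A * B) * u: even with A * B precomputed, the matvec alone is N^2 mults.
AB = A * B          # forming this is itself O(N^2) time and memory
fused = AB * u

@assert lazy ≈ fused   # same result, very different cost
```

The two orderings agree numerically, but the fused form also pays `O(N^2)` memory for `AB`, which the lazy form never allocates.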
More generally, I'd find it very surprising if `A * B` returns an `AbstractSciMLOperator` but doesn't respect the update functions of `A` and `B`. I think it would be a friendlier interface if one could trust that operations that map `AbstractSciMLOperator`s to `AbstractSciMLOperator`s are faithful to the interface, while eager methods return a plain array so that it's clear that they represent a frozen snapshot.
That way, it's totally fine if you decide that, for example, `*` should be eager and `∘` should be lazy: just return a plain array in the former case.
> `InvertibleOperator` is an operator which has the method `\` or `ldiv!` or both

I think this statement clarifies the issue: `InvertibleOperator` sounds more like a trait than a type. Then again, this is a type that just wraps anything that has the trait, so maybe that's the right name for it after all.
i was thinking more of square matrices, but yes, it does depend on your operator. the user should be able to choose between eager and lazy, and i don't know how to preserve update behavior with eager computation. i don't think we can.
What we can do is return an `AbstractSciMLOperator` with no update function upon eager evaluation. @ChrisRackauckas ^ are we good with eager => snapshot, lazy => maintain update function?
> i don't know how to preserve update behaviour with eager computation. i don't think we can
Nope, it's not possible. For simplicity, just consider polynomials `f(A)` and `g(B)`. In general, `f(A)g(B)` is a multivariate polynomial with monomials `A^k B^l` for all combinations of orders `k` and `l`, not just "diagonal" monomials `(AB)^k`, so you can't write `f(A)g(B)` as a function of `AB`.
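A two-line counterexample with nilpotent matrices, taking `f(A) = A^2` and `g(B) = B^2`:

```julia
# For noncommuting A and B, f(A) * g(B) = A^2 * B^2 is not the
# "diagonal" monomial (A*B)^2, so no function of A*B can reproduce it.
A = [0.0 1.0; 0.0 0.0]   # nilpotent: A^2 == 0
B = [0.0 0.0; 1.0 0.0]   # nilpotent: B^2 == 0

@assert A^2 * B^2 != (A * B)^2   # LHS is the zero matrix, RHS is not
```

Here `A^2 * B^2` vanishes identically while `(A*B)^2 == [1 0; 0 0]`, so any rule of the form `update(AB)` necessarily computes the wrong product.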
ok i see. so i'll change `Base.:*` to default to lazy. But we should be able to give the user the ability to fuse operators when they know their computation may benefit and when update behavior is not necessary, e.g., when one passes the operator to a Krylov solver.
would be interesting to be able to do that in place
> I think this statement clarifies the issue: `InvertibleOperator` sounds more like a trait than a type. Then again this is a type that just wraps anything that has the trait, so maybe that's the right name for it after all.
Agreed, though that's basically the `has_ldiv!` trait.
> What we can do is return an `AbstractSciMLOperator` with no update function upon eager evaluation. @ChrisRackauckas ^ are we good with eager => snapshot, lazy => maintain update function?
No, that's the kind of behavior that leads to silent bugs.
> More generally I'd find it very surprising if `A * B` returns an `AbstractSciMLOperator` but doesn't respect the update functions of `A` and `B`. I think it would be a friendlier interface if one could trust that operations that map `AbstractSciMLOperator`s to `AbstractSciMLOperator`s are faithful to the interface, while eager methods return a plain array so that it's clear that they represent a frozen snapshot.
Exactly. If someone says that their operator is `A(t)`, then it better always update like `A(t)`, or it's just computing the wrong answer. If `A * B` takes two operators and converts them into matrices which don't update, that would just be incorrect, because then it would not have the actions that the user defined. I think it has to be lazy for that reason.
> for eg when one pass the operator to a krylov solver
Oh, I see where you're coming from now: you're thinking about the optimal way to "freeze" the operator at a particular timestep before passing it to an iterative solver. To me, that doesn't sound like something that's best accomplished bottom-up through `Base.:*` methods, but rather something that calls for a top-down `freeze` function that returns a single-use, fused and frozen operator and leaves its argument unchanged.
However, it would be sad if implicit ODE solvers automatically made use of an overly eager `freeze` that would fuse structures for which the disaggregated matvecmul is orders of magnitude faster, as in the example above. Though, actually, it might not be so hard to come up with good heuristics for which nodes to fuse and which to leave unchanged; the computational complexity of a matvecmul is a straightforward function of the shapes of the arrays involved. Still, for my current use cases, I'd be happier if I could simply know that the operator passed through to the iterative solver is identical to the one I instantiate. (I put a lot of thought into that structure in the first place.)
One last thought: if you decide that some binary operations should absolutely fuse, I suppose the consistent choice would be to only accept constant operators and have them throw an error if given operators with nontrivial update functions (with a helpful error message pointing to the equivalent lazy operation). Another alternative that I don't think is a good idea is to let a nominally eager operation act lazily as a fallback---that would make it too hard to reason about expressions involving operators.
An operator is frozen as long as the solver does not call `update_coefficients`, so it's easy to freeze without doing any of this. Again, it's just unsafe behavior: there should be no way that an operator silently changes to something without its `update_coefficients` being correct, that's just wrong behavior.
> I'd be happier if I could simply know that the operator passed through to the iterative solver is identical to the one I instantiate. (I put a lot of thought into that structure in the first place.)
That should always be the case.
> An operator is frozen as long as the solver does not call `update_coefficients`, so it's easy to freeze without doing any of this.
Just to clarify: the idea was that `Afrozen = freeze(A)` would perform any fusings/transformations that are deemed beneficial for performance but break updating. The point of calling the function would be to perform these transformations, not the "freezing" itself, which as you say is trivial. Nothing is silently changed---`A` is not touched; it's `freeze`, not `freeze!`.
Maybe instead of trying to devise intelligent heuristics for this, each composite operator could carry a flag `fuse_on_freeze` defaulting to `false`. To set it to `true` you'd have to use the constructor, e.g., replace `A * B` with `ComposedOperator((A, B); fuse_on_freeze=true)`. The behavior of `freeze` would be fully determined by these flags.
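A toy sketch of how the flag could interact with a top-down `freeze`; the names here (`Composed`, `freeze`, `fuse_on_freeze`) come from the proposal above or are hypothetical, not an existing SciMLOperators API:

```julia
# Toy lazy composition carrying the proposed fuse_on_freeze flag.
struct Composed
    ops::Tuple
    fuse_on_freeze::Bool
end
Composed(ops::Tuple; fuse_on_freeze = false) = Composed(ops, fuse_on_freeze)

# freeze is non-mutating: it returns a snapshot and leaves its argument alone.
function freeze(C::Composed)
    if C.fuse_on_freeze
        return Composed((*(C.ops...),), false)   # fuse the chain into one matrix
    else
        return C                                 # keep the factors disaggregated
    end
end

A = rand(2, 2)
B = rand(2, 2)
frozen = freeze(Composed((A, B); fuse_on_freeze = true))
@assert length(frozen.ops) == 1
@assert frozen.ops[1] ≈ A * B
```

With the flag defaulting to `false`, a rank-1 factorization like the `N x 1` times `1 x N` example above would pass through `freeze` untouched unless the user explicitly opted in.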
This isn't important to me and I won't keep beating this drum, I just thought @vpuri3 had a valid concern about passing less-than-optimal operators to iterative solvers and that there may be ways to mitigate it without causing trouble.
If someone split their operator, I think we should assume it was on purpose. Maybe down the line we could do matrix chain multiplication optimizations on `MatrixOperator`s when they are constant, but that's a super special case. I wouldn't worry about that for now. Open an issue, call it low priority, and I think we should go safety-first and wait for someone to actually have that case before optimizing it (which will likely be never).
and write an interface for `Adjoint(::AbstractDiffEqLinearOperator)`