SciML / PDERoadmap

A repository for the discussion of PDE tooling for scientific machine learning (SciML) and physics-informed machine learning
https://tutorials.sciml.ai/
18 stars 6 forks source link

Prototype for the first target (#33) #34

Open MSeeker1340 opened 6 years ago

MSeeker1340 commented 6 years ago

Prototype code for #33. The focus is on setting up the matrix-free operators for the generic L*QB route and come up with a mock user interface to create the operators, all the while taking composition and irregular grid into account.

The basic idea I have is to separate the codebase into two parts: the low level operators and the high-level method DerivativeOperator. The user will never need to explicitly construct any of the low level operators, but instead use the interface defined by DerivativeOperator, which is guaranteed to return a derivative operator or a composed derivative operator (a "factory method" in OOP jargon).

The current mock DerivativeOperator interface looks very much the same as DiffEqOperator's current implementation, with the difference being the presence of xgrid serving to differentiate between uniform and irregular grids. Also I wrote a DerivativeOperator with no BC as the factory method to construct stencil operators (or we can use a different name if this is confusing).

However this version of the interface has the same issue as pointed out by https://github.com/JuliaDiffEq/PDERoadmap/issues/33#issuecomment-415056125, i.e. by packaging the construction of square derivative operators in one place we end up unable to do LB = (L1 + L2)*QB but only the distributed version LB = L1*QB + L2*QB. Of course now that we can also construct stencil operators independently, we can add another user interface for constructing QB (as well as utilities for extending/zero padding the Ls) and then the first version is also possible. Or we can rebuild an entirely new interface.

@jlperla I'd like to hear your opinion on the prototype. Meanwhile I'll try writing an example script to demonstrate the operators' usage.

jlperla commented 6 years ago

Can you show me the interface for how to use them? I am less interested in the backend than the user interface at this point.

ChrisRackauckas commented 6 years ago
bc = (al = 0, bl = 1, ar = 0, bl = 1)
Dx = DerivativeOperator(0:dx:1,1,1,bc)
Dxx = DerivativeOperator(0:dx:1,1,1,bc)
L = Dx + Dxx
full(L) \ u
ChrisRackauckas commented 6 years ago

What's the plan for scalars here? I think that instead of :forward it should know from the scalar.

jlperla commented 6 years ago

? I think that instead of :forward it should know from the scalar.

I agree that upwind should be the default, and not :forward.

But I realized here that it may not be that simple with composition. For example, lets say that I want to use upwind on an operator with negative drift.

L_tilde = a * D_z + sigma^2/2 * D_zz

We know that if a < 0 that we should be using backwards differences for the D_z term.

Then lets say I am solving the stationary HJBE equation

r u_tilde(x) = x + L_tilde u_tilde(x)

with u_tilde'(0) = u_tilde'(1) = 0.

To solve this, we rearrange and define a new operator,

L2 = r I - L_tilde

So in that case with the composition, it seems to work if the a < 0 is checked for the upwind direction. Presumably no problem.

But now lets say that instead of composing things like that did the distributed operation ourselves

L2 = r - a D_z - sigma^2/2 D_zz

or even

L2 = r + (- a) * D_z - sigma^2/2 D_zz

All three of these should be equivalent by linearity of the operator, as we know that for a<0 the correct method is backwards... but if we blindly look at the constant in front, is it possible we could look at the wrong sign?

Furthermore, are there cases where instead of the default of a * D_x being backwards differences if a < 0 it should instead be forward diffrences if a < 0 and backwards if a > 0? I don't know.

Not that these issues are unsurmountable, but I think they should be thought through prior to implementing upwind.

jlperla commented 6 years ago

I guess it also depends on how L2 = r * I - a * DerivativeOperator(x, 1, 1) - sigma^2/2 * DerivativeOperator(x, 2, 2) is parsed by julia with scalar overloading. Since multiplication has precedence over addition could we say that upwind operators will be defined by the sign directly in the left-multiplication? Maybe that is robust enough?

ChrisRackauckas commented 6 years ago

I agree that upwind should be the default, and not :forward.

All three of these should be equivalent by linearity of the operator, as we know that for a<0 the correct method is backwards... but if we blindly look at the constant in front, is it possible we could look at the wrong sign?

Furthermore, are there cases where instead of the default of a * D_x being backwards differences if a < 0 it should instead be forward diffrences if a < 0 and backwards if a > 0? I don't know.

These confusions are all very much related to a bad choice. The choice shouldn't be default upwind or anything like that, it should just know how to handle a coefficient with a default 1.0 coefficient. That's the same thing as defaulting to upwinding. Then the composition method only does addition between the objects in the operator tuple. - is short hand for "multiply the coefficient by -1 and then add this to the tuple". So if everything is written to know how coefficients work, then r - a D_z - sigma^2/2 D_zz and r + (- a) * D_z - sigma^2/2 D_zz falls right out without special handling.

Since multiplication has precedence over addition could we say that upwind operators will be defined by the sign directly in the left-multiplication? Maybe that is robust enough?

Yes. If you just embed a coefficient in there, defaulting to 1.0, then a * DerivativeOperator(x, 1, 1) returns a new operator with a as the coefficient. Then if the loop just implements upwinding with a>0 and downwinding if a<0 this all works. Then with dispatch you can allow a to be an array. Then we can allow a(u,p,t) for DiffEqScalar so that way it plays with the standard updating system, etc. etc.

MSeeker1340 commented 6 years ago
  • is short hand for "multiply the coefficient by -1 and then add this to the tuple". So if everything is written to know how coefficients work, then r - a D_z - sigma^2/2 D_zz and r + (- a) * D_z - sigma^2/2 D_zz falls right out without special handling.

That's basically the gist of it. For unitary - the method definition is here, whereas binary - is defined here. So no matter how the expression is parsed the resulting operator should be the same.

But I agree that we should be extra careful with how upwind operators & coefficients (scalar/vector) are handled. Meanwhile @jlperla do you think the DerivativeOperator interface (minus the direction=:forward part) is fine? If so then we can proceed from this.

jlperla commented 6 years ago

@jlperla do you think the DerivativeOperator interface (minus the direction=:forward part) is fine? If so then we can proceed from this.

I think so. My assumption is that we are going to layer sugar on top of this in a near future step (before I show off the code) , but I think I see how that would work. We can discuss later, but shared sugar in some way with ApproxFun.jl is preferred.

ChrisRackauckas commented 6 years ago

Let's get it working before we start talking about sugar and DSLs. I always like to have a fully functional interface underneath.

jlperla commented 6 years ago

That's the same thing as defaulting to upwinding.

OK. I think I understand the design now, where the coefficients are attached to the operator, but users don't necessarily construct them that way. And maybe the upwind based on the coefficient sign is always correct.

However, I also think that we should also consider operators which force backwards or forwards differences at some point... Even if they are ultimately in separate types. But for my immediate needs (which are consistent based on the scalar sign to what is here) your suggested implementation will always work.

MSeeker1340 commented 6 years ago

After working on a few prototypes I think I've hit a bottleneck for the upwind operators: the need to know the coefficients before determining the upwind direction and the fact that DiffEqOperators' composition is lazy.

Consider this artificial example:

L = c1(t) * (c2(t)L1 + L2)

where L1 is the first-order stencil and L2 is the second order stencil. The upwind direction of L1 is determined by c1(t)*c2(t).

The problem is that the composition is lazy, so however we build the operators, we always have c1(t) and the whole of c2(t)L1 + L2 on one level. This means that, when L*x is called, first the expression tree is transversed depth-first and we arrive at L1. In order to apply L1 we need the upwind direction, i.e. the coefficient. However because of what has been discussed we cannot simply look at c2(t), so we need a back propagation step to get c1(t). This sounds very complicated and I'm not sure how to implement it in our current setting.

One way out of this is to discard total laziness and distribute scalar/vector coefficient multiplitcation across a DiffEqLinearCombination. So the previous example would instead have an expression tree like

L = (c1(t)*c2(t))L1 + c1(t)L2

Getting two time-dependent DiffEqScalar to multiply to create a new one should just be a technicality. On the other hand, once we decided to break laziness there would be a whole bunch of new issues to take care of, so I'm not sure if this is a good idea.

jlperla commented 6 years ago

Yes this example is exactly what I was getting at with my concerns on the implementation... My feeling is that we should only have the upwind sign determined by the sign of the immediate left multiplication. That is easy to describe to people, and they can use brackets to manually change signs if they want to. Precedence of multiplication over addition helps us here

MSeeker1340 commented 6 years ago

My feeling is that we should only have the upwind sign determined by the sign of the immediate left multiplication. That is easy to describe to people, and they can use brackets to manually change signs if they want to.

This seems like an acceptable compromise to me, which should be able to avoid complex tree transversals while keeping the lazy compositions in tact.

MSeeker1340 commented 6 years ago

We probably need a few iterations before arriving at the final decision. But first of all we should agree that the intended user interface should look like

LB1 = DerivativeOperator(xgrid, 1, 1, BC)
LB2 = DerivativeOperator(xgrid, 2, 2, BC)
LB = mu * LB1 + sigma * LB2

That is, the coefficient (either as a scalar, vector or a (u,p,t) function) should come naturally as multiplication. The upwind direction of LB1 should be determined by its most immediate coefficient, which in this case is mu. Furthermore LB1 is able to function as a standalone operator with its coefficient default to one.

Now onto my current prototype: the gist is that a GenericUpwindOperator type is defined to mirror GenericDerivativeOperator (https://github.com/JuliaDiffEq/PDERoadmap/pull/34/files#diff-e942a520042f366e86d772e57811afb8R37), with the difference being that a coefficient is embedded. In the * method this coefficient is used to determine the upwind direction (https://github.com/JuliaDiffEq/PDERoadmap/pull/34/files#diff-e942a520042f366e86d772e57811afb8R53). Furthermore left multiplying a coefficient gets it absorbed into the operator, so the upwind direction can be determined correctly (but multi-level ones such as https://github.com/JuliaDiffEq/PDERoadmap/pull/34#issuecomment-415536613 still doesn't work properly, as per the compromise).

The problem with this approach is that the stencil operator UniformDriftStencil can no longer function alone. This brings the question of how composing different upwind stencils would be possible. It would seem that stencil coefficient fusing would be the only available option in this approach, as lazy compositions such as L_1 + L_2 is no longer possible.

jlperla commented 6 years ago

but multi-level ones such as #34 (comment) still doesn't work properly, as per the compromise)

I would be interested to get thoughts from @chrisrackauckas but I am not sure it is a compromise. Having the drift direction determined by the immediate sign may actually be the "right" way to do it. It certainly makes my concerns about https://github.com/JuliaDiffEq/PDERoadmap/pull/34#issuecomment-415281995 more predictable, as you can setup the theory to make sure upwind occurs correctly, and not be surprised if the direction changes just because you start composing stuff.

MSeeker1340 commented 6 years ago

Having the drift direction determined by the immediate sign may actually be the "right" way to do it.

Oh I haven't thought of this possibility.

BTW I'm currently working on an Option 2 which is pretty similar to https://github.com/JuliaDiffEq/PDERoadmap/pull/34#issuecomment-415567787 but instead of GenericUpwindOperator we create a GenericUpwindStencil. It still has an embedded coefficient but behaves as a stencil operator. The square upwind operator would still be represented as GenericDerivativeOperator. I hope this could address the issue of unable to compose the stencils, but on the other hand left multiplication by scalars may become a bit more difficult.

MSeeker1340 commented 6 years ago

I've put Option 2 on a separate branch to make comparisons easier:

https://github.com/MSeeker1340/PDERoadmap/blob/first-target-option-2/operator_examples/first_target.jl

The difference is that the embedded coefficient is handled on the stencil level with UniformUpwindStencil (https://github.com/MSeeker1340/PDERoadmap/blob/first-target-option-2/operator_examples/first_target.jl#L37). The advantage for this is that the stencils can now be used in a standalone fashion. Again left multiplication by scalars is overloaded. In addition, left multiplication by scalar for square upwind operators is also overloaded (https://github.com/MSeeker1340/PDERoadmap/blob/first-target-option-2/operator_examples/first_target.jl#L81) so the usage case in https://github.com/JuliaDiffEq/PDERoadmap/pull/34#issuecomment-415567787 is still ok.