SciML / PDERoadmap

A repository for the discussion of PDE tooling for scientific machine learning (SciML) and physics-informed machine learning
https://tutorials.sciml.ai/

Add operator interface notebook #1

Closed MSeeker1340 closed 6 years ago

MSeeker1340 commented 6 years ago

Prototype notebook for the operator overview document (nbviewer). Most of the abstract type hierarchy is taken from my previous notebook (simplified for the sake of brevity). Not so sure about some aspects of the notebook so I could use your suggestions:

  1. The affine operator interface. Do you think having operations on affine operators always return a DiffEqAffineOperator is a good idea?

  2. The Q operators. In particular, having dedicated "basic" operators for simple BCs (AbsorbingBoundaryMap and ReflectingBoundaryMap) and building complex BCs from the basic ones.

    2.1. Also, what should the Q operators be called?

  3. The drift operators. The way it is done in the notebook (separate right/left going upwind operator) is different from what is currently implemented in DiffEqOperators (a single operator with an embedded directions vector). Performance-wise the latter is probably better (half the amount of work) but the code might get complicated if we start doing time-dependent coefficients.

  4. The coefficients: currently they are implemented just as Diagonal matrices. Another approach would be to embed the coefficients inside the operators.

By the way, the last section is just meant to check the interface itself. Can anyone suggest some concrete examples that I can check to make sure the implementation is correct numerically?

jlperla commented 6 years ago

Great! @ChrisRackauckas , can I just merge these pull requests in directly, or is that verboten?

The UBC guys have also made progress on the notation and examples recently, so it is now worth looking at.

ChrisRackauckas commented 6 years ago

The Q operators. In particular, having dedicated "basic" operators for simple BCs (AbsorbingBoundaryMap and ReflectingBoundaryMap) and building complex BCs from the basic ones.

For non-lazy this is fine. For lazy this wouldn't be the best route (as we've discussed). But having them there for ease of use is at least not harmful.

Do you think having operations on affine operators always return a DiffEqAffineOperator is a good idea?

Yes. In general they have almost no chance of canceling out the additive term, so they should go affine->affine.
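As a minimal sketch of why affine stays closed under these operations (a toy `Affine` struct standing in for DiffEqAffineOperator, not the actual implementation): the sum of two affine maps and a scalar multiple of an affine map are again affine, and the additive term never cancels in general.

```julia
# Hypothetical stand-in for DiffEqAffineOperator: L(u) = A*u + b
struct Affine
    A::Matrix{Float64}
    b::Vector{Float64}
end

(L::Affine)(u) = L.A * u + L.b

# (A1*u + b1) + (A2*u + b2) = (A1 + A2)*u + (b1 + b2): still affine
Base.:+(L1::Affine, L2::Affine) = Affine(L1.A + L2.A, L1.b + L2.b)
# α*(A*u + b) = (α*A)*u + α*b: still affine
Base.:*(α::Number, L::Affine) = Affine(α * L.A, α * L.b)

L1 = Affine([1.0 0.0; 0.0 2.0], [1.0, 1.0])
L2 = Affine([0.0 1.0; 1.0 0.0], [0.0, 1.0])
u = [1.0, 2.0]
L3 = 2.0 * L1 + L2        # composition of affine operators is affine
@assert L3(u) ≈ 2 .* L1(u) .+ L2(u)
```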

The drift operators. The way it is done in the notebook (separate right/left going upwind operator) is different from what is currently implemented in DiffEqOperators (a single operator with an embedded directions vector). Performance-wise the latter is probably better (half the amount of work) but the code might get complicated if we start doing time-dependent coefficients.

The coefficients: currently they are implemented just as Diagonal matrices. Another approach would be to embed the coefficients inside the operators.

I think these two go together. The drift operators need to know about the coefficients in order to turn on/off or change directions. Either way, I think it ends up being the same issue that the coefficients need to somehow be embedded in the operator to check a boolean (in which case it ends up being easier to just specify upwind/downwind). This is why we did this before. While having the coefficients in the DiffEqLinearCombination is a neat idea, I'm not sure how this would be solved like that.

Great! @ChrisRackauckas , can I just merge these pull requests in directly, or is that verboten?

Feel free to, but we should make sure the questions open up in an issue then so that way we can discuss.

jlperla commented 6 years ago

From what I can see it looks great.

For the upwind, to make sure I understand: the idea is that - given a drift vector $\mu(x)$ - we would manually update the coefficients for the $\mu^+(x)$ and $\mu^-(x)$ for the particular $x$, and then compose the two parts every time given a particular $\mu(x)$? If so, sounds perfectly reasonable to me. We can always define a convenience operator which contains the two parts when things are lazy.
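For concreteness, here is a sketch of that $\mu^+/\mu^-$ split with plain dense matrices. The one-sided difference matrices below are illustrative only (not the DiffEqOperators implementation), and the boundary rows are deliberately ignored:

```julia
using LinearAlgebra

# Upwind first derivative on a regular grid (interior only, crude sketch):
# forward difference where drift is positive, backward where it is negative.
N, dx = 5, 0.1
x = dx .* (1:N)
μ = [0.3, -0.2, 0.5, -0.1, 0.4]   # drift, sign varies by node
μp = max.(μ, 0.0)                 # μ⁺(x)
μm = min.(μ, 0.0)                 # μ⁻(x)

# One-sided difference matrices (boundary rows left wrong on purpose)
Dp = diagm(0 => fill(-1/dx, N), 1 => fill(1/dx, N-1))   # forward
Dm = diagm(0 => fill(1/dx, N), -1 => fill(-1/dx, N-1))  # backward

A = Diagonal(μp) * Dp + Diagonal(μm) * Dm  # μ⁺ rides forward, μ⁻ backward

# Check on u(x) = 2x, whose derivative is 2 everywhere:
u = 2.0 .* x
interior = 2:N-1
@assert (A * u)[interior] ≈ 2.0 .* μ[interior]
```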

MSeeker1340 commented 6 years ago

While having the coefficients in the DiffEqLinearCombination is a neat idea, I'm not sure how this would be solved like that.

The way I look at it:

  1. Having coefficient-embedded operators would definitely help, both in terms of interface (getting a DiffEqOperatorComposition isn't exactly straightforward to the user) and performance.

  2. On the other hand, "bare" operators (i.e. the current implementation of derivative and upwind operators) should still be included in the codebase, just in case we need them as the building blocks of more complex objects.

So basically, if we're dealing with a very specific scenario (i.e. getting the Fokker-Planck operator for an SDE), we go with route 1 and return the user a "tailored" operator that is both efficient and easy to use. e.g.

L = FPOperator(xs, mu, sigma)

And, if the user wants some bizarre form of operator, she can always do so using the basic operators and then compose them.

ChrisRackauckas commented 6 years ago

The coefficients on the operators could just be parameterized to allow nothing?

jlperla commented 6 years ago

So basically, if we're dealing with a very specific scenario (i.e. getting the Fokker-Planck operator for an SDE), we go with route 1 and return the user a "tailored" operator that is both efficient and easy to use

OK in general. Though in this specific example I really don't like calling this thing the Fokker-Planck operator. As far as I understand it, the Fokker-Planck equation is the same thing as the Kolmogorov Forward Equation. However, the majority of problems I want to solve with this library involve the Kolmogorov Backward Equation (which is more directly related to the Feynman-Kac formula).

Since the differential operator can be used to solve both forward and backward equations, it is confusing to associate it with only one of them. My preference is to try to name these things based on the stochastic processes they are the infinitesimal generator of (i.e. https://en.wikipedia.org/wiki/Infinitesimal_generator_(stochastic_processes)#Generators_of_some_common_processes) , but I am open minded if there is an alternate approach.

jlperla commented 6 years ago

I reviewed the prototype in more detail. Things are looking great, but I think that my biggest comment is that we need to take domains and grids more seriously to make the interface work well when we move to irregular grids (which we need sooner rather than later for some applications).

This isn't to say that we need to implement it completely quite yet, but design things to be more abstract. In particular, I think that instead of

# The grid
N = 10
dx = 1.0
xs = collect(1:N) * dx # interior nodes

# Interior operators
A1p = DriftOperator(dx, N, true)
A1m = DriftOperator(dx, N, false)
A2 = DiffusionOperator(dx, N)

The interface should look something more like the following (trying to be consistent with new notation)

x̃ = Interval(1.0, 10.0) #Continuous domain on the interior.  Later can take product with discrete
N = 10
x = RegularGrid(x̃, N) #Note: I think it is better to give points than dx, but I am open.
A⁺ = DriftOperator(x, :forwards) #Note passing in grid, so interface could allow irregular later.
A⁻ = DriftOperator(x, :backwards)
A₂ = DiffusionOperator(x)

This brings up the https://github.com/JuliaDiffEq/DifferentialEquations.jl/issues/260 discussion. Here I have tried to make things consistent with https://github.com/JuliaDiffEq/ModelingToolkit.jl/blob/master/test/domains.jl

jlperla commented 6 years ago

The other thing I notice in here is the use of $L$ along with $L1, L2$, etc. For example,

# Construct L
L1 = DiffEqArrayOperator(Diagonal(mup)) * A1p + DiffEqArrayOperator(Diagonal(mum)) * A1m
L2 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) * A2
L = (L1 + L2) * Q
# Solve the HJBE (rI - L)u = x
r = 0.05
if isa(L, DiffEqAffineOperator)
    LHS = r*speye(N) - as_array(L.A)
    RHS = xs + L.b
else
    LHS = r*speye(N) - as_array(L)
    RHS = xs
end
u = LHS \ RHS

I want to make sure I understand how this maps to the notation (being iterated on in https://github.com/JuliaDiffEq/PDERoadmap/blob/master/linear_operators_overview.tex ) A few things in particular:

ChrisRackauckas commented 6 years ago

is the $L.A$ giving what is written in the document as $A \cdot Q_A$ and the $L.b$ giving $A \cdot Q_b$? Or something like that? The document could be wrong, but @stevenzhangdx and @FernandoKuwer can iterate the algebra in the examples to match the code if it is. Let's get them fairly consistent.

L.A is an implementation detail. If we get this right, field access shouldn't be used.

That is, why the use of the notation L1 = ... etc. instead of just composing a new A = stuff A1p + stuff A1m + stuff A2 and then L = A Q

Depends on how we implement the upwinding operators. If the lazy version does stuff * A1p + stuff* A1m, then it makes sense to call that L1 (and that has coefficients in it? That's part of the implementation question right now).

jlperla commented 6 years ago

Depends on how we implement the upwinding operators.

I see, OK, forget about whether the upwind is done lazily in a shared interface or with the user manually splitting them. Part of this is to see if it makes sense to call the sub-parts L1 rather than A1 prior to us attaching boundary values to them? This comes back to when to use $A$ vs. $\mathcal{L}$ in https://github.com/JuliaDiffEq/PDERoadmap/blob/master/linear_operators_overview.tex

Also, for what it is worth, the main use of the upwinding is for solving HJBE in things like (1) and (2) of http://www.princeton.edu/~moll/HACTproject/HACT_Numerical_Appendix.pdf

These HJBE are nonlinear ODEs, but linear conditional on the solution to the Hamiltonian part of them, which controls the drift. So the typical algorithm iterates between updating the drift and re-solving the resulting linear problem.

What this means is that there is a repeated solution of the linear differential equations where the only thing changing between iterations is the drift vector, which is defined on the grid points (and where the drift direction may switch between iterations for a given grid point). If all of this can be modified in place without allocating matrices/etc. each time, it would be nice.

ChrisRackauckas commented 6 years ago

If all of this can be modified in place without allocating matrices/etc. each time, it would be nice.

Yeah, that's the main goal of making it lazy here. What would end up happening is that the coefficients determine direction, and so no matrices need to be updated, just coefficient vectors. That would be a nice saving. Exactly how to get there is the implementation work, so I think we need to go try building it and see what happens 😄 .

jlperla commented 6 years ago

Given that it is lazy, and that I am iterating on a vector $\mu$ of drifts, I guess we have 2 options.

First, have an 'upwind' specific operator which takes care of things directly. Then I guess changing it would look something like,

x̃ = Interval(1.0, 10.0)
x = RegularGrid(x̃, 10)
μ₀ = ....  #starting guess...
A_UW = UpwindDriftOperator(μ₀, x) # Takes care of the mu.

#... do things, then get ready to update
μ_new = ....  #New guess
update_coefficients!(A_UW, μ_new)

Alternatively, we could do this with the two things ourselves,

x̃ = Interval(1.0, 10.0)
x = RegularGrid(x̃, 10)
μ₀ = ....  #starting guess...
μ₀⁺ = clamp.(μ₀, 0.0, Inf) #Or have a special function...
μ₀⁻ = clamp.(μ₀, -Inf, 0.0)
A⁺ = DriftOperator(x, μ₀⁺ , :forwards) 
A⁻ = DriftOperator(x, μ₀⁻, :backwards)
A = A⁺ + A⁻ + ... #compose operators

#... do things, then get ready to update
μ_new = ....  #New guess
μ⁺ = clamp.(μ_new, 0.0, Inf) #Or have a special function...
μ⁻ = clamp.(μ_new, -Inf, 0.0)
update_coefficients!(A⁺, μ⁺)
update_coefficients!(A⁻, μ⁻)

The second seems perfectly reasonable for now. But it does seem like the operators will need to hold a vector in them?

ChrisRackauckas commented 6 years ago

A⁺ = DriftOperator(x, μ₀⁺ , :forwards)

That kind of thing isn't what we want though. We want the interface on the operators to be independent of the operator. If it requires that you know what operator you have in order to write the function, then it's impossible to write library code on it. That's why the update_coefficients!(A,u,p,t) is like it is: if your operator is A, then no matter what coefficient function it has you can know this updates it. Drift operators, Laplacian, 2D, etc. This means you can write your policy iteration or Newton iteration in a way that's not operator-specific, and then changing PDEs is just changing what operators you plug in there.

jlperla commented 6 years ago

OK, let me try this a different way then,

x̃ = Interval(1.0, 10.0)
x = RegularGrid(x̃, 10)
μ₀ = ....  #starting guess...
μ₀⁺ = clamp.(μ₀, 0.0, Inf) #Or have a special function...
μ₀⁻ = clamp.(μ₀, -Inf, 0.0)
A⁺ = DriftOperator(x, :forwards) 
A⁻ = DriftOperator(x, :backwards)
μ⁺ = DiffEqArrayOperator(Diagonal(μ₀⁺))
μ⁻ = DiffEqArrayOperator(Diagonal(μ₀⁻))

A = μ⁺ * A⁺ + μ⁻ * A⁻ + ... #compose operators

#... do things, then get ready to update
μ_new = ....  #New guess
μ⁺_new = clamp.(μ_new, 0.0, Inf) #Or have a special function...
μ⁻_new = clamp.(μ_new, -Inf, 0.0)
update_coefficients!(μ⁺, μ⁺_new)
update_coefficients!(μ⁻, μ⁻_new)

Is that more similar to what you are thinking?

ChrisRackauckas commented 6 years ago

μ_new = ....  #New guess
μ⁺_new = clamp.(μ_new, 0.0, Inf) #Or have a special function...
μ⁻_new = clamp.(μ_new, -Inf, 0.0)
update_coefficients!(μ⁺, μ⁺_new)
update_coefficients!(μ⁻, μ⁻_new)

Why not just have that be the update_coefficients!(A,u,p,t)?

jlperla commented 6 years ago

I don't understand the mapping between that interface and the parameters I have in this exact example. If you can do it, that would help. I just want to change the vector of drifts I have....

jlperla commented 6 years ago

(and keep in mind that there may be all sorts of parameters associated with different parts of the composed operators)

ChrisRackauckas commented 6 years ago

I don't understand the mapping between that interface and the parameters I have in this exact example. If you can do it, that would help. I just want to change the vector of drifts I have....

Just take what you wrote and put it in a function. That function is what you make update_coefficients! do.

jlperla commented 6 years ago

So then it recreates the composition each time I change the parameters, allocating all of the data/etc.? I thought your goal was to make it so that all of the constants associated with the particular operators could be modified? Also, I thought that update_coefficients!(A,u,p,t) was only used when we wanted to actually solve the ODE, not compose things? In general, I guess I still don't understand what you are proposing.

If this is all obvious to you, I am really missing some key parts of what you are suggesting. Also, since there isn't any use of update_coefficients!(A,u,p,t) in the prototype, I am not even sure where it would go in the code.

And, keep in mind, that if this wasn't stationary (i.e., if we were solving the time-varying backwards equation) then the vector $\mu$ would be different for every time period, with possible flips in the upwind direction at every one of those. The iterative process to solve the nonlinear HJBE would involve changing the whole vector of drifts.

jlperla commented 6 years ago

One last thing on this: Keep in mind that if I have a composed operator with a jump-diffusion, I could potentially have a vector of drifts, a vector of variances, a vector of jump sizes, and a vector of jump intensities.... or some of those could be constants... or missing... So I just don't understand how update_coefficients! on a composed operator works. I can understand how it works on an individual operator where there may only be a single constant or vector of constants... though maybe that isn't what update_coefficients! is used to modify.

MSeeker1340 commented 6 years ago

Haven't gone through all of the discussion yet. Just to answer @jlperla's initial question regarding the notebook:

why the use of the notation L1 = ... etc. instead of just composing a new A = stuff A1p + stuff A1m + stuff A2 and then L = A Q

I'm just separating L1 and L2 so that the code is clearer to read. There's no particular reason not to construct L in one pass as you suggested.

And I realized the notation is a bit misleading... they should really be called A1 and A2, both being operators that map the whole domain to the interior (as opposed to L, which maps the interior to the interior). But then there's already an A2 previously (the diffusion operator without the coefficients), so I probably should change them as well.

is the L.A giving what is written in the document as A⋅QA and the L.b giving A⋅Qb

Yes.

As Chris pointed out we shouldn't use field access in the final version. Actually I originally included an overloaded \ operator for DiffEqAffineOperator, which is very straightforward:

\(L::DiffEqAffineOperator, y) = as_array(L.A) \ (y - L.b)

So solving the HJBE should be as simple as (r*I - L) \ xs; we only need to implement the - operator and do something about the identity operator.
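A runnable toy version of that overload (hypothetical `Affine` struct standing in for DiffEqAffineOperator; note the subtraction has to use `L.b`):

```julia
import Base: \

# Hypothetical stand-in for DiffEqAffineOperator: L(u) = A*u + b
struct Affine
    A::Matrix{Float64}
    b::Vector{Float64}
end

# Solving L(u) = y, i.e. A*u + b = y  =>  u = A \ (y - b)
\(L::Affine, y::AbstractVector) = L.A \ (y .- L.b)

# Toy check: A = 2I, b = ones, y = [3, 5]  =>  u = ([3,5] - [1,1]) / 2
L = Affine([2.0 0.0; 0.0 2.0], [1.0, 1.0])
u = L \ [3.0, 5.0]
@assert u ≈ [1.0, 2.0]
```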

ChrisRackauckas commented 6 years ago

One last thing on this: Keep in mind that if I have a composed operator with a jump-diffusion, I could potentially have a vector of drifts, a vector of variances, a vector of jump sizes, and a vector of jump intensities.... or some of those could be constants... or missing... So I just don't understand how update_coefficients! on a composed operator works. I can understand how it works on an individual operator where there may only be a single constant or vector of constants... though maybe that isn't what update_coefficients! is used to modify.

On the composed operator it just recurses. It's equivalent to what you're doing. I think that it's just hard to explain why it's necessary. It's the difference between using L.A vs get_array(L)=L.A and then using the function. It's the same thing? Not for generic code. But that's an implementation detail.

jlperla commented 6 years ago

And I realized the notation is a bit misleading... they should really be called A1 and A2, both being operators that map the whole domain to the interior (as opposed to L, which maps the interior to the interior). But then there's already an A2 previously (the diffusion operator without the coefficients), so I probably should change them as well.

Thanks. I wonder if the current use of $A$ and $\mathcal{L}$ in the document (see the new version after my changes) needs an iteration. I am not sure if it is using the notation in this way, so I think it is worth making sure we agree on the notation in the document (and modify as required) prior to naming things in the code.

jlperla commented 6 years ago

On the composed operator it just recurses. It's equivalent to what you're doing.

I am just entirely clueless on what exactly you are suggesting the code would look like. We have a very specific example; can you help write the code you imagine for using the interface? In fact, let's simplify it and skip the upwind part. Assume there is an operator with a drift all in a single direction, so we can pick the upwind ourselves. Using a variation of the existing prototype code directly,

# The grid
N = 10
dx = 1.0
xs = collect(1:N) * dx # interior nodes

# Interior operators
A1 = DriftOperator(dx, N, true)
A2 = DiffusionOperator(dx, N)

# Boundary operators
Q = AbsorbingBoundaryMap(N)

# Coefficients
mu = rand(N) #This always returns positive numbers
sigma = rand(N) + 1.0

# Construct L
L1 = DiffEqArrayOperator(Diagonal(mu)) * A1
L2 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) * A2
L = (L1 + L2) * Q

#Then to use it,
(r*I - L) \ xs #After the - is implemented, from the last comment.

Now, let's say that I want to change the drift vector mu and re-solve the ODE.

#1. Generate a new vector
mu2= rand(N)

#2. Change the drifts on either L1 or the DiffEqArrayOperator(Diagonal(mu)) in L1?
???? #Use mu2.  How does a generic update_coefficients!() help????

#3. Resolving looks the same to me
(r*I - L) \ xs #After the - is implemented, from the last comment.

What I was suggesting is that you can change whatever the coefficients were on the underlying sub-operator, if they have any. So let me decompose this a little bit more,

C1 = DiffEqArrayOperator(Diagonal(mu))
L1 = C1 * A1
L2 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) * A2
L = (L1 + L2) * Q

Then, you could call a function to just modify the C1. I called it update_coefficients! above, but maybe this is a totally different idea. Call it modify_parameter! so that you could do the change with,

mu2= rand(N)
modify_parameter!(C1, mu2)
(r*I - L) \ xs #Since L was lazy, this uses the modified C1 values.

MSeeker1340 commented 6 years ago

I see the major point of discussion here is update_coefficients!, in particular how it works on a composed operator.

Basically, as Chris explained, it works recursively. I think my original notebook on the operator interface helps to illustrate the point (section link). When we call update_coefficients! on the composed operator, it is applied to each of the component operators (the for_each part). And since the function always updates the operator in-place, the operator structure won't change. In other words no new operators are created in the process.
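A minimal sketch of that recursion with toy types (not the actual DiffEqOperators hierarchy): the composite just forwards the call, and only operators that actually hold coefficients do any work.

```julia
# Toy lazy composition: update_coefficients! on a sum forwards to each
# component, mutating coefficient storage in place.
struct Coeff              # operator with an updatable coefficient vector
    c::Vector{Float64}
end
struct Fixed end          # operator with nothing to update

struct LazySum
    ops::Vector{Any}
end

update_coefficients!(op::Coeff, u, p, t) = copyto!(op.c, p)  # p holds new coeffs
update_coefficients!(op::Fixed, u, p, t) = op                # no-op
function update_coefficients!(L::LazySum, u, p, t)
    for op in L.ops                                          # the "for_each" part
        update_coefficients!(op, u, p, t)
    end
    L
end

c1 = Coeff([0.0, 0.0]); c2 = Coeff([0.0, 0.0])
L = LazySum([c1, Fixed(), c2])
update_coefficients!(L, nothing, [1.0, 2.0], nothing)
@assert c1.c == [1.0, 2.0] && c2.c == [1.0, 2.0]  # same structure, updated in place
```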

jlperla commented 6 years ago

I see the major point of discussion here is update_coefficients!, in particular how it works on a composed operator.

I don't care how it works, I care what code I write as a user :-) Recursively and lazily generating things is fine for me in the background.

I just don't understand the interface: I thought update_coefficients! was something that tended to happen in the background for solving ODEs/PDEs or getting out the matrices from the operator, until Chris suggested it was the solution to changing the drifts.

Is there simply a misunderstanding about what I meant by "changing the coefficients" for the iterative process solving a HJBE?

ChrisRackauckas commented 6 years ago

Is there simply a misunderstanding about what I meant by "changing the coefficients" for the iterative process solving a HJBE?

Each iteration you have a new state vector u. For state-dependent drift and diffusion, mu and sigma can depend on u. In general, they can depend on independent variables, dependent variables, and possibly other parameters. That's what (u,p,t) is. update_coefficients!(A,u,p,t) just re-evaluates the current mu and sigma coefficients based on the current (u,p,t), and sets the array of coefficients for mu/sigma that's stored inside of A to the right values, so that way further operations utilize the correct coefficients. If you have no time or no parameters, just update_coefficients!(A,u,nothing,nothing) is fine. In fact, we can probably make the interface just have default args there, so update_coefficients!(A,u,p=nothing,t=nothing) allows for update_coefficients!(A,u), though it's a little jarring to special case against something that's so widespread.

MSeeker1340 commented 6 years ago

C1 = DiffEqArrayOperator(Diagonal(mu))
L1 = C1 * A1
L2 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) * A2
L = (L1 + L2) * Q

Then, you could call a function to just modify the C1. I called it update_coefficients! above, but maybe this is a totally different idea. Call it modify_parameter! so that you could do the change with,

mu2 = rand(N)
modify_parameter!(C1, mu2)
(r*I - L) \ xs #Since L was lazy, this uses the modified C1 values.

I think that, instead of this, the intended usage of update_coefficients! looks something like:

update_func(coeffs,u,p,t) = copy!(coeffs,p)
C1 = DiffEqArrayOperator(Diagonal(mu), update_func)
L1 = C1 * A1
L2 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) * A2
L = (L1 + L2) * Q

mu2= rand(N)
update_coefficients!(L, nothing, mu2, nothing)
(r*I - L) \ xs

The way it works is because the full implementation of DiffEqArrayOperator looks like

struct DiffEqArrayOperator
    A::AbstractMatrix
    update_func
end
update_coefficients!(C::DiffEqArrayOperator,u,p,t) = C.update_func(C.A,u,p,t)

I realized this is not very straightforward (e.g. having to manually specify u=nothing and t=nothing), but @ChrisRackauckas I think this is the intended approach right? We can simplify the writing of update_func a bit by introducing some convenient functor types though.
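To make the sketch above concrete, here is a runnable toy version (diagonal coefficients only; `ToyArrayOperator` is a hypothetical name, not the real DiffEqArrayOperator), showing that updating the coefficients mutates the stored matrix in place:

```julia
using LinearAlgebra

# Toy version: the operator owns its matrix and an update_func that
# rewrites the matrix contents from (u, p, t).
struct ToyArrayOperator
    A::Diagonal{Float64, Vector{Float64}}
    update_func::Function
end
update_coefficients!(C::ToyArrayOperator, u, p, t) = C.update_func(C.A, u, p, t)

# update_func that just copies the parameter vector into the diagonal
update_func(A, u, p, t) = copyto!(A.diag, p)

mu = [1.0, 2.0, 3.0]
C1 = ToyArrayOperator(Diagonal(copy(mu)), update_func)

mu2 = [5.0, 6.0, 7.0]
update_coefficients!(C1, nothing, mu2, nothing)
@assert C1.A * ones(3) == mu2   # anything holding C1 lazily now sees new coefficients
```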

ChrisRackauckas commented 6 years ago

Yes, exactly. And then for example in the quasi-linear case (coefficients depend on u) then you'd iterate:

# Start with some u=u0 guess
update_coefficients!(L, u, mu2, nothing)
u = (r*I - L) \ xs

and that would be functional iteration (then you can make this Newton etc.). Then if the operator is time-dependent, there's a time spot. We can find ways to simplify it in special cases, but it does cover all of the bases that we need.

jlperla commented 6 years ago

Having the nothing is OK... my problem is more basic. How does it know which part of the lazily composed operator it refers to...? For example, what if I also have the sigma as a parameter, and am composing multiple drifts. That is:

A1 = DriftOperator(dx, N, true)
A2 = DiffusionOperator(dx, N)

mu = rand(N)
anothermu = rand(N)
sigma = rand(N) + 1.0

C1 = DiffEqArrayOperator(Diagonal(mu))
C2 = DiffEqArrayOperator(Diagonal(anothermu))
C3 = DiffEqArrayOperator(Diagonal(sigma.^2 / 2)) 

L1 = C1 * A1
L2 = C2 * A2
L3 = C3 * A1

L = (L1 + L2 + L3) * Q
#solve it...
(r*I - L) \ xs

#Want to modify the C1.  How?
mu2= rand(N)

update_coefficients!(L , nothing, mu2, nothing) #How does it know which constants to modify if calling on L
(r*I - L) \ xs

ChrisRackauckas commented 6 years ago

You modify the whole thing.

jlperla commented 6 years ago

Forgetting the whole "iterative" algorithm, there are many cases where we just need to solve things with a given stochastic process which has constants. I think it is valuable to look at that example, and perhaps have a particular type for the composition.

x̃ = Interval(1.0, 10.0)
x = RegularGrid(x̃, 10)
A₁ = DriftOperator(x, :backwards)
A₂ = DiffusionOperator(x)
μ = ConstantOperator(0.1)
σ = ConstantOperator(1.2)

A = μ * A₁ + σ * A₂ #i.e., generator for d X_t = 0.1 dt + 1.2 dW_t

#Define the Q for the boundary
L = A * Q

#Solve with a payoff vector b
(r*I - L) \ b

ChrisRackauckas commented 6 years ago

We don't need to wrap

μ = ConstantOperator(0.1)
σ = ConstantOperator(1.2)

Dispatch already handles this.

jlperla commented 6 years ago

Even better!

jlperla commented 6 years ago

@ChrisRackauckas @MSeeker1340 Now that I understand we were talking about very different usage scenarios, I want to come back to the interface for "functions" in operator composition. While the ability to have a DiffEqArrayOperator is important, I would say that the majority of users would really just want to do one of two things: (1) use a constant coefficient (which is dispatched already!) and (2) define a function of the underlying state, independent of the grid itself, which I think is closer to what you guys were talking about before.

What I would say, though, is that I am not sure the DiffEqArrayOperator interface is the most natural one for functions (even if it may be used internally). So what is the most natural user interface for this?

Let me say that for economists' usage, 90% of usage of this library (and close to 100% at first) would be for the stationary and fully affine (i.e. not quasi-linear) cases, where functions are (in effect) only a function of the state, and those functions would probably contain parameters as closures. So can I suggest that there is one special affine and time-invariant case, and one with the full specification including the u, t, p, etc.?

Let me first state my dream world for the stationary version, and then we can see if it is possible. I would like to write the following code:

x̃ = Interval(1.0, 10.0)
x = RegularGrid(x̃, 10)
A₁ = DriftOperator(x, :backwards)
A₂ = DiffusionOperator(x)

#Some constants
θ = 1.0
γ = 0.5
σ = 1.1
μ(x) = θ * (γ - x) #See https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process

A = μ * A₁ + σ/2 * A₂ #i.e., generator for d x_t = θ * (γ - x_t) dt + σ dW_t

#Define the Q for the boundary
L = A * Q

#Solve with a payoff vector b
(r*I - L) \ b

For this to work, it would need to dispatch on the function... which I can see wrapping in a DiffEqArrayOperator or whatever. I think this is effectively just filling in the update_func you were talking about, but the simplest case is worth having as a special case. For the full case with u,x,t,p etc. it is fine to have a heavier interface.

ChrisRackauckas commented 6 years ago

Dispatch on functions should be used sparingly because it's fragile if done incorrectly. I do not think this is a good use of it. μ(u,p,t,x) = θ * (γ - x) would do just as well. We don't gain anything except syntactic sugar, and the sugar is only 5 characters. A macro would do better if you really really really see shortening the arglist as important.

Let me say that for economists usage, 90% of usage of this library (and close to 100% at first) would be for the stationary and fully affine (i.e. not quasi-linear) cases, where functions are (in effect) only a function of the state, and those functions probably would contain parameters as closures.

I would estimate that about <20% of the library usage will fall into this category. The largest class of users for this functionality is based around quantum dynamics with time and or space dependent coefficients (Schrodinger is a KFE, and packages like QuantumOptics.jl already exist), so most people will be using u and t. Everyone can and does make use of p except in trivial examples. If it really made a difference to that 20% I would do it, but it doesn't. There's literally no functional difference between the two function choices, just a syntax. So you'd be getting all of the issues of function dispatch and multiple syntax support (+ documentation issues) with no payoff other than dropping 5 characters. That's not something that's good to support.

jlperla commented 6 years ago

Dispatch on functions should be used sparingly because it's fragile if done incorrectly. I do not think this is a good use of it. μ(u,p,t,x) = θ * (γ - x) would do just as well.

For sure, my main concern was less the ability to dispatch based on a different number of arguments, and more the ability to dispatch based on a function with the correct number of arguments.

If you are saying that the following interface would work, then having a few ignored parameters is completely fine. That is:

μ(u,p,t,x) = θ * (γ - x) 
A = μ * A₁ + σ/2 * A₂

I was worried that every use would involve the complicated update_func callback/etc. or a manual rewriting of an update_coefficients!

ChrisRackauckas commented 6 years ago

If you are saying that the following interface would work, then have a few ignoring parameters is completely fine. That is?

Yup, that should work.

I was worried that every use would involve the complicated update_func callback/etc. or a manual rewriting of a update_coefficients!

No, that's behind the scenes stuff. update_coefficients! is what you'd call in iterative algorithms that have to be operator-independent, but again most users probably won't be writing those algorithms. The update_func is how we do this all behind the scenes. Essentially, what's going on behind the scenes with

A = μ * A₁ + σ/2 * A₂

is that your μ(u,p,t,x) becomes the update_func for the coefficients on the A₁ part. Then what update_coefficients!(A,u,p,t) does is it cycles through the linear operators. It goes to μ * A₁, sees that it's a function, and then calls μ(u,p,t,x) to get updated coefficients. Then it cycles to σ/2 * A₂, sees that it's a vector/number and ignores it. That means that A is now updated to the current state, so * will be correct, and it does this without knowing how many operators are in there or what exactly they are.

So there's two pieces here. On one hand there's notation for lazy operator construction that a user is free to utilize since it's not actually creating matrices. But on the other hand, there's lazy function-based updates that lets package developers "do the same thing" without knowledge of the problem domain.

Let me write an example showing this.

jlperla commented 6 years ago

I think this also tells me that the

DiffEqArrayOperator(Diagonal(mup))

is probably a lower-priority, lower-level interface than what we need for now. There are a few places where I can imagine using it directly, but just passing in a μ(u,p,t,x) which has a call to a linear interpolation over the grid points is good enough in the short term. I think the syntactic sugar over composing the - operator with constants, etc. is more useful.

Is there a value in moving this prototype into .jl code and a mini-module so that it is easier to play around with it and create PRs?

ChrisRackauckas commented 6 years ago

Let's say you want to solve OU with multiple parameter sets.

User code would look like this. I would probably just recommend "using the arrays", and since you're using the operators yourself you can use whatever function syntax you want.

μ(p,x) = @. p[1] * (p[2] - x)
ps = [rand(2) for i in 1:100]
x = linspace(0,1,10)
b = rand(...)

for i in 1:100
  p = ps[i]
  A = μ(p,x) * A₁ + σ/2 * A₂
  y = A\b # this is the solution, do something with it
end

Done (and notice how your example μ(u,p,t,x) = θ * (γ - x) was really already making use of p but you didn't know it. You want to localize to p anyways for type-stability, and then this syntax also makes it easy to change these values around etc. etc.).

But now let's build a package code that solves the linear operator for a user's chosen set of ps. We'd write:

function solve_multiple_ps(A,b,ps)
  sols = Array{Float64}[]
  for i in 1:length(ps)
    p = ps[i]
    update_coefficients!(A,nothing,p,nothing)
    push!(sols,A\b) # this is the solution, do something with it
  end
  sols
end

And importantly, notice that it's not dependent on the choice of A like the previous code was. Users can call this function like:

μ(u,p,t,x) = @. p[1] * (p[2] - x)
A = μ * A₁ + σ/2 * A₂
ps = [rand(2) for i in 1:100]
x = linspace(0,1,10) # somehow the operator should know x internally already by construction
b = rand(...)
solve_multiple_ps(A,b,ps)

This does the same thing as the previous user code, but works for any time- and space-independent linear operator A. I remember you mentioned that you wanted things like functional iteration in NLsolve.jl to work directly with this stuff. This is exactly how that will happen.

jlperla commented 6 years ago

Gotcha. Very helpful for cases where you might want to change the parameters, and I now see exactly what you were getting at before with this interface.

BTW, I assume that the p is generic enough to be a named tuple? If the named-tuple is type-stable, then that should be just as efficient in this example?

ChrisRackauckas commented 6 years ago

BTW, I assume that the p is generic enough to be a named tuple? If the named-tuple is type-stable, then that should be just as efficient in this example?

Yup, it can be any type. It just gets passed right through.