FHell commented 4 years ago

We need to add support for more fancy Jacobian work. An earlier version of the Rewrite included an automatic JacVecOperator, as this causes some defaults to misbehave, and required an API change, I got rid of it again, though.

Instead I suggest we add a utility function that creates the Jacobians.

lindnemi commented 3 years ago

Jacobians for ND

The basic idea is to use the chain-rule

dx/dy =  dx/de * de/dy

Assume we have nodes and edges that look like this

function vertex!(dx, x, e, p, t)
    dx[1] = x[2]
    dx[2] = p - x[2]
    for edge in e
        dx[2] += edge[1]
    end
    nothing
end

function edge!(e, v_s, v_d, p, t)
    e[1] = sin(v_s[1] - v_d[1])
    nothing
end

The Jacobian of a single vertex with respect to its internal variables can be derived symbolically, in this case

function Jac_vertex_internal(dx, x, e, p, t) = [[0, 1], [0, -1]]

The Jacobian of a vertex wrt. an incoming edge here denoted as e seems to be easy as well

function Jac_vertex_from_edge(dx, x, e, p, t) = [[0, 1]]

It changes slightly if a different coupling function is used, e.g. when the coupling sum is multiplied with a coupling strength. However if the edges are not homogeneous it becomes more complicated, since they may have different dimensions. If all edges couple through the internal variable with the same index we might do something like the follwing.

function Jac_vertex_from_hetero_edge(dx, x, edge, p, t) = vcat([[0, 1]], [[0, 0] for i in (dim(edge) - 1)])

In this case the vertex needs access to the edge dimension. We might come up with something smarter here, such that the user doesn't have to care about dimensions. The important assumption here is that all edges look the same as seen from the vertex.

For the moment lets stick with StaticEdges. In this case the Jacobian depends only on Vertex variables. To resolve the interdependency of different nodes we need to get the Jacobian of an edge w.r.t the vertices it connects to.

function Jac_edge_from_src(e, v_s, v_d, p, t) =  cos(v_s[1] - v_d[1])
function Jac_edge_from_dst(e, v_s, v_d, p, t) =  - cos(v_s[1] - v_d[1])

Since in general e in a vertex depends on both, v_s and v_d, these terms have to be added, respectively multiplied in the right places. Jac_edge_from_dst should be added to Jac_vertex_internal at the right index. Jac_edge_from_src has to be multiplied according to the chain rule with Jac_vertex_from_edge.

An algorithm to get the i-th row of the jacobian might look somewhat like this:

1. Choose a variable x_i. 
2. Find the vertex it belongs to. 
3. Get the internal Jacobian of that vertex.
4. Find all edges for which this vertex is the dst (out-edges). 
5. Add `Jac_edge_from_dst` to the internal Jacobian.
6. Get the column corresponding to x_i from the internal Jacobian.
7. Compute `Jac_edge_from_src * Jac_vertex_from_edge` for all connected edges and vertices.
8. Get the column corresponding to x_i from that product.
9. Concatenate the two column vectors and fill all gaps with zeros to get a vector of length N.

If the goal is not to get the full Jacobian, but only a JacVecProduct J * v, some details change, but I don't know if it could simplify that algorithm. Step 5 works only for additive coupling.

FHell commented 3 years ago

Thanks for getting started on this! I was thinking about this a bit differently now: Writing out the chain of functions: f(x) which is f_i(x_i, a_i(x)), and a_i(x) = a({e_j(x)}) and e_j(x) = e(x_s(j), x_d(j)). So given a vector v, for which we want J v we can calculate an intermediate \delta e_j = J_e (v_s(j), v_d(j)), then aggregate this into \delta a_j = J_a \delta e_j and then calculate \delta f node wise from that.

Now that I have written that out I just realized: JacVec is just another network dynamics. All we need to do is turn all functions f, a, e into their linearized versions and we get a representation free JacVec! :D

lindnemi commented 3 years ago

Could you elaborate a bit on that, maybe with an example? I'm not really getting it.

By the way, I hate it that github doesn't render latex. That really is an advantage of gitlab.

FHell commented 3 years ago

Agreed on Latex. Let's take the aggregator is sum case:

f_i(x) = v_i(x_i) + \sum_j A_ij e(x_i, x_j)

If you define the linearized versions of v_i and e(x_i, x_j):

v^l_i (zi) = J{v_i} * z_i

e^l (z_i, z_j) = J^1_e z_i + J^2_e z_j _and maybe [+ J^e * ze]

Then

f^l_i (z) = v^l_i (z_i) + \sum_j A_ij e^l(z_i, z_j)

is the JacVec function we are after (Edit:) and this function is itself of the form of a network equation as captured by ND.jl already. The location at which we calculate the JacVec product needs to go into the parameters of v^l and e^l somehow. Maybe that means that (with some minor adjustments) we can handle this almost entirely at the level of the MTK based constructor.

lindnemi commented 3 years ago

Mutating parameters is bad style I guess, but for a proof of concept it may be alright. Some while ago I implemented an extension to the tuple syntax p = (vpars, epars). If e.g. vpars is a 2 x N array, then every vertex receives a 2-dim view into vpars. This could be used as is to store the JacVec product for homogeneous networks. In cases where each node has a different dimension we will need another data structure / parameter syntax.

FHell commented 3 years ago

Upon further reflection we probably should not repurpose nd.jl here... We need to read up on DiffEqs AbstractOperators interface. It will be cleanest to implement that (this also contains an "update coefficients" functionality for non-constant operators which we then can use to update our internal caches...)

lindnemi commented 3 years ago

Discussing the issue this morning we decided to get a prototype working that (ab)uses the parameter argument p. However, this approach has some tight limitations. Still, I believe the JacVecProduct may be realized in a very similar manner to nd_ODE_Static.

I just looked at AbstractOperatorInterface and it seems that such an operator may depend only on u,p,t : https://diffeq.sciml.ai/stable/features/diffeq_operator/

Are there cases where du plays a role in the Jacobian as well? If not we may even use network_dynamics by repurposing du instead of p.

FHell commented 3 years ago

I believe the idea is that we take the location at which the Jacobian currently is as an internal state of the Jacobian. This location is then updated with update_coefficients!(J, u, p, t). Then the overload J * v calculates the Jacobian vector product at location u, with parameters p at time t. This also means we don't have to recalculate the edge values every time we take the derivative.

lindnemi commented 3 years ago

Ah right, cool. That should speed up things when the same Jacobian is used repeatedly. Still, the Jacobian may not depend on du in this case right?

FHell commented 3 years ago

It never should. The AutoDiff based construction of this operator is here, BTW. Worth reading over before we try to implement something ourselves:

https://github.com/SciML/DiffEqOperators.jl/blob/master/src/jacvec_operators.jl

lindnemi commented 3 years ago

Right. We're only dealing with first order ODEs here. @acejolanda @fdrauschk That means that in our proof of concept, we can use p as an input for z and du as the cache for the summation of the different terms.

In the next step we will separate the computation of the coefficients of J_e, J_v from that, store it in some suitable data structure and provide another function that only multiplies z with these coefficients. This function could implement the AbstractDiffEqOperator Interface.

lindnemi commented 3 years ago

# Sketch
struct jac_vertex
    Jac_intern
end
function (jv::jac_vertex)(out, x, edges, z, t)
    out .= jv.Jac_intern * z
    for e in edges
        out .+= e
    end
end

struct jac_edge_struct
    Jac_src
    Jac_dst
end
function (je::jac_edge_struct)(e, v_s, v_d, p, t) # p = [z_s z_d]
    e =  je.Jac_src(v_s-v_d) * p[1] + je.Jac_dst * p[2]

end

Jac_src(v_s, v_d) = cos(v_s, v_d)
Jac_dst(v_s, v_d) = -cos(v_s, v_d)

nd_jac_edge = StaticEdge(jac_edge_struct(Jac_src, Jac_dst))

# diverting the parameter arguments
for i=1:ne(g)
    edgep[i] = [z[src(edge[i])] z[dst(edge[i])]]
end

p = (z, edgep)

# general idea
# 2 node kuramoto dynamics
dx1 = w1 + sin(x1 - x2)
dx2 = w2 + sin(x2 - x1)

# global JacVecProduct
J: (cos  -cos    z1   = cos z1 - cos z2
    -cos cos)    z2   = cos z2 - cos z1

# edge Jacobian
J_e: sin(x_src - xdst) ->     cos
                       ->    -cos

JuliaDynamics / NetworkDynamics.jl

Jacobians... #15

Jacobians for ND