dfdx / XGrad.jl

eXpression gradients in Julia
Other
3 stars 4 forks source link

Jacobians #15

Open cortner opened 6 years ago

cortner commented 6 years ago

Would you consider extending this package to compute jacobians?

I suppose on some level this can be "hacked" by differentiating the individual components, but if there are a lot of intermediate expressions that go into several components of the final f(x) vector, then this could really improve performance.

(My current use-case is differentiation of vector-valued multi-variate polynomials in up to 10, later maybe more, variables.)

dfdx commented 6 years ago

Yes, I thought about it, and even have a function to merge several expression trees, but I haven't experimented with it yet. Give me a couple of days to play around with it.

dfdx commented 6 years ago

Here's a manual code for calculating "something like jacobian":

# example vector-valued expression
base_ex = quote
    x1 = exp.(W*x)
    x2 = exp.(W*x)
    x3 = exp.(W*x2)
end

# adding `x3[1]` and `x3[2]` to run normal differentiation pipeline
ex1 = copy(base_ex) |> sanitize; push!(ex1.args, :(z = x3[1]))
ex2 = copy(base_ex) |> sanitize; push!(ex2.args, :(z = x3[2]))

# finding derivative expressions for these 2 components
W = rand(3,3); x = rand(3)
inputs = [:W => W, :x => x]
ctx = Dict(:codegen => VectorCodeGen())
dex1 = xdiff(ex1; ctx=ctx, inputs...)
dex2 = xdiff(ex2; ctx=ctx, inputs...)

# joining diff expressions for the 2 components
dex = copy(dex1)
for subex in dex2.args
    push!(dex.args, subex)
end
# auxiliary variable to return all the results
push!(dex.args, :(result = ($(dex1.args[end].args[1]), $(dex2.args[end].args[2]))))

# create graph from the joint diff expression and eliminate common subexpressions
g = ExGraph(dex; inputs...)
evaluate!(g)
g = eliminate_common(g)
to_expr(g)

which generates:

quote  # /home/azhabinski/.julia/v0.6/Espresso/src/exgraph.jl, line 99:
    tmp1342 = 1
    tmp1357 = transpose(W)
    dz!dz = 1.0
    tmp1338 = W * x
    tmp1352 = exp.(tmp1338)
    tmp1348 = transpose(tmp1352)
    tmp1340 = W * tmp1352
    tmp1346 = exp.(tmp1340)
    dz!dx3 = ungetindex(tmp1346, dz!dz, tmp1342)
    dz!dtmp1340 = tmp1346 .* dz!dx3
    dz!dx2 = tmp1357 * dz!dtmp1340
    dz!dtmp1338 = tmp1352 .* dz!dx2
    dz!dx = tmp1357 * dz!dtmp1338
    dz!dW__1 = dz!dtmp1340 * tmp1348
    z = getindex(tmp1346, tmp1342)
    tmp1354 = transpose(x)
    dz!dW__2 = dz!dtmp1338 * tmp1354
    dz!dW = dz!dW__1 .+ dz!dW__2
    tmp1359 = (z, dz!dW, dz!dx)
    tmp1366 = 2
    dz!dx4 = ungetindex(tmp1346, dz!dz, tmp1366)
    dz!dtmp1364 = tmp1346 .* dz!dx4
    dz!dx5 = tmp1357 * dz!dtmp1364
    dz!dW__3 = dz!dtmp1364 * tmp1348
    z2 = getindex(tmp1346, tmp1366)
    dz!dtmp1362 = tmp1352 .* dz!dx5
    dz!dx6 = tmp1357 * dz!dtmp1362
    dz!dW__4 = dz!dtmp1362 * tmp1354
    dz!dW2 = dz!dW__3 .+ dz!dW__4
    tmp1428 = (z2, dz!dW2, dz!dx6)
    result = (tmp1359, tmp1428)
end

The result is a tuple where each element is itself a tuple with usual diff results for each element (2 in this case) - we can concatenate these results to obtain jacobians.

Interesting findings:

  1. Forward pass (calculating result of the original expression) is the same for all components, so it's successfully eliminated.
  2. Reverse pass is mostly unique for each component.
  3. Number of components should known in advance.

So if forward pass takes m seconds and reverse pass takes n seconds, constructing jacobians this way for 10 components will result in approximately m + 10n seconds (compared to 10m + 10n in case of naive approach).

We may also be able to reuse buffers for variables in reverse pass (not shown here), but I haven't invested much time in memory management strategies yet.

cortner commented 6 years ago

So until this is incorporated and tested Inciuld probably just follow your example and hand-code along something along the same lines?

UPDATE: I've tried to follow your example. Seems great, except that unfortunately XGrad has other limitations which mean I can't use it for my case. Maybe this is not a good place to discuss this. Should I open a separate issue? Do you have a Gitter channel?

dfdx commented 6 years ago

It seems like everybody has moved to Slack, so I can create a channel there. Otherwise feel free to post issues here - I check chats and GitHub issues with the same frequency.

Regarding jacobians, I'll try to come up with something during the week. You can check the progress in jacobian branch.

cortner commented 6 years ago

From my end it seems I won’t be able to use it unless indexing works as well. I’ll open another issue to discuss there.

dfdx commented 6 years ago

I think I've got something working. No tests, but the following simple example seems to give correct answer (latest jacobian branch):

using XGrad
ex = :(y = W * x)
jex = jacobian(ex; W=rand(2, 3), x=rand(3))
cortner commented 6 years ago

thanks - I look forward to testing this!