Open cortner opened 6 years ago
Yes, I thought about it, and even have a function to merge several expression trees, but I haven't experimented with it yet. Give me a couple of days to play around with it.
Here's manual code for computing "something like a jacobian":
# example vector-valued expression
base_ex = quote
    x1 = exp.(W*x)
    x2 = exp.(W*x)
    x3 = exp.(W*x2)
end
# adding `x3[1]` and `x3[2]` to run the normal differentiation pipeline
ex1 = copy(base_ex) |> sanitize; push!(ex1.args, :(z = x3[1]))
ex2 = copy(base_ex) |> sanitize; push!(ex2.args, :(z = x3[2]))
# finding derivative expressions for these 2 components
W = rand(3,3); x = rand(3)
inputs = [:W => W, :x => x]
ctx = Dict(:codegen => VectorCodeGen())
dex1 = xdiff(ex1; ctx=ctx, inputs...)
dex2 = xdiff(ex2; ctx=ctx, inputs...)
# joining diff expressions for the 2 components
dex = copy(dex1)
for subex in dex2.args
push!(dex.args, subex)
end
# auxiliary variable to return all the results
# (args[1] of each final assignment is its LHS variable)
push!(dex.args, :(result = ($(dex1.args[end].args[1]), $(dex2.args[end].args[1]))))
# create graph from the joint diff expression and eliminate common subexpressions
g = ExGraph(dex; inputs...)
evaluate!(g)
g = eliminate_common(g)
to_expr(g)
which generates:
quote # /home/azhabinski/.julia/v0.6/Espresso/src/exgraph.jl, line 99:
tmp1342 = 1
tmp1357 = transpose(W)
dz!dz = 1.0
tmp1338 = W * x
tmp1352 = exp.(tmp1338)
tmp1348 = transpose(tmp1352)
tmp1340 = W * tmp1352
tmp1346 = exp.(tmp1340)
dz!dx3 = ungetindex(tmp1346, dz!dz, tmp1342)
dz!dtmp1340 = tmp1346 .* dz!dx3
dz!dx2 = tmp1357 * dz!dtmp1340
dz!dtmp1338 = tmp1352 .* dz!dx2
dz!dx = tmp1357 * dz!dtmp1338
dz!dW__1 = dz!dtmp1340 * tmp1348
z = getindex(tmp1346, tmp1342)
tmp1354 = transpose(x)
dz!dW__2 = dz!dtmp1338 * tmp1354
dz!dW = dz!dW__1 .+ dz!dW__2
tmp1359 = (z, dz!dW, dz!dx)
tmp1366 = 2
dz!dx4 = ungetindex(tmp1346, dz!dz, tmp1366)
dz!dtmp1364 = tmp1346 .* dz!dx4
dz!dx5 = tmp1357 * dz!dtmp1364
dz!dW__3 = dz!dtmp1364 * tmp1348
z2 = getindex(tmp1346, tmp1366)
dz!dtmp1362 = tmp1352 .* dz!dx5
dz!dx6 = tmp1357 * dz!dtmp1362
dz!dW__4 = dz!dtmp1362 * tmp1354
dz!dW2 = dz!dW__3 .+ dz!dW__4
tmp1428 = (z2, dz!dW2, dz!dx6)
result = (tmp1359, tmp1428)
end
The result is a tuple in which each element is itself a tuple holding the usual diff results for one output component (2 components in this case); we can concatenate these results to obtain jacobians.
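The concatenation step can be sketched in plain Julia. This is illustrative only: the `grads` variable below stands in for the per-component gradient vectors returned in each result tuple, it is not part of Espresso's API.

```julia
# Illustrative: for the linear map y = W*x, the gradient of component i
# w.r.t. x is the i-th row of W, so stacking per-component gradients
# (transposed, as rows) recovers the jacobian.
W = [1.0 2.0; 3.0 4.0]
grads = [W[i, :] for i in 1:size(W, 1)]  # hypothetical per-component gradients
J = vcat([g' for g in grads]...)         # stack transposed gradients as rows
```

For this linear map the stacked result equals W itself, which is a convenient correctness check.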
Interesting findings:
- Common subexpression elimination means the forward pass is computed only once for all components. So if the forward pass takes m seconds and the reverse pass takes n seconds, constructing jacobians this way for 10 components will take approximately m + 10n seconds (compared to 10m + 10n for the naive approach).
- We may also be able to reuse buffers for variables in the reverse pass (not shown here), but I haven't invested much time in memory management strategies yet.
So until this is incorporated and tested, I could probably just follow your example and hand-code something along the same lines?
UPDATE: I've tried to follow your example. Seems great, except that unfortunately XGrad has other limitations which mean I can't use it for my case. Maybe this is not a good place to discuss this. Should I open a separate issue? Do you have a Gitter channel?
It seems like everybody has moved to Slack, so I can create a channel there. Otherwise feel free to post issues here - I check chats and GitHub issues with the same frequency.
Regarding jacobians, I'll try to come up with something during the week. You can check the progress in the jacobian branch.
From my end it seems I won't be able to use it unless indexing works as well. I'll open another issue to discuss that.
I think I've got something working. No tests, but the following simple example seems to give the correct answer (latest jacobian branch):
using XGrad
ex = :(y = W * x)
jex = jacobian(ex; W=rand(2, 3), x=rand(3))
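As a sanity check independent of XGrad: for the linear map y = W * x, the jacobian with respect to x is just W, so a generic central finite-difference approximation (sketched below, not XGrad's output) should recover W up to rounding error.

```julia
# Plain-Julia check: for f(x) = W*x, the jacobian df/dx is W.
# Column j of the jacobian ≈ (f(x + h*e_j) - f(x - h*e_j)) / 2h.
W = rand(2, 3); x = rand(3)
f(x) = W * x
h = 1e-6
cols = [(f(x .+ h .* (1:3 .== j)) .- f(x .- h .* (1:3 .== j))) ./ (2h) for j in 1:3]
J = hcat(cols...)
```

Since f is linear, the central difference is exact up to floating-point rounding, so J should match W to high precision.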
Thanks - I look forward to testing this!
Would you consider extending this package to compute jacobians?
I suppose on some level this can be "hacked" by differentiating the individual components, but if there are a lot of intermediate expressions that feed into several components of the final f(x) vector, then this could really improve performance. (My current use-case is differentiation of vector-valued multivariate polynomials in up to 10, later maybe more, variables.)
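As a concrete instance of that use-case, here is a small vector-valued polynomial in three variables together with a hand-derived jacobian; both functions are made up for illustration and are not from either package.

```julia
# Hypothetical example: p maps R^3 -> R^2. Note that x[1], x[2], x[3]
# each feed both output components, so a shared forward pass saves work.
p(x)  = [x[1]^2 + x[2] * x[3],
         x[1] * x[2] + x[3]^3]
# Row i of the jacobian holds the partial derivatives of p_i.
Jp(x) = [2x[1]  x[3]   x[2];
         x[2]   x[1]   3x[3]^2]
x0 = [1.0, 2.0, 3.0]
```

Differentiating each component separately would recompute the shared monomials once per output row, which is exactly the redundancy the joint approach above avoids.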