Closed: ahwillia closed this issue 8 years ago
Hi, sorry for the late reply. I made some slight changes to your code, and I'm using the latest version of ForwardDiff w/Julia v0.5.
The main changes I made were to fix two common Julia "performance gotchas":

- Don't access A as a global; access it through model instead.
- Make sure the fields of PCA are concretely typed.

With these changes, the time for g1 on my machine is ~1.7 seconds, and the time for g2 is a whopping ~22.8 seconds. Once #138 lands, you shouldn't have to manually reshape arrays just to pass them as arguments to gradient or jacobian, though I'm unsure what the not-reshaped version of this code would look like (so I didn't mess with it).
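To make those two gotchas concrete, here is a minimal standalone sketch (A_global, f_bad, f_ok, and LoosePCA are hypothetical names, not part of the code that follows):

# Gotcha 1: closing over a non-constant global. Its type can change at any
# time, so every access is dynamically dispatched and blocks specialization.
A_global = randn(100, 100)
f_bad(x) = sum((A_global - x).^2)   # reads the non-constant global on every call
f_ok(A, x) = sum((A - x).^2)        # takes the data as an argument instead

# Gotcha 2: abstractly typed fields. The compiler can't specialize on what
# LoosePCA.A actually is, unlike the parametric PCA type defined below.
immutable LoosePCA
    A::AbstractMatrix   # abstract field type
    k::Integer          # abstract field type
end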
using ForwardDiff
"""
Encodes a simple PCA model, A ≈ X' * Y
A ∈ ℜ(m × n)
X ∈ ℜ(k × m)
Y ∈ ℜ(k × n)
"""
immutable PCA{T<:Real, K<:Integer}
    A::Matrix{T} # data
    k::K         # number of components
end
function make_gradient_naive(model::PCA)
    A, k = model.A, model.k
    m, n = size(A)
    nX = k * m
    # Objective function
    function f(θ::Vector)
        X = reshape(θ[1:nX], (k, m))
        Y = reshape(θ[(nX+1):end], (k, n))
        return sum((A - (X'*Y)).^2)
    end
    return x -> ForwardDiff.gradient(f, x)
end
function make_gradient_tedious(model::PCA)
    A, k = model.A, model.k
    m, n = size(A)
    nX = k * m
    # Objective function: same as above, but indexes θ directly instead of reshaping
    function f(θ::Vector)
        err = 0.0
        for i = 1:m
            for j = 1:n
                aij = 0.0
                for r = 1:k  # r indexes components; don't reuse k as the loop variable
                    xri = ((i-1)*k) + r       # column-major position of X[r,i] in θ
                    yrj = nX + ((j-1)*k) + r  # column-major position of Y[r,j] in θ
                    aij += θ[xri]*θ[yrj]
                end
                err += (A[i,j] - aij)^2
            end
        end
        return err
    end
    return x -> ForwardDiff.gradient(f, x)
end
m,n,k = 200,200,5
A = randn(m,k)*randn(k,n)
g1 = make_gradient_naive(PCA(A,k))
g2 = make_gradient_tedious(PCA(A,k))
θ = randn(m*k + k*n)
@assert sum(abs(g1(θ) - g2(θ))) < 1e-8 # sanity check passes
@time g1(θ) # 1.748855 seconds (4.76 k allocations: 2.000 GB, 11.98% gc time)
@time g2(θ) # 22.838722 seconds (311.92 M allocations: 19.192 GB, 19.93% gc time)
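The gradient of this objective also has a simple closed form, which makes a handy extra sanity check against g1. A minimal sketch, reusing A, θ, m, n, and k from above (g_manual is a hypothetical name, not part of the original code):

# Closed-form gradient of vecnorm(A - X'*Y)^2 with respect to X and Y
nX = k * m
X = reshape(θ[1:nX], (k, m))
Y = reshape(θ[(nX+1):end], (k, n))
R = A - X'*Y                      # residual, m × n
gX = -2 * Y * R'                  # ∂f/∂X, k × m
gY = -2 * X * R                   # ∂f/∂Y, k × n
g_manual = vcat(vec(gX), vec(gY)) # same column-major layout as θ
@assert maximum(abs(g_manual - g1(θ))) < 1e-8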
Feel free to reopen if you're still running into problems with this code.
I'm still a newcomer to the automatic differentiation world, but I'm interested in trying it out on some matrix and tensor decomposition optimization problems. As shown in the code below, this involves either reshaping the input vector (ForwardDiff will only work on vector-valued inputs) or doing some mildly clever indexing.

I want to take the gradient of, e.g., vecnorm(A - X'*Y)^2 with respect to the matrices X and Y, where A is a constant matrix of data. Minimizing this objective function is just classic principal components analysis.

I've done this by two different methods below, but I'm surprised at how long it takes to evaluate the gradient for even a very small problem. I'm wondering if this is some limitation of autodiff, or if my code is suboptimal in some way that I'm missing / don't understand.
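The "reshaping" mentioned here just means packing X and Y into one flat parameter vector and recovering them inside the objective. A minimal standalone sketch of that round trip, with its own small hypothetical dimensions (independent of the code above):

k, m, n = 3, 10, 8
X = randn(k, m)
Y = randn(k, n)

θ = vcat(vec(X), vec(Y))              # pack both matrices into one vector
X2 = reshape(θ[1:k*m], (k, m))        # unpack: first k*m entries are X (column-major)
Y2 = reshape(θ[(k*m+1):end], (k, n))  # remaining entries are Y

@assert X2 == X && Y2 == Y            # the round trip is exact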