JuliaDiff / ForwardDiff.jl

Forward Mode Automatic Differentiation for Julia

Efficiency problems -- reshaping vector input #110

Closed ahwillia closed 8 years ago

ahwillia commented 8 years ago

I'm still a newcomer to the automatic differentiation world, but I'm interested in trying it out on some matrix and tensor decomposition optimization problems. As shown in the code below, this involves either reshaping the input vector (ForwardDiff only works on vector inputs) or doing some mildly clever indexing.
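
(The reshaping itself is just a vec/reshape round trip; a toy example with a 3×5 matrix, not tied to the problem sizes below:)

X  = randn(3, 5)
θx = vec(X)              # flatten into the Vector that ForwardDiff expects
X2 = reshape(θx, (3, 5)) # recover the matrix (this is what f does below)
@assert X == X2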

I want to take the gradient of, e.g., vecnorm(A - X'*Y)^2 with respect to the matrices X and Y, where A is a constant matrix of data. Minimizing this objective function is just classic principal components analysis.
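
(For reference, this objective also has a simple closed-form gradient, which gives something to check the autodiff output against; a minimal sketch, with R = A - X'*Y and a made-up helper name:)

# ∇_X f = -2 * Y * R'   and   ∇_Y f = -2 * X * R,  where R = A - X'*Y
function analytic_grad(A, X, Y)
    R = A - X'*Y
    return -2*Y*R', -2*X*R
end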

I've done this with two different methods below, but I'm surprised at how long it takes to evaluate the gradient for even a very small problem. I'm wondering if this is a limitation of autodiff, or if my code is suboptimal in some way that I'm missing or don't understand.

import ForwardDiff
using Devectorize

"""
Encodes a simple PCA model, A ≈ X' * Y

A ∈ ℜ(m × n)
X ∈ ℜ(k × m)
Y ∈ ℜ(k × n)
"""
type PCA
    A::Matrix{AbstractFloat} # data
    k::Integer # number of components
end

function make_gradient_naive(model::PCA)

    m,n = size(A)
    k = model.k

    nX = k * m

    # Objective function
    function f(θ::Vector)
        X = reshape(θ[1:nX],(k,m))
        Y = reshape(θ[(nX+1):end],(k,n))
        return sum( (A - (X'*Y)).^2 )
    end

    return ForwardDiff.gradient(f)

end

function make_gradient_tedious(model::PCA)

    m,n = size(A)
    nX = model.k * m

    # Objective function
    function f(θ::Vector)
        err = 0.0

        for i = 1:m
            for j = 1:n
                aij = 0.0
                for k = 1:model.k
                    xki = ((i-1)*model.k)+k
                    ykj = nX + ((j-1)*model.k)+k
                    aij += θ[xki]*θ[ykj]
                end
                err += (A[i,j] - aij)^2 
            end
        end

        return err
    end

    return ForwardDiff.gradient(f)

end

m,n,k = 200,200,5

A = randn(m,k)*randn(k,n)

g1 = make_gradient_naive(PCA(A,k))
g2 = make_gradient_tedious(PCA(A,k))

θ = randn(m*k + k*n)
@assert sum(abs(g1(θ) - g2(θ))) < 1e-8 # sanity check passes
@time g1(θ) # approx 5 seconds!
@time g2(θ) # approx 1.7 seconds!
jrevels commented 8 years ago

Hi, sorry for the late reply. I made some slight changes to your code, and I'm using the latest version of ForwardDiff w/Julia v0.5.

The main changes I made were to fix two common Julia "performance gotchas":

  1. Don't treat A as a global. Instead access A through model.
  2. Make sure the fields of PCA are concretely typed (see the sketch just below).
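
For example, gotcha 2 in isolation (a minimal sketch with toy types, not the code below):

immutable SlowPCA
    A::Matrix{AbstractFloat} # abstract element type: boxed storage, slow field access
end

immutable FastPCA{T<:Real}
    A::Matrix{T}             # concrete once constructed, e.g. Matrix{Float64}
end

Gotcha 1 is the same kind of problem: reading the untyped global A inside f is type-unstable, so the version below closes over model.A instead.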

With these changes, the time for g1 on my machine is ~1.7 seconds, and the time for g2 is a whopping ~22.8 seconds. Once #138 lands, you shouldn't have to manually reshape arrays just to pass them as arguments to gradient or jacobian, though I'm unsure what the not-reshaped version of this code would look like (so I didn't mess with it).

using ForwardDiff

"""
Encodes a simple PCA model, A ≈ X' * Y

A ∈ ℜ(m × n)
X ∈ ℜ(k × m)
Y ∈ ℜ(k × n)
"""
immutable PCA{T<:Real, K<:Integer}
    A::Matrix{T} # data
    k::K # number of components
end

function make_gradient_naive(model::PCA)
    A,k = model.A,model.k
    m,n = size(A)
    nX = k * m

    # Objective function
    function f(θ::Vector)
        X = reshape(θ[1:nX],(k,m))
        Y = reshape(θ[(nX+1):end],(k,n))
        return sum((A - (X'*Y)).^2)
    end

    return x -> ForwardDiff.gradient(f, x)
end

function make_gradient_tedious(model::PCA)
    A,k = model.A,model.k
    m,n = size(A)
    nX = k * m

    # Objective function
    function f(θ::Vector)
        err = 0.0
        for i = 1:m
            for j = 1:n
                aij = 0.0
                for r = 1:k # distinct loop variable, so k (the number of components) isn't shadowed
                    xki = ((i-1)*k)+r
                    ykj = nX + ((j-1)*k)+r
                    aij += θ[xki]*θ[ykj]
                end
                err += (A[i,j] - aij)^2 
            end
        end
        return err
    end

    return x -> ForwardDiff.gradient(f, x)
end

m,n,k = 200,200,5
A = randn(m,k)*randn(k,n)

g1 = make_gradient_naive(PCA(A,k))
g2 = make_gradient_tedious(PCA(A,k))

θ = randn(m*k + k*n)
@assert sum(abs(g1(θ) - g2(θ))) < 1e-8 # sanity check passes
@time g1(θ) # 1.748855 seconds (4.76 k allocations: 2.000 GB, 11.98% gc time)
@time g2(θ) # 22.838722 seconds (311.92 M allocations: 19.192 GB, 19.93% gc time)
jrevels commented 8 years ago

Feel free to reopen if you're still running into problems with this code.