Hessian decompression implementation

ElOceanografo commented 3 years ago

Continuing on from this Discourse discussion, I've implemented a method for decompressing a Hessian matrix from its coloring:

using SparsityDetection
using SparseDiffTools
using ForwardDiff
using LinearAlgebra
using SparseArrays

function sparsehess(f, g, hs, col, x)
    D = hcat([float.(i .== col) for i in 1:maximum(col)]...)
    δ = 1e-8
    ii, jj, vv = findnz(triu(hs))
    Hc = hcat([(g(x + δ * D[:, j]) - g(x)) / δ for j in 1:maximum(col)]...)
    H = sparse(ii, jj, zeros(length(vv)))
    for (i, j) in zip(ii, jj)
        H[i, j] = Hc[i, col[j]]
    end
    return Symmetric(H)
end

This can be polished, but I've got a couple of questions before working it up as a PR:

Where/how should this go in the repo? Is there analogous Jacobian code I should model it off of?
What methods for calculating the gradients should there be?
On discourse @ChrisRackauckas suggested using star-2 coloring, but in some testing I just did it seems like the default GreedyD1Color is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?
```
ntest = 500
nnzs = zeros(ntest)
coloring_timings = zeros(ntest, 2)
decompression_timings = zeros(ntest, 2)
ncolors = zeros(ntest, 2)
max_errors = zeros(ntest, 2)
```

colormethods = [SparseDiffTools.GreedyD1Color(), SparseDiffTools.GreedyStar2Color()] for i in 1:ntest, nx = 100 xis = rand(1:nx, nx) xjs = rand(1:nx, nx) f(x) = dot(x, x) + dot(x[xis], x[xjs]) g(x) = ForwardDiff.gradient(f, x) x = randn(nx) hs = hessian_sparsity(f, x) nnzs[i] = nnz(hs) Hfd = ForwardDiff.hessian(f, x) for j in 1:2 coloring_timings[i, j] = @elapsed col = matrix_colors(hs, colormethods[j]) ncolors[i, j] = maximum(col) sparsehess(f, g, hs, col, x) t1 = time_ns() H = sparsehess(f, g, hs, col, x) t2 = time_ns() decompression_timings[i, j] = t2 - t1 max_errors[i, j] = maximum(Matrix(H) .- Hfd) end end

mnc = mean(ncolors, dims=1); mnc[2] / mnc[1] # 2.3 mct = mean(coloring_timings, dims=1); mct[2] / mct[1] # 4.78 mdt = mean(decompression_timings, dims=1); mdt[2] / mdt[1] # 2.25

using StatsPlots density(log10.(coloring_timings), label=["D1" "Star-2"], xlabel="log10(Coloring time)") density(ncolors, label=["D1" "Star-2"], xlabel="Number of colors") density(log10.(decompression_timings), label=["D1" "Star-2"], xlabel="log10(Decompression time)") density(max_errors, label=["D1" "Star-2"], xlabel="Max. error re. ForwardDiff.hessian")

ChrisRackauckas commented 3 years ago

On discourse @ChrisRackauckas suggested using star-2 coloring, but in some testing I just did it seems like the default GreedyD1Color is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?

Alright, then let's go with that.

What methods for calculating the gradients should there be?

FiniteDiff, ForwardDiff, and Zygote, for now. AbstractDifferenatation.jl should come in a year to greatly simplify that.

Where/how should this go in the repo? Is there analogous Jacobian code I should model it off of?

https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/differentiation/compute_jacobian_ad.jl

It would be good to make an allocating and a non-allocating (pre-cached) form (noting though that reverse-mode AD will allocate).

ChrisRackauckas commented 3 years ago

But don't worry about the generality of the code. PR a first version, let someone else follow up with improvements, etc. Open source is an iterative process, and it can be worrisome to think about non-allocating GPU-accelerated etc etc, but take it one step at a time.

ElOceanografo commented 3 years ago

Ok, will do. I actually already made a pre-cached version in MarginalLogDensities, so I'll see if I can adapt that.

JuliaDiff / SparseDiffTools.jl

Hessian decompression implementation #142