Closed ElOceanografo closed 1 year ago
On discourse @ChrisRackauckas suggested using star-2 coloring, but in some testing I just did it seems like the default GreedyD1Color is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?
Alright, then let's go with that.
What methods for calculating the gradients should there be?
FiniteDiff, ForwardDiff, and Zygote, for now. AbstractDifferenatation.jl should come in a year to greatly simplify that.
Where/how should this go in the repo? Is there analogous Jacobian code I should model it off of?
It would be good to make an allocating and a non-allocating (pre-cached) form (noting though that reverse-mode AD will allocate).
But don't worry about the generality of the code. PR a first version, let someone else follow up with improvements, etc. Open source is an iterative process, and it can be worrisome to think about non-allocating GPU-accelerated etc etc, but take it one step at a time.
Ok, will do. I actually already made a pre-cached version in MarginalLogDensities, so I'll see if I can adapt that.
Continuing on from this Discourse discussion, I've implemented a method for decompressing a Hessian matrix from its coloring:
This can be polished, but I've got a couple of questions before working it up as a PR:
GreedyD1Color
is usually faster and more compact on Hessians with random sparsity patterns. Thoughts?colormethods = [SparseDiffTools.GreedyD1Color(), SparseDiffTools.GreedyStar2Color()] for i in 1:ntest, nx = 100 xis = rand(1:nx, nx) xjs = rand(1:nx, nx) f(x) = dot(x, x) + dot(x[xis], x[xjs]) g(x) = ForwardDiff.gradient(f, x) x = randn(nx) hs = hessian_sparsity(f, x) nnzs[i] = nnz(hs) Hfd = ForwardDiff.hessian(f, x) for j in 1:2 coloring_timings[i, j] = @elapsed col = matrix_colors(hs, colormethods[j]) ncolors[i, j] = maximum(col) sparsehess(f, g, hs, col, x) t1 = time_ns() H = sparsehess(f, g, hs, col, x) t2 = time_ns() decompression_timings[i, j] = t2 - t1 max_errors[i, j] = maximum(Matrix(H) .- Hfd) end end
mnc = mean(ncolors, dims=1); mnc[2] / mnc[1] # 2.3 mct = mean(coloring_timings, dims=1); mct[2] / mct[1] # 4.78 mdt = mean(decompression_timings, dims=1); mdt[2] / mdt[1] # 2.25
using StatsPlots density(log10.(coloring_timings), label=["D1" "Star-2"], xlabel="log10(Coloring time)") density(ncolors, label=["D1" "Star-2"], xlabel="Number of colors") density(log10.(decompression_timings), label=["D1" "Star-2"], xlabel="log10(Decompression time)") density(max_errors, label=["D1" "Star-2"], xlabel="Max. error re. ForwardDiff.hessian")