learning-chip opened this issue 2 years ago
I think the problem is not in Optim.jl but in NLSolversBase.jl; I can reproduce it with:
```julia
using NLSolversBase, SparseArrays, Random, LinearAlgebra

n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n
f(P) = norm(A * P - I_n)

FD = OnceDifferentiable(f, B, autodiff = :finite)
AD = OnceDifferentiable(f, B, autodiff = :forward)

# write each gradient into a sparse buffer and compare the resulting patterns
dB_ad = copy(B)
dB_fd = copy(B)
AD.df(dB_ad, B); dB_ad
FD.df(dB_fd, B); dB_fd
```
It's almost definitely related to FiniteDiff.jl (the finite-difference backend), but I need to confirm. A reproducer without Optim.jl:
```julia
using FiniteDiff, ForwardDiff, SparseArrays, Random, LinearAlgebra

n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n
f(P) = norm(A * P - I_n)

# compare the finite-difference gradient with the ForwardDiff gradient for a sparse input
FiniteDiff.finite_difference_gradient(f, B)
ForwardDiff.gradient(f, B)
```
NLSolversBase (and Optim) pin FiniteDiff to version 2.0, but that's not relevant here; the same behavior occurs on 2.8.1.
Interesting. I think you'd have to use SparseDiffTools.jl instead of FiniteDiff.jl to preserve sparsity patterns. @ChrisRackauckas?
FiniteDiff.jl has the same sparsity API as SparseDiffTools.jl. Just pass the color vector and the sparse Jacobian. SparseDiffTools has the AD version of sparse differentiation, along with the graph algorithms for computing the color vectors, while FiniteDiff.jl has a compatible finite difference Jacobian implementation.
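A rough sketch of what that API looks like (not taken from this thread: the in-place function `g!` and its sparsity pattern are invented for illustration, and `matrix_colors` comes from SparseDiffTools.jl):

```julia
using FiniteDiff, SparseDiffTools, SparseArrays

# invented in-place map with a known sparsity pattern:
# y1 depends on x1, x2; y2 on x2, x3; y3 on x3
function g!(y, x)
    y[1] = x[1]^2 + x[2]
    y[2] = x[2]^2 + x[3]
    y[3] = x[3]^2
    return y
end

# declared sparsity pattern of the Jacobian dg/dx, stored as a sparse matrix
jac = sparse([1.0 1.0 0.0;
              0.0 1.0 1.0;
              0.0 0.0 1.0])
colors = matrix_colors(jac)   # color vector from SparseDiffTools.jl

x = rand(3)
# pass the sparse Jacobian buffer and the color vector; only the declared
# entries are computed, and the sparse structure is preserved
FiniteDiff.finite_difference_jacobian!(jac, g!, x; colorvec = colors)
jac
```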
Reading this more, @longemen3000 did not pass the sparsity to FiniteDiff, so it did not know that the Jacobian it was calculating was supposed to be sparse. Having a sparse input does not necessarily mean a sparse output (for example, a sparse initial condition to an ODE or a neural network will become dense), so a sparse input alone is not sufficient information for any differentiation code to determine what sparsity pattern the derivative should have. And we haven't implemented sparsity on gradients, since that's a rather niche case: at most it would be a sparse vector, compared to sparse Jacobians or Hessians, which are matrices.
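A tiny illustration of that point (the function here is made up, not from this thread): the gradient at a sparse point can be fully dense.

```julia
using ForwardDiff, SparseArrays

x = sparsevec([1, 4], [1.0, 2.0], 6)   # sparse input: 2 stored entries out of 6
g(x) = sum(x)^2                        # every entry influences the output
ForwardDiff.gradient(g, Vector(x))     # all 6 entries equal 2*sum(x) = 6.0 -- dense
```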
When optimizing a function with a `SparseMatrixCSC` input type, the output (the "minimizer") is still a sparse matrix type, but with inconsistent sparsity patterns depending on the `autodiff` option:

- `autodiff = :finite` (the default): the output is a `SparseMatrixCSC`, but with a dense pattern -- mathematically the same as a dense matrix, with no zero entries.
- `autodiff = :forward`: the output has the same sparsity pattern as the original input -- zero entries are kept zero. This seems the most reasonable behavior.
- With a Flux/Zygote gradient (the `B_opt_flux` case below): `minimizer()` returns a flattened 1-D array of matrix values, not a `SparseMatrixCSC`. The gradient computation itself should be correct (ref https://github.com/FluxML/Zygote.jl/issues/163#issuecomment-987535974).

Should such inconsistent behaviors be considered "expected" or "buggy"? `autodiff = :finite` should also keep sparsity, and shouldn't turn sparse into dense.

To reproduce:
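(The reproducer code from the original report is not included in this excerpt. A minimal sketch of the setup it describes, reusing the matrices from the comment above, might look like the following; the `LBFGS()` choice and the exact `optimize` calls are assumptions, and the Flux/Zygote run that produces `B_opt_flux` is omitted.)

```julia
using Optim, SparseArrays, Random, LinearAlgebra

n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n   # sparse initial guess
f(P) = norm(A * P - I_n)

res_fd = optimize(f, B, LBFGS(); autodiff = :finite)   # default backend
res_ad = optimize(f, B, LBFGS(); autodiff = :forward)

B_opt_fd = Optim.minimizer(res_fd)   # reported: dense pattern, no zero entries left
B_opt_ad = Optim.minimizer(res_ad)   # reported: keeps the sparsity pattern of B

count(!iszero, B_opt_fd), count(!iszero, B_opt_ad)
```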
As for the results, `f(B_opt_ad)` and `f(B_opt_flux)` are finite positive values, because of the constrained sparsity patterns. `f(B_opt_fd)` is zero to within numerical precision, because all entries are allowed to change: `B_opt_fd` is essentially `inv(Matrix(A))`, i.e. the dense inverse of a sparse matrix.

Package versions: