Symmetric coloring for efficient sparse hessian computation

amontoison commented 3 days ago

In ADNLPModels.jl, we currently use column coloring to determine the seeds required to compute compressed Jacobians and Hessians.

Column coloring doesn’t exploit the symmetry of the Hessian and could be replaced with symmetric coloring. With this new coloring, I expect to use fewer seeds to determine all nonzeros of the Hessians. Since Ipopt and KNITRO only require the Hessian of the Lagrangian, this feature is valuable to add to ADNLPModels.jl for solving optimization problems modeled by this package.

The drawback of symmetric coloring is that it's harder to know if, for a given coefficient H[i,j], we need to use the color associated with column j or i. Depending on this, H[i,j] will be stored in row i or j of the vector that represents the "compressed columns" associated with the color c[j] or c[i], where c is the vector that assigns a color to each column.

From what I understand, and thanks to Guillaume Dalle, we can use the property associated with star coloring, which is the method used in symmetric coloring, to perform this decompression step (see https://github.com/gdalle/SparseMatrixColorings.jl/issues/22).

I have opened PR #257 to perform more efficient column decompression in sparse Jacobians and Hessians by preallocating the index of the nnz of the sparse matrix computed by a seed. This will make it easier to use symmetric coloring/decompression in the Hessian later.

I propose the following code for symmetric coloring, but I would like some comments/suggestions before we use it.

@PierreMartinon @ocots @jbcaillau @gergaud @tmigot @gdalle

using SparseMatrixColorings

# c = symmetric_coloring(H, algo)
#
# Use algorithm `algo` to construct a symmetrically structurally orthogonal
# partition of the columns (or rows) of the symmetric matrix `H`.
# The result is a coloring vector `c` of length size(H, 1) == size(H, 2)
# such that for every non-zero coefficient H[i,j], at least one of
# the following conditions holds:
#
#  • column j is the only column of its color c[j] with a non-zero coefficient in row i;
#  • column i is the only column of its color c[i] with a non-zero coefficient in row j.
#
# algo = GreedyColoringAlgorithm()
# c = symmetric_coloring(H, algo)
# ncolors = maximum(colors)

function nnz_colors(trilH, colors, ncolors)
  # We want to determine the coefficients in `trilH` that will be computed by a given color.
  # Because we exploit the symmetry, we also need to store the row index for a given coefficient
  # in the "compressed column".
  dcolors = Dict(i => Tuple{Int,Int}[] for i=1:ncolors)

  for j in 1:n
    for k in trilH.colptr[j]:trilH.colptr[j+1]-1
      i = trilH.rowval[k]

      # Should we use c[i] or c[j] for this coefficient H[i,j]?
      ci = colors[i]
      cj = colors[j]

      if i == j
        # H[i,j] is a coefficient of the diagonal
        push!(dcolors[ci], (j,k))
      else # i > j
        if is_only_color_in_row(H, i, j, n, colors, ci)
          # column i is the only column of its color c[i] with a non-zero coefficient in row j
          push!(dcolors[ci], (j,k))
        else
          # column j is the only column of its color c[j] with a non-zero coefficient in row i
          # it is ensured by the star coloring used in `symmetric_coloring`.
          push!(dcolors[cj], (i,k))
        end
      end
    end
  end

  return dcolors
end

function is_only_color_in_row(H, i, j, n, colors, ci)
  column = 1
  flag = true
  while flag && column ≤ n
    # We want to check that all columns (except the i-th one) with color
    # ci don’t have a non-zero coefficient in row j
    if (column != i) && (colors[column] == ci)
      k = H.colptr[column]
      while (k ≤ H.colptr[column+1]-1) && flag
        row = H.rowval[k]
        if row == j
          # We found a coefficient at row j in a column of color ci
          flag = false
        else # row != j
          k += 1 
        end
      end
    end
    column += 1
  end
  return flag
end

n = 10
H = sprand(n, n, 0.0)
H = H + I
for p in [1, n-1, n]
  H[p,:] .= 1.0
  H[:,p] .= 1.0
end
# julia> H
# 10×10 SparseMatrixCSC{Float64, Int64} with 58 stored entries:
# 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
# 1.0  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0  1.0
# 1.0   ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅   1.0  1.0
# 1.0   ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅   1.0  1.0
# 1.0   ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅   1.0  1.0
# 1.0   ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅   1.0  1.0
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅   1.0  1.0
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0  1.0  1.0
# 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
# 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

trilH = tril(H)
# julia> trilH = tril(H)
# 10×10 SparseMatrixCSC{Float64, Int64} with 34 stored entries:
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅ 
#  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0   ⋅ 
#  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

colors = symmetric_coloring(H, algo)
# julia> colors
# 10-element Vector{Int64}:
#  1
#  2
#  2
#  2
#  2
#  2
#  2
#  2
#  3
#  4

ncolors = maximum(colors)
# julia> ncolors
# 4

dcolors = nnz_colors(trilH, colors, ncolors)
# Dict{Int64, Vector{Tuple{Int64, Int64}}} with 4 entries:
#   4 => [(1, 10), (2, 13), (3, 16), (4, 19), (5, 22), (6, 25), (7, 28), (8, 31), (9, 33), (10, 34)]
#   2 => [(2, 11), (3, 14), (4, 17), (5, 20), (6, 23), (7, 26), (8, 29)]
#   3 => [(1, 9), (2, 12), (3, 15), (4, 18), (5, 21), (6, 24), (7, 27), (8, 30), (9, 32)]
#   1 => [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8)]

n = 10
H2 = sprand(n, n, 0.0)
for i = 1:n
  H2[i,n-i+1] = 1.0
end
# julia> H2
# 10×10 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
#  ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0
#  ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅ 
#  ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅ 
#  ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅ 
#  ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅ 
#  ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅ 
#  ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 

trilH2 = tril(H2)
# julia> trilH2 = tril(H2)
# 10×10 SparseMatrixCSC{Float64, Int64} with 5 stored entries:
#   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#   ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 

colors2 = symmetric_coloring(H2, algo)
# julia> colors2 = symmetric_coloring(H2, algo)
# 10-element Vector{Int64}:
#  1
#  1
#  1
#  1
#  1
#  2
#  2
#  2
#  2
#  2

ncolors2 = maximum(colors2)
# julia> ncolors2
# 2

dcolors2 = nnz_colors(trilH2, colors2, ncolors2)
# Dict{Int64, Vector{Tuple{Int64, Int64}}} with 2 entries:
#   2 => []
#   1 => [(10, 1), (9, 2), (8, 3), (7, 4), (6, 5)]

# We need to be careful in our AD tool to not use all colors,
# dcolors2[2] = [] and we don't need the color 2 here!

n = 10
H3 = sprand(n, n, 0.0)
for i = 1:n
  H3[i,n-i+1] = 1.0
end
for p in [1, n]
  H3[p,:] .= 1.0
  H3[:,p] .= 1.0
end
# julia> H3
# 10×10 SparseMatrixCSC{Float64, Int64} with 44 stored entries:
# 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0  1.0
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅   1.0
# 1.0   ⋅    ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅   1.0
# 1.0   ⋅    ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅   1.0
# 1.0   ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅   1.0
# 1.0   ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅   1.0
# 1.0   ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0
# 1.0  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅   1.0
# 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

trilH3 = tril(H3)
# julia> trilH3
# 10×10 SparseMatrixCSC{Float64, Int64} with 23 stored entries:
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅    ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0   ⋅   1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0  1.0   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
#  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

colors3 = symmetric_coloring(H3, algo)
# julia> colors3 = symmetric_coloring(H3, algo)
# 10-element Vector{Int64}:
#  1
#  2
#  2
#  2
#  2
#  3
#  3
#  3
#  3
#  4

ncolors3 = maximum(colors3)
# julia> ncolors3 = maximum(colors3)
# 4

dcolors3 = nnz_colors(trilH3, colors3, ncolors3)
# julia> dcolors3 = nnz_colors(trilH3, colors3, ncolors3)
# Dict{Int64, Vector{Tuple{Int64, Int64}}} with 4 entries:
#   4 => [(1, 10), (2, 12), (3, 14), (4, 16), (5, 18), (6, 19), (7, 20), (8, 21), (9, 22), (10, 23)]
#   2 => [(8, 13), (7, 15), (6, 17)]
#   3 => [(2, 11)]
#   1 => [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]

gdalle commented 3 days ago

Indeed, the decompression step after a symmetric/star coloring is a bit more tricky, because H[i, j] can be read from the group of either column i or column j. The "right" algorithms are given in the following paper: https://pubsonline.informs.org/doi/abs/10.1287/ijoc.1080.0286. In my opinion, they belong in the library SparseMatrixColorings.jl.

At the moment, only DirectRecover1 is implemented, but ideally we want DirectRecover2 because star coloring creates a byproduct that helps with decompression (the "set of 2-colored stars"). The only obstacle for that is to change the API of ADTypes.jl so that ADTypes.symmetric_coloring is allowed to return something else in addition to the vector of colors.

jbcaillau commented 2 days ago

I have opened PR https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/pull/257 to perform more efficient column decompression in sparse Jacobians and Hessians by preallocating the index of the nnz of the sparse matrix computed by a seed. This will make it easier to use symmetric coloring/decompression in the Hessian later.

thanks @amontoison for the update. can't be of much help on this. have you considered exchanging on this with @abuttari or JY L'Excellent?

JuliaSmoothOptimizers / ADNLPModels.jl

Symmetric coloring for efficient sparse hessian computation #258