JuliaSmoothOptimizers / NLPModels.jl

Data Structures for Optimization Models

Add support to dense Jacobian and Hessian in API #376

Closed. frapac closed this issue 6 months ago.

frapac commented 3 years ago

Together with @sshin23, we are working on rebasing the interface of MadNLP on NLPModels.jl (see https://github.com/sshin23/MadNLP.jl/pull/69).

In that context, we are wondering whether it would be possible to add new functions to the NLPModels API to support the evaluation of dense Jacobians and dense Hessians. This is particularly useful for us when working on the GPU (where we want to use as many vectorized operations as possible). A dummy example of such an interface can be found here: https://github.com/frapac/MadNLP.jl/blob/fp/dense_gpu/lib/MadNLPTests/src/MadNLPTests.jl#L245-L255 (right now we have to implement hess_dense! and jac_dense! inside MadNLP, which we think is not ideal).

An ideal interface for us would be:

function hess_dense!(
  nlp::AbstractNLPModel,
  x::AbstractVector,
  y::AbstractVector,
  vals::AbstractMatrix;
  obj_weight::Real = one(eltype(x)),
)

function jac_dense!(
  nlp::AbstractNLPModel,
  x::AbstractVector,
  vals::AbstractMatrix,
)

But there might be some caveats in NLPModels.jl that we are not aware of.

CC @amontoison

abelsiqueira commented 3 years ago

Hi @frapac, we need to understand your use case a little bit better. When do you get a dense matrix? I think that jac only returns a dense matrix for ADNLPModels. Is this your interest, or are you creating dense matrices somewhere else?

frapac commented 3 years ago

At the moment our use case is the following. We have a dense KKT system

[ H    J']
[ J    0 ]

with the Jacobian J and the Hessian H being dense matrices. Our code computes the two matrices directly on the GPU. To avoid expensive data transfers between host and device, we have modified MadNLP to instantiate the matrices H and J it uses directly on the GPU. So each time we have a new trial point x, we need to update the values inside H and J using the NLPModels callbacks (and here we don't want to use a COO matrix as an intermediate representation). Our code directly calls:

jac_dense!(nlp, x, J) 
hess_dense!(nlp, x, y, H; obj_weight=sigma)

to update J and H (by doing so we don't have to allocate any new matrix on the GPUs). Then, MadNLP can compute the descent direction by factorizing the dense KKT system on the GPU.
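
For reference, a minimal CPU sketch of that update loop is below. It assumes the proposed (still hypothetical) jac_dense! and hess_dense! callbacks are implemented by the model; on the GPU, K, H, and J would simply be device arrays and the same logic would apply.

using LinearAlgebra

# Refresh H and J in place at the trial point (x, y), then assemble and
# factorize the dense KKT matrix [H J'; J 0].
function dense_kkt_factorize!(K, H, J, nlp, x, y, sigma)
  jac_dense!(nlp, x, J)                          # hypothetical dense Jacobian callback
  hess_dense!(nlp, x, y, H; obj_weight = sigma)  # hypothetical dense Hessian callback
  n = size(H, 1)
  m = size(J, 1)
  K[1:n, 1:n] .= H
  K[1:n, n+1:n+m] .= J'
  K[n+1:n+m, 1:n] .= J
  K[n+1:n+m, n+1:n+m] .= 0
  return bunchkaufman!(Symmetric(K, :L))         # dense symmetric indefinite factorization
end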

abelsiqueira commented 3 years ago

But what happens if my problem is from CUTEst, where jac and hess are sparse?

frapac commented 3 years ago

I think we are not targeting the solution of CUTEst problems on the GPU right now (and for those, the good ol' jac_coord! and hess_coord! already do a good job). We are mostly interested in solving our custom problems (equivalent to some PDE-constrained optimization problems) by interfacing them with MadNLP through NLPModels.jl (as this is now the default interface of MadNLP).

abelsiqueira commented 3 years ago

Ah cool, you got your own NLPModels problem. We were discussing your issue about an hour ago and wondering how we can do that without breaking the sparse models, and how to implement that for the model you're using. Since you have a personal model, we only have to worry about the first part.

If we only define the new functions but don't implement them for any of our models, does that break your package? If so, how?

frapac commented 3 years ago

I apologize if that was unclear! I should have stated first that we are using our personal model :)

If we only define the new functions but don't implement them for any of our models, does that break your package? If so, how?

That would be perfect for us. If the templates are available inside NLPModels, then we just have to overload them on our side to get everything working.

abelsiqueira commented 3 years ago

@dpo, do you oppose this idea? Something like this:

"""
    Jx = jac!(nlp, x, Jx)

Computes the Jacobian `Jx` at `x` in place.
"""
function jac! end

"""
    Hx = hess!(nlp, x, Hx)
    Hx = hess!(nlp, x, y, Hx)

Computes the Hessian `Hx` at `x` or `(x,y)`, depending on whether the
problem is constrained or not, in place.
"""
function hess! end

@frapac, can you make the PR? We use the naming jac! and hess! for internal consistency, is that ok?
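
As an illustration, here is a hedged sketch of how a user-defined dense model could overload those stubs, assuming the empty function jac! end / function hess! end definitions above have been added to NLPModels and a recent release where AbstractNLPModel and NLPModelMeta carry type parameters. The tiny model DenseToyModel, its objective ½‖x‖², and its constraint x₁x₂ = 0 are made up purely for illustration.

using NLPModels

struct DenseToyModel <: AbstractNLPModel{Float64, Vector{Float64}}
  meta::NLPModelMeta{Float64, Vector{Float64}}
  counters::Counters
end

DenseToyModel() = DenseToyModel(
  NLPModelMeta(2, ncon = 1, x0 = zeros(2), y0 = zeros(1), lcon = [0.0], ucon = [0.0], nnzj = 2, nnzh = 3),
  Counters(),
)

function NLPModels.jac!(nlp::DenseToyModel, x::AbstractVector, Jx::AbstractMatrix)
  # constraint c(x) = x₁ x₂, so the 1×2 dense Jacobian is [x₂ x₁]
  Jx[1, 1] = x[2]
  Jx[1, 2] = x[1]
  return Jx
end

function NLPModels.hess!(nlp::DenseToyModel, x::AbstractVector, y::AbstractVector,
                         Hx::AbstractMatrix; obj_weight::Real = one(eltype(x)))
  # objective ½‖x‖² contributes obj_weight * I; the constraint contributes y₁ * [0 1; 1 0]
  fill!(Hx, 0)
  Hx[1, 1] = obj_weight
  Hx[2, 2] = obj_weight
  Hx[1, 2] = y[1]
  Hx[2, 1] = y[1]
  return Hx
end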

frapac commented 3 years ago

I would be more than happy to make the PR!

dpo commented 3 years ago

I was wondering why jac_coord!() and hess_coord!() are insufficient. If your Jacobian and Hessian are stored in dense matrices, isn't it ok to pass views of those matrices (that "vectorize" them) as arguments so they get filled in? You just have to set jac_structure() and hess_structure() to enumerate all tuples (i, j).
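
To make that concrete, here is a hedged sketch of such a jac_structure! for a dense model, reusing the hypothetical DenseToyModel from the earlier sketch; the same pattern would apply to hess_structure!.

function NLPModels.jac_structure!(
  nlp::DenseToyModel,
  rows::AbstractVector{<:Integer},
  cols::AbstractVector{<:Integer},
)
  m, n = nlp.meta.ncon, nlp.meta.nvar
  k = 0
  for j in 1:n, i in 1:m   # column-major order, matching Julia's dense storage
    k += 1
    rows[k] = i
    cols[k] = j
  end
  return rows, cols
end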

sshin23 commented 3 years ago

@dpo Thanks for the comment! Of course one can always write a parser that converts a sparse NLPModel into a dense NLPModel. But we're particularly interested in models that are dense from the beginning. For those models, it is preferred to implement Jacobian/Hessian callbacks that directly take AbstractMatrix inputs.

This can still be done with the current jac_coord!() and hess_coord!(), by extending jac_coord!() with a method that accepts matrix inputs:

function jac_coord!(nlp, x, vals::AbstractVector)
  # current API: fill the vector of nonzero values in the order given by jac_structure
end

function jac_coord!(nlp, x, vals::AbstractMatrix)
  # possible extension: fill the dense Jacobian matrix directly
end

So it is open for discussion as to whether to use the current jac_coord! API and extend it to matrix inputs or to create a separate API for dense callbacks as suggested in #378.

I advocate creating a new API because

dpo commented 3 years ago

@sshin23 Sorry for the delay. I guess I'm just wondering in what way

jac!(args...) = jac_coord!(args...)
hess!(args...) = hess_coord!(args...)

are insufficient if a problem defines jac_structure() and hess_structure() to return all indices. For instance, that's what's done here for models based on ForwardDiff, which doesn't support sparse Jacobians or Hessians. It seems feasible to simply call jac!() or hess!() above with view(J, :) or view(H, :) as argument, where J and H are dense matrices. Such views allocate, but only a tiny amount.

There may be a good reason why the above doesn't work for you, but I'm not seeing it clearly yet.
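
In code, the solver-side call dpo has in mind could look roughly like this: a sketch under the assumption that the model enumerates all entries column by column, as above; refresh_dense_jac! is a made-up name.

using NLPModels

# Refresh a dense Jacobian J through the existing coordinate API; view(J, :)
# allocates only a small wrapper around the dense storage.
function refresh_dense_jac!(J::AbstractMatrix, nlp, x::AbstractVector)
  jac_coord!(nlp, x, view(J, :))
  return J
end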

sshin23 commented 3 years ago

@dpo Indeed jac_coord! is sufficient if we're always working with ADNLPModels.

But a solver, which should work with any eligible NLPModel provided by the user, cannot assume that this model has the desired dense Jacobian structure. So it always needs to call jac_structure! first to obtain the sparsity pattern, construct an array of tuples in Vector{Tuple{Int,Int}} form, and then build a view of the dense Jacobian matrix that can be passed to jac_coord!. The solver is therefore forced to introduce a lot of overhead.

On the other hand, when we have a dedicated API for dense Jacobian, we don't need to introduce such overhead. The same applies to Hessian callbacks.
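
For illustration, the kind of bookkeeping sshin23 is describing might look like this (a hedged sketch; the names CooToDenseMap and scatter_dense_jac! are made up here):

using NLPModels

struct CooToDenseMap
  rows::Vector{Int}
  cols::Vector{Int}
  vals::Vector{Float64}
end

# One-time query of the sparsity pattern declared by the model.
CooToDenseMap(nlp) = CooToDenseMap(jac_structure(nlp)..., zeros(nlp.meta.nnzj))

# Per-iteration refresh: evaluate the coordinate values in the model's own
# order, then scatter them into the solver's dense Jacobian.
function scatter_dense_jac!(J::AbstractMatrix, nlp, x::AbstractVector, map::CooToDenseMap)
  jac_coord!(nlp, x, map.vals)
  fill!(J, 0)
  for k in eachindex(map.vals)
    J[map.rows[k], map.cols[k]] = map.vals[k]
  end
  return J
end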

dpo commented 3 years ago

We implemented jac_structure() for ADNLPModels because we sometimes pass them to sparse solvers (which is admittedly inefficient).

But in your case, I still don't see where the overhead is. If you know your problem is dense and you know your solver will exploit that, there's no need to call (or even implement) jac_structure(). We can simply alias jac! = jac_coord! and you implement jac_coord! so it fills in the entire matrix column by column.

sshin23 commented 3 years ago

If you know your problem is dense and you know your solver will exploit that, there's no need to call (or even implement) jac_structure().

The issue is that the solver doesn't know a priori whether the problem is in the correct dense format. This is because we want our solver to work with any eligible NLPModel that the user may provide. In order to exploit the density, the solver is forced to call jac_structure! first and check whether it is the enumeration of the indices of a dense matrix. On the other hand, if we have dedicated APIs for dense callbacks, we don't need to check this.

abelsiqueira commented 3 years ago

I think what @dpo means is that since it's a Julia matrix, its linearized form is always by columns:

julia> A
2×3 Matrix{Int64}:
 1  3  5
 2  4  6

julia> A[:]
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

So you can call jac_coord!(nlp, x, @view A[:]), and only jac_coord! needs to be defined.

Does that make sense?

sshin23 commented 3 years ago

@abelsiqueira yes, I agree with that part. But this assumes that we have full control of both the modeling side and the solver side. I'm talking about the situation where the solver cannot expect that jac_structure! always returns I=[1,2,1,2,1,2] and J=[1,1,2,2,3,3]. E.g., it may return I=[1,1,1,2,2,2] and J=[1,2,3,1,2,3] if the modeler decides to do so for whatever reason. In this situation, the solver needs to check whether I=[1,2,1,2,1,2] and J=[1,1,2,2,3,3].
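
The check being referred to could look something like this (a small illustrative sketch, not an existing NLPModels utility):

# Return true if (rows, cols) enumerate every entry of a dense m×n matrix in
# column-major order, so that view(J, :) can safely be passed to jac_coord!.
function is_dense_column_major(rows, cols, m, n)
  length(rows) == length(cols) == m * n || return false
  k = 0
  for j in 1:n, i in 1:m
    k += 1
    (rows[k] == i && cols[k] == j) || return false
  end
  return true
end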

abelsiqueira commented 3 years ago

But in that case, where is the dense part coming from?

I think I'm still lost on where the information on whether the problem is sparse or dense (and which functions to use accordingly) comes from.

I think I see a situation where this is a problem:

Let's say I decided to write down an EvilDenseModel, which is specifically trying to break what you wrote above. You could know this is a dense problem from the nlp.meta.nnzj, for instance, so let's assume you did that. Then you create the first Jacobian with Jx = jac(nlp, x). Then you need to update the Jacobian and call jac_coord!(nlp, x, @view Jx[:]). My model specifically doesn't work because I wrote it down as a sparse problem internally, deliberately changing the indexes' order.

In order to fix that, you are asking the user to implement jac!, a non-mandatory method that changes the matrix directly, so the order doesn't matter. But instead, I could ask the user to use the correct order, because if their method is dense, they are using the matrices already.

Sorry for being extra picky. Dense problems were not on our radar for most of the development, so accommodating these changes is a bit about changing the (unwritten) philosophy as well.

Also sorry for the delay.

sshin23 commented 3 years ago

No worries. We completely understand your trying to make NLPModels minimal, and that is the reason why we like NLPModels.jl 😉 We'll discuss this further internally and think about other options.

tmigot commented 6 months ago

Superseded by #458