control-toolbox / CTDirect.jl

Direct transcription of an optimal control problem and resolution
http://control-toolbox.org/CTDirect.jl/
MIT License

Pass sparse structure of constraints to NLPModels #183

Open jbcaillau opened 3 months ago

jbcaillau commented 3 months ago

See https://github.com/0Yassine0/COTS.jl/issues/17#issuecomment-2241187824

jbcaillau commented 2 months ago

@amontoison can you please point towards a worked-out example defining an NLPModel? According to the API, I guess that the functions listed here are required (Jacobian / Hessian and associated products...)

jbcaillau commented 2 months ago

See https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/pull/284

amontoison commented 2 months ago

Do you want a ManualNLPModel, or do you still want to use an ADNLPModel? -> https://jso.dev/ADNLPModels.jl/previews/PR209/sparsepattern/

jbcaillau commented 2 months ago

Thanks @amontoison for the example. I suppose that manually building the whole thing (= sparse derivatives, using AD only where it’s needed) gives the best result, but what is your advice?

amontoison commented 2 months ago

@jbcaillau If you directly provide the sparsity pattern of Jacobians / Hessians in an ADNLPModel / ADNLSModel, you will have a speed-up for sure.

If you build the whole thing manually, the coloring / AD / decompression will also be replaced by your own functions jac_coord and hess_coord, and it highly depends on your implementation of these functions. If you exploit the structure and redundant patterns, you will have an advantage, but you also need to find some parallelism. (It's a good application for Tapenade with source-to-source AD.)

With AD we can evaluate multiple directional derivatives at the same time to compute the compressed Jacobian / Hessian.
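
For concreteness, here is a minimal sketch of that coloring idea using plain ForwardDiff (illustration only: the constraint function c, its tridiagonal Jacobian and the 3-coloring are made up for the example; ADNLPModels uses its own sparse AD backends internally):

    # Columns of the same color never share a nonzero row, so one directional
    # derivative per color recovers the whole compressed Jacobian.
    using ForwardDiff

    n = 9
    c(x) = [(i > 1 ? x[i-1] : 0.0) + x[i]^2 + (i < n ? x[i+1] : 0.0) for i in 1:n]

    x0     = collect(1.0:n)
    colors = [mod1(j, 3) for j in 1:n]               # 3-coloring of the columns
    seeds  = [Float64.(colors .== k) for k in 1:3]   # one seed vector per color

    # 3 directional derivatives instead of n Jacobian columns:
    # column j of the Jacobian ends up in column colors[j] of `compressed`.
    compressed = hcat([ForwardDiff.derivative(t -> c(x0 .+ t .* s), 0.0) for s in seeds]...)

Decompression then scatters each entry of compressed back to its (row, column) position given by the sparsity pattern.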

jbcaillau commented 2 months ago

@jbcaillau If you directly provide the sparsity pattern of Jacobians / Hessians in an ADNLPModel / ADNLSModel, you will have a speed-up for sure.

If you build the whole thing manually, the coloring / AD / decompression will also be replaced by your own functions jac_coord and hess_coord, and it highly depends on your implementation of these functions […]

thanks @amontoison let’s try the first option as a start. looking forward to https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/pull/286

jbcaillau commented 2 months ago

@amontoison actually, there is a worked-out example of building an NLPModel by hand in the MadNLP docs: https://madnlp.github.io/MadNLP.jl/dev/quickstart/#Specifying-the-derivatives-by-hand:-NLPModels.jl

amontoison commented 2 months ago

I already gave you a tutorial on a ManualNLPModel; is it not what you wanted? https://jso.dev/tutorials/create-a-manual-model/

In the same spirit as the MadNLP example, we have a few problems defined by hand to test our various AbstractNLPModels in the JSO ecosystem: https://github.com/JuliaSmoothOptimizers/NLPModelsTest.jl/tree/main/src/nlp/problems
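
For reference, a minimal sketch in the spirit of those hand-written problems (the model name ManualQP and the toy equality-constrained quadratic are made up for the example), showing the jac_structure! / jac_coord! and hess_structure! / hess_coord! part of the API:

    # Toy problem: min (x1 - 1)^2 + (x2 - 2)^2  s.t.  x1 + x2 = 1
    using NLPModels

    struct ManualQP{T, S} <: AbstractNLPModel{T, S}
      meta::NLPModelMeta{T, S}
      counters::Counters
    end

    nlp = ManualQP(NLPModelMeta(2; ncon = 1, x0 = zeros(2), lcon = [1.0], ucon = [1.0],
                                nnzj = 2, nnzh = 2, name = "manual-qp"), Counters())

    NLPModels.obj(::ManualQP, x::AbstractVector) = (x[1] - 1)^2 + (x[2] - 2)^2
    NLPModels.grad!(::ManualQP, x::AbstractVector, g::AbstractVector) = (g .= 2 .* (x .- [1, 2]); g)
    NLPModels.cons!(::ManualQP, x::AbstractVector, c::AbstractVector) = (c[1] = x[1] + x[2]; c)

    # sparsity pattern and values of the (single-row) constraint Jacobian
    NLPModels.jac_structure!(::ManualQP, rows::AbstractVector{Int}, cols::AbstractVector{Int}) =
      (rows .= [1, 1]; cols .= [1, 2]; (rows, cols))
    NLPModels.jac_coord!(::ManualQP, x::AbstractVector, vals::AbstractVector) = (vals .= 1; vals)

    # lower-triangular Hessian of the Lagrangian (the constraint is linear, so y plays no role)
    NLPModels.hess_structure!(::ManualQP, rows::AbstractVector{Int}, cols::AbstractVector{Int}) =
      (rows .= [1, 2]; cols .= [1, 2]; (rows, cols))
    NLPModels.hess_coord!(::ManualQP, x::AbstractVector, vals::AbstractVector; obj_weight = 1.0) =
      (vals .= 2 * obj_weight; vals)
    NLPModels.hess_coord!(::ManualQP, x::AbstractVector, y::AbstractVector, vals::AbstractVector; obj_weight = 1.0) =
      (vals .= 2 * obj_weight; vals)

Any JSO-compatible solver should then be able to consume nlp, with the sparsity fixed once via the meta (nnzj, nnzh) and the structure functions.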

jbcaillau commented 2 months ago

@amontoison yes, exactly what we need 👍🏽. thanks again.

amontoison commented 2 months ago

We can easily provide the sparsity pattern of the Jacobian / Hessian with the release 0.8.5 of ADNLPModels.jl. I added an example in the documentation: https://jso.dev/ADNLPModels.jl/dev/sparse/

jbcaillau commented 2 months ago

We can easily provide the sparsity pattern of the Jacobian / Hessian with the release 0.8.5 of ADNLPModels.jl. I added an example in the documentation: https://jso.dev/ADNLPModels.jl/dev/sparse/

nice, @amontoison. actually, this seems to be OK for our needs for direct transcription into an nlp. BTW, given that, what would be the benefit of using NLPModels (instead of ADNLPModels)? the expected improvement being on sparsity patterns (that we do know in advance), I guess that ADNLPModels is enough. Am I missing something?

amontoison commented 2 months ago

@jbcaillau It depends on whether you have more information than what ADNLPModels.jl can exploit. For example, if you have redundant terms in your objective or constraints, your Hessian or Jacobian might contain the same block matrices multiple times (e.g., J1, J2, J3).

J = [
J2 J3 0  0
J1 J2 J3 0
0  J1 J2 J3
0  0  J1 J2
0  0  0  J1
]

In this case, you can optimize computations by only calculating the blocks J1, J2, and J3 once. This approach goes beyond merely knowing the sparsity pattern of J; it also leverages additional information about the relationships between the non-zero elements.

The blocks J1, J2, J3 could also depend on x or a subset of x, and in that case you just need to implement functions J1(y), J2(y), J3(y) where y is a vector.
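
As a rough sketch of that idea (illustration only, with made-up 2x2 blocks, not CTDirect code): evaluate J1, J2 and J3 once per Jacobian assembly and scatter them into the banded layout above.

    using SparseArrays

    # made-up 2x2 block generators, each depending on (a slice of) x
    J1(y) = [y[1] 0.0; 1.0 y[2]]
    J2(y) = [cos(y[1]) 1.0; 0.0 sin(y[2])]
    J3(y) = [0.0 y[1]; y[2] 0.0]

    function assemble_J(x)
        m = 2                                # block size
        B = (J1(x), J2(x), J3(x))            # each block evaluated only once
        J = spzeros(5m, 4m)
        for bi in 1:5, bj in 1:4
            k = bj - bi + 2                  # which of J1, J2, J3 sits at block (bi, bj)
            1 <= k <= 3 || continue          # outside the band: structural zero
            J[(bi - 1) * m + 1:bi * m, (bj - 1) * m + 1:bj * m] = B[k]
        end
        return J
    end

    assemble_J([0.5, 1.5])                   # 10x8 sparse Jacobian from only 3 block evaluations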

jbcaillau commented 2 months ago

πŸ™πŸ½@amontoison perfectly clear, many thanks. having such common blocks is not expected a priori. the next step in the case of direct approach (ocp -> nlp vs. using Pontrjagin then shooting or so) is rather exploiting SIMD features.

PierreMartinon commented 2 months ago

πŸ™πŸ½@amontoison perfectly clear, many thanks. having such common blocks is not expected a priori.

Hi. What about the dynamics and path constraints along the time steps? I thought they would be this kind of common block, did I miss something?

Currently, the constraints evaluation looks like this (the double arguments and update parts are a specific optimization for the trapeze method and would not be present for another discretization scheme). Not included is the final part with the boundary and variable constraints, but setStateEquation and setPathConstraints would be your J_i?

    args_i = ArgsAtTimeStep(xu, docp, 0, v)
    args_ip1 = ArgsAtTimeStep(xu, docp, 1, v)

    index = 1 # counter for the constraints
    for i in 0:docp.dim_NLP_steps-1

        # state equation
        index = setStateEquation!(docp, c, index, (args_i, args_ip1))
        # path constraints 
        index = setPathConstraints!(docp, c, index, args_i, v)

        # smart update for next iteration
        if i < docp.dim_NLP_steps-1
            args_i = args_ip1
            args_ip1 = ArgsAtTimeStep(xu, docp, i+2, v)
        end
    end

jbcaillau commented 2 months ago

@PierreMartinon common generators (= functions to evaluate) for every time step, but different evaluations, so different blocks. Maybe a few common things in the particular case of the trapezoidal scheme (see your smart update - nice 👍🏽)? I might be missing something, though.
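
To make the distinction concrete, a small sketch (made-up dynamics, not CTDirect code): the same generator evaluated at two different time steps gives Jacobian blocks with the same sparsity pattern but different values, so only the pattern, not the blocks themselves, can be shared across steps.

    using ForwardDiff

    # made-up dynamics generator: the same function is used at every time step
    f(x) = [x[1] * x[2], x[2]^2]

    x_i   = [1.0, 2.0]                   # state at step i
    x_ip1 = [3.0, 4.0]                   # state at step i+1

    A = ForwardDiff.jacobian(f, x_i)     # block for step i
    B = ForwardDiff.jacobian(f, x_ip1)   # block for step i+1

    (A .!= 0) == (B .!= 0)               # true: same sparsity pattern
    A == B                               # false: different values, hence different blocks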