coin-or / CppAD

A C++ Algorithmic Differentiation Package: Home Page
https://cppad.readthedocs.io

Computing Derivatives of solutions of optimization Problems #157

Closed a-jp closed 1 year ago

a-jp commented 1 year ago

Hi, I'm new to CppAD, but not new to Ipopt or the concept of AD. To explain my question I was going to use this example. My question (sorry if the title is not quite right): let's assume that the objective also depended on another scalar:

fg[0] = scalar * x1 * x4 * (x1 + x2 + x3) + x3;

I'd like to understand how to obtain the Jacobian representing d(xi)/d(scalar), but here the xi are only obtained after minimising via Ipopt. Is that possible, and does my question make sense? scalar can be any type as needed. In finite-difference terms, I'd compute xi(scalar+eps) via one Ipopt call and xi(scalar) via another Ipopt call, then compute the FD Jacobian as (xi(scalar+eps) - xi(scalar))/eps, as sketched below. I'm hoping instead to do this by AD...
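For reference, a minimal sketch of that finite-difference version, based on the ipopt_solve get_started example; the FG_eval constructor taking scalar and the helper fd_solution_derivative are hypothetical additions for illustration:

    # include <cppad/ipopt/solve.hpp>

    namespace {
        using CppAD::AD;

        // objective and constraints from the get_started example, with the
        // extra scalar parameter passed in through the constructor
        class FG_eval {
        public:
            double scalar;
            FG_eval(double scalar_in) : scalar(scalar_in) { }
            typedef CPPAD_TESTVECTOR( AD<double> ) ADvector;
            void operator()(ADvector& fg, const ADvector& x)
            {   AD<double> x1 = x[0], x2 = x[1], x3 = x[2], x4 = x[3];
                fg[0] = scalar * x1 * x4 * (x1 + x2 + x3) + x3;   // f(x)
                fg[1] = x1 * x2 * x3 * x4;                        // g_1(x)
                fg[2] = x1 * x1 + x2 * x2 + x3 * x3 + x4 * x4;    // g_2(x)
            }
        };
    }

    // finite-difference approximation of d(xi)/d(scalar)
    bool fd_solution_derivative(void)
    {   typedef CPPAD_TESTVECTOR( double ) Dvector;
        size_t nx = 4, ng = 2;

        // initial guess and bounds, as in the get_started example
        Dvector xi(nx), xl(nx), xu(nx), gl(ng), gu(ng);
        xi[0] = 1.0; xi[1] = 5.0; xi[2] = 5.0; xi[3] = 1.0;
        for(size_t j = 0; j < nx; j++)
        {   xl[j] = 1.0;  xu[j] = 5.0; }
        gl[0] = 25.0;  gu[0] = 1.0e19;   // 25 <= g_1(x)
        gl[1] = 40.0;  gu[1] = 40.0;     // g_2(x) == 40

        std::string options;
        options += "Integer print_level  0\n";
        options += "String  sb           yes\n";

        double scalar = 1.0, eps = 1e-6;
        CppAD::ipopt::solve_result<Dvector> sol0, sol1;

        // solve at scalar, then at scalar + eps
        // (the second call starts from the first optimum)
        FG_eval fg0(scalar);
        CppAD::ipopt::solve<Dvector, FG_eval>(options, xi, xl, xu, gl, gu, fg0, sol0);
        FG_eval fg1(scalar + eps);
        CppAD::ipopt::solve<Dvector, FG_eval>(options, sol0.x, xl, xu, gl, gu, fg1, sol1);

        // finite-difference Jacobian d(xi)/d(scalar)
        Dvector dxdscalar(nx);
        for(size_t j = 0; j < nx; j++)
            dxdscalar[j] = (sol1.x[j] - sol0.x[j]) / eps;

        return sol0.status == CppAD::ipopt::solve_result<Dvector>::success
            && sol1.status == CppAD::ipopt::solve_result<Dvector>::success;
    }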

Many thanks, Andy

bradbell commented 1 year ago

The problem here is that the minimizer w.r.t. x will depend on scalar, so if you change the value of scalar, you will have to recompute the minimizer in order to evaluate your objective?

a-jp commented 1 year ago

Thanks. That's OK. I appreciate it's very expensive, but it allows me to solve something without worrying about whether I've chosen an eps for finite differences that overly affects the answer. Would you be able to show me how to do it?

bradbell commented 1 year ago

I am still not sure I understand your problem. I think that you want to solve the problem

minimize f(x) w.r.t. x
subject to $L_6 \le x_6 \le U_6$

and then compute the partial of f(x) w.r.t. $x_6$. This will just be the positive (negative) of the Lagrange multiplier for $x_6$ at the optimal solution if $L_6$ ($U_6$) is active at the solution.
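(A short sketch of the reasoning, using the standard sign convention where the bound multipliers satisfy $\lambda_L , \lambda_U \ge 0$:

$$ \mathcal{L} = f(x) - \lambda_L ( x_6 - L_6 ) - \lambda_U ( U_6 - x_6 ) , \qquad 0 = \partial_{x_6} \mathcal{L} = \partial_{x_6} f - \lambda_L + \lambda_U $$

so if only $L_6$ is active, $\partial_{x_6} f = \lambda_L \ge 0$, and if only $U_6$ is active, $\partial_{x_6} f = - \lambda_U \le 0$.)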

a-jp commented 1 year ago

I think what I'm trying to do is the following:

x = guess, where x is a vector
minimise Func(x, R) w.r.t. x, where R is a constant scalar for any given minimisation

Once I've solved the minimisation of Func (which is scalar-valued) and obtained x, I then need to evaluate the following using the optimised x:

Given: $$ B = \sum_{i} x_{i}(R) \, b_{i}(R) $$

Compute:

$$ \frac{\partial}{\partial R} B = \sum_{i} x_{i}(R) \frac{\partial b_{i}(R)}{\partial R} + \sum_{i} b_{i}(R) \frac{\partial x_{i}(R)}{\partial R} $$

I've computed the first term of this derivative (the one involving $\partial b_{i} / \partial R$) using CppAD and compared it to a known result, and it's exact; the code to do so is as follows:

        // ADNumber here is CppAD::AD<double>; R is the known constant scalar,
        // size is the number of terms in the sum, and x is the plain double
        // solution vector returned by Ipopt after the minimisation.
        const std::size_t n = 1;
        std::vector<ADNumber> ax(n);
        ax[0] = R; // constant known scalar as noted above

        CppAD::Independent(ax);

        const std::size_t m = 1;
        std::vector<ADNumber> B(m);
        B[0] = 0.0;

        for (std::size_t s = 0; s < size; s++)
        {
            ADNumber b = 0.0;
            Compute_b_term(ax[0], b); // takes ax[0] as a const&; b is a non-const& assigned inside the function
            // Note! x comes from solution.x (after the ipopt call completes) and is simply a
            // std::vector<double>, not an AD type, so it enters the tape as a constant
            B[0] += x[s] * b; // here x is the solution vector coming from ipopt after minimisation
        }

        CppAD::ADFun<double> f(ax, B);

        // compute the derivative using the operation sequence stored in f
        std::vector<double> val(n);        // domain space vector
        val[0] = R;                        // argument value for computing the derivative
        // Jacobian of f (m by n matrix): dB/dR with x held fixed, i.e. the first term above
        const std::vector<double> first_partial_derivative = f.Jacobian(val);

I've no idea how to compute the second term above (the one involving $\partial x_{i} / \partial R$)... could you provide any advice? I assume it requires code similar to the above, but using Func somehow? I don't know whether I can just evaluate Func one more time with the optimised values of x, outside of the optimiser, record this in AD types, and then use code similar to the above but now with B the size of x? Bit stuck...

bradbell commented 1 year ago

Suppose we are given the unconstrained problem

$$ {\rm minimize} ~ f(x, R) ~~ {\rm w.r.t.} ~ x $$

Define $x(R)$ as the solution corresponding to $R$; then this function can be defined implicitly by

$$ 0 = \partial_x f[ x(R), R] $$

because the derivative is zero at an unconstrained optimum.
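
Differentiating this identity w.r.t. $R$ (a sketch, assuming $\partial_x^2 f$ is nonsingular at the solution) gives

$$ 0 = \partial_x^2 f[ x(R), R ] \, x'(R) + \partial_R \partial_x f[ x(R), R ] $$

$$ x'(R) = - \left( \partial_x^2 f[ x(R), R ] \right)^{-1} \partial_R \partial_x f[ x(R), R ] $$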

The problem here is computing the derivatives of x(R) w.r.t. R. I suggest you see https://www.seanet.com/~bradbell/newton_step.htm
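
For the unconstrained case, here is a minimal self-contained sketch of that computation with CppAD, using a made-up quadratic objective so the answer is easy to check; everything in it (the toy f, the variable names) is illustrative only:

    # include <cppad/cppad.hpp>

    int main(void)
    {   using CppAD::AD;
        size_t nx = 2;                      // number of x variables
        size_t nz = nx + 1;                 // z = (x, R)

        // tape f as a function of z = (x, R)
        std::vector< AD<double> > az(nz), ay(1);
        az[0] = 1.0;  az[1] = 2.0;  az[2] = 0.5;
        CppAD::Independent(az);
        AD<double> x0 = az[0], x1 = az[1], R = az[2];
        // toy objective: minimizer is x(R) = (R, 2R), so x'(R) = (1, 2)
        ay[0] = (x0 - R) * (x0 - R) + (x1 - 2.0 * R) * (x1 - 2.0 * R);
        CppAD::ADFun<double> f(az, ay);

        // evaluate the full Hessian of f at z* = ( x(R), R )
        std::vector<double> z(nz);
        z[0] = 0.5;  z[1] = 1.0;  z[2] = 0.5;
        std::vector<double> hess = f.Hessian(z, 0);   // row-major, nz by nz

        // extract the blocks f_xx (nx by nx) and -f_xR (nx by 1)
        std::vector<double> f_xx(nx * nx), rhs(nx), dxdR(nx);
        for(size_t i = 0; i < nx; i++)
        {   for(size_t j = 0; j < nx; j++)
                f_xx[i * nx + j] = hess[i * nz + j];
            rhs[i] = - hess[i * nz + nx];
        }

        // solve f_xx * dxdR = - f_xR
        double logdet;
        CppAD::LuSolve(nx, 1, f_xx, rhs, dxdR, logdet);

        // dxdR now holds x'(R); for this toy problem it is (1, 2)
        return 0;
    }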

a-jp commented 1 year ago

Looks like you're missing double-$ at the end of the line?

Many thanks for this; that's going to take some time to digest. I should have added (I omitted it because I was unaware of where this was going) that my problem is a constrained optimisation problem, with both equality and inequality constraints.

Can I ask another question? It's related, since it came up while setting up the finite-difference version of this, which I should add seems to work and produces answers that are reported elsewhere for my problem. Using my nomenclature above, I am doing:

$$ \frac{\partial}{\partial R} B \approx \frac{ Func(x, R+eps) - Func(x, R) }{eps} $$

To get this to work, using this example as a starting point, and since I need to call the optimiser twice per derivative and I need those derivatives at a lot of values of R, I moved as much of the code as possible into a one-off set-up function and then call the minimum amount when I need a derivative. Ideally no construction and destruction of the big objects was my aim. I found two things out:

Many thanks,

bradbell commented 1 year ago
  • To vary R, which I must do not only when perturbing by eps (~1.0e-08) but also for different values of R, I need to ensure retape = true, because the functions called inside the optimiser do things like if (R < R_1). I believe the way to avoid this is to use CondExpLt, which I'm working on now. Am I correct?

Yes
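
For example, a sketch of the rewrite (R_1, value_below, value_above stand in for whatever the real model uses):

    # include <cppad/cppad.hpp>
    using CppAD::AD;

    AD<double> blend(
        const AD<double>& R          , const AD<double>& R_1        ,
        const AD<double>& value_below, const AD<double>& value_above)
    {
        // if-version: only the branch taken while taping is recorded, so the
        // resulting tape is only valid for values of R on the same side of R_1
        // (hence retape = true is required):
        //     if( R < R_1 ) return value_below; else return value_above;

        // CondExpLt records both alternatives, so the same tape stays valid
        // as R varies and no retaping is needed
        return CppAD::CondExpLt(R, R_1, value_below, value_above);
    }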

bradbell commented 1 year ago
  • This second one confused me. I do initialise x_i to a guess, as required. Then, as per the above reasoning, I do a lot of one-off set-up. Outside of computing derivatives, just for normal optimisation calls, I've found previously with my Python implementation using Ipopt that I sometimes need to supply a better initial guess. There is also a use case for a better initial guess after set-up in the above partial derivative, since the result of the first optimiser call can be used as the initial guess for the second call, to reduce the number of Ipopt iterations. I've found that once the Ipopt::TNLP object is constructed, no further changes to the initial values of x_i are made use of: the values used for x_i are those supplied when the Ipopt::TNLP was constructed. Have I figured that out correctly? Is that integral to how it works, or something that can be changed?

Are you using ipopt_solve for your optimization? https://coin-or.github.io/CppAD/doc/ipopt_solve.htm
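
If so, note that the initial point xi is an argument to every ipopt_solve call, so you can pass a different guess each time; e.g., a sketch using the argument names from the ipopt_solve documentation (solution2 is my own name for the second result):

    // first solve, from the original guess xi
    CppAD::ipopt::solve<Dvector, FG_eval>(options, xi, xl, xu, gl, gu, fg_eval, solution);
    // later solve: reuse the previous optimum as the initial guess
    CppAD::ipopt::solve<Dvector, FG_eval>(
        options, solution.x, xl, xu, gl, gu, fg_eval, solution2
    );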

bradbell commented 1 year ago

I am going to move this issue to a discussion.