Refactor ALAMO flowsheet to re-use more work from the full-space

Robbybp commented 10 months ago

Re-use variables from full-space model, but deactivate constraints from reformer and mixer
Construct surrogate model using these variables (as inputs and outputs)
Solve with custom ALAMO initialization function

Robbybp commented 10 months ago

@sbugosen Can you work in a different branch? I intend to come back to this to test the refactored code, and would like the branch to be as I left it on Friday. Don't worry about the commits you've pushed so far, but please don't push any further commits to this branch.

Robbybp commented 9 months ago

With these updates, we can run the ALAMO parameter sweep with:

python run_alamo_sweep.py --fname="alamo-sweep-n5.csv" --n1=5 --n2=5

This runs the parameter sweep with five sampled parameter values for each of conversion and pressure, and stores the results in DATA_DIR/alamo-sweep-n5.csv, where DATA_DIR is, by default, ./data. n1 and n2 both default to 8, for the same parameters we have used in the paper results.

We can then validate the results with:

python validate_sweep_results.py data/alamo-sweep-n5.csv

This will write the validation results (feasibility and objective value of each set of inputs) to data/alamo-sweep-n5-validation.csv. If we have a baseline file (say from a full-space parameter sweep), we can generate the fractional objective differences (in the same output file) with

python validate_sweep_results.py data/alamo-sweep-n5.csv --baseline-fpath=data/fullspace-sweep.csv

Here, the full-space parameter sweep needs to contain the same input parameters as the parameter sweep we are validating.

To run the validation for a single set of parameters (e.g. to inspect/debug a failure), we can run:

python validate_inputs.py data/alamo-sweep-n5.csv --row=10

to run the full space simulation with the inputs specified by row 10 of the parameter sweep.

Robbybp commented 9 months ago

When I run the ALAMO solve and validation parameter sweep scripts, I get results that you can see in the attached files. sweep_results_alamo-validation.csv sweep_results_alamo.csv

Note that there are two instances that fail the validation: input indices 53 and 55. @sbugosen I believe you mentioned that you have used code in this branch to reproduce the ALAMO sweep results from the paper. Can you confirm, and send me the results you obtained if so? Also, when you have time, can you run the default ALAMO sweep workflow on this branch and let me know if you also see these two validation failures? The default workflow is:

python run_alamo_sweep.py --fname=alamo-sweep.csv
python validate_sweep_results.py data/alamo-sweep.csv

You can include a baseline file if you have one ready, but I am primarily interested in seeing whether these two simulation failures occur.

Robbybp commented 9 months ago

Inspecting one of the failures, row 55, I see a solve log that looks like this:

This is Ipopt version 3.13.2, running with linear solver ma27.                                                                                                                          [1/9494]

Number of nonzeros in equality constraint Jacobian...:       13
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        5

Total number of variables............................:        6
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        2
                     variables with only upper bounds:        0
Total number of equality constraints.................:        6
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  0.0000000e+00 6.27e+07 0.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  0.0000000e+00 1.30e+08 3.63e+00  -1.0 4.56e+03    -  2.75e-01 5.00e-01h  2
   2  0.0000000e+00 4.52e+07 8.34e+00  -1.0 4.61e+07    -  1.51e-01 1.00e+00h  1
   3  0.0000000e+00 2.27e+07 6.63e+00  -1.0 3.41e+07    -  5.54e-01 2.50e-01h  3
   4  0.0000000e+00 7.71e+05 5.04e-01  -1.0 1.36e+07    -  9.25e-01 1.00e+00h  1
   5  0.0000000e+00 3.85e+03 3.73e-03  -1.0 1.64e+05    -  9.95e-01 1.00e+00h  1
   6  0.0000000e+00 5.76e-02 1.13e-07  -1.0 6.57e+02    -  1.00e+00 1.00e+00h  1
   7  0.0000000e+00 4.47e-08 7.09e-14  -2.5 9.65e-03    -  1.00e+00 1.00e+00h  1
   8  0.0000000e+00 2.98e-08 1.54e-21  -3.8 3.82e-08    -  1.00e+00 1.00e+00   0
   9  0.0000000e+00 1.49e-08 5.38e-23  -5.7 1.51e-08    -  1.00e+00 1.00e+00T  0
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  0.0000000e+00 1.49e-08 7.28e-25  -8.6 2.74e-09    -  1.00e+00 1.00e+00T  0
  11  0.0000000e+00 1.49e-08 8.08e-28  -9.0 2.74e-09    -  1.00e+00 1.00e+00T  0

Number of Iterations....: 11

                                   (scaled)                 (unscaled)
Objective...............:   0.0000000000000000e+00    0.0000000000000000e+00
Dual infeasibility......:   8.0779356694631609e-28    8.0779356694631609e-28
Constraint violation....:   1.4901161193847656e-08    1.4901161193847656e-08
Complementarity.........:   9.0909090909090910e-10    9.0909090909090910e-10
Overall NLP error.......:   1.4901161193847656e-08    1.4901161193847656e-08

Number of objective function evaluations             = 21
Number of objective gradient evaluations             = 12
Number of equality constraint evaluations            = 21
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 12
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 12
Total CPU secs in IPOPT (w/o function evaluations)   =      0.001
Total CPU secs in NLP function evaluations           =      0.001

EXIT: Search Direction is becoming Too Small.

This seems like a bug in either Ipopt or our Ipopt interface, but seems to go away when I use MA57 as the linear solver. I'd also expect this error to go away if we use a different Ipopt version. I'm currently re-running the validation with MA57 to make sure the results look okay.

sbugosen commented 9 months ago

Hi Robert, I just finished cleaning up what I got so far for the relaxation.

I'm just looking at the validation fails, and the main difference I see from what I did some time ago is the pressure range. I use a step change in pressure of 70000, while you are using a step of 71428. This small difference might be causing that discrepancy in indices 53 and 55. I will validate my previous results (with a step of 70000) to see if I get problems for indices 53 and 55.

El dom, 25 feb 2024 a las 12:33, Robert Parker @.***>) escribió:

Inspecting one of the failures, row 55, I see a solve log that looks like this:

This is Ipopt version 3.13.2, running with linear solver ma27. [1/9494] Number of nonzeros in equality constraint Jacobian...: 13Number of nonzeros in inequality constraint Jacobian.: 0Number of nonzeros in Lagrangian Hessian.............: 5 Total number of variables............................: 6 variables with only lower bounds: 0 variables with lower and upper bounds: 2 variables with only upper bounds: 0Total number of equality constraints.................: 6Total number of inequality constraints...............: 0 inequality constraints with only lower bounds: 0 inequality constraints with lower and upper bounds: 0 inequality constraints with only upper bounds: 0 iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 0 0.0000000e+00 6.27e+07 0.00e+00 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0 1 0.0000000e+00 1.30e+08 3.63e+00 -1.0 4.56e+03 - 2.75e-01 5.00e-01h 2 2 0.0000000e+00 4.52e+07 8.34e+00 -1.0 4.61e+07 - 1.51e-01 1.00e+00h 1 3 0.0000000e+00 2.27e+07 6.63e+00 -1.0 3.41e+07 - 5.54e-01 2.50e-01h 3 4 0.0000000e+00 7.71e+05 5.04e-01 -1.0 1.36e+07 - 9.25e-01 1.00e+00h 1 5 0.0000000e+00 3.85e+03 3.73e-03 -1.0 1.64e+05 - 9.95e-01 1.00e+00h 1 6 0.0000000e+00 5.76e-02 1.13e-07 -1.0 6.57e+02 - 1.00e+00 1.00e+00h 1 7 0.0000000e+00 4.47e-08 7.09e-14 -2.5 9.65e-03 - 1.00e+00 1.00e+00h 1 8 0.0000000e+00 2.98e-08 1.54e-21 -3.8 3.82e-08 - 1.00e+00 1.00e+00 0 9 0.0000000e+00 1.49e-08 5.38e-23 -5.7 1.51e-08 - 1.00e+00 1.00e+00T 0iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls 10 0.0000000e+00 1.49e-08 7.28e-25 -8.6 2.74e-09 - 1.00e+00 1.00e+00T 0 11 0.0000000e+00 1.49e-08 8.08e-28 -9.0 2.74e-09 - 1.00e+00 1.00e+00T 0 Number of Iterations....: 11 (scaled) (unscaled)Objective...............: 0.0000000000000000e+00 0.0000000000000000e+00Dual infeasibility......: 8.0779356694631609e-28 8.0779356694631609e-28Constraint violation....: 1.4901161193847656e-08 1.4901161193847656e-08Complementarity.........: 9.0909090909090910e-10 9.0909090909090910e-10Overall NLP error.......: 1.4901161193847656e-08 1.4901161193847656e-08

Number of objective function evaluations = 21Number of objective gradient evaluations = 12Number of equality constraint evaluations = 21Number of inequality constraint evaluations = 0Number of equality constraint Jacobian evaluations = 12Number of inequality constraint Jacobian evaluations = 0Number of Lagrangian Hessian evaluations = 12Total CPU secs in IPOPT (w/o function evaluations) = 0.001Total CPU secs in NLP function evaluations = 0.001 EXIT: Search Direction is becoming Too Small.

This seems like a bug in either Ipopt or our Ipopt interface, but seems to go away when I use MA57 as the linear solver. I'd also expect this error to go away if we use a different Ipopt version. I'm currently re-running the validation with MA57 to make sure the results look okay.

— Reply to this email directly, view it on GitHub https://github.com/Robbybp/surrogate-vs-implicit/pull/13#issuecomment-1963038326, or unsubscribe https://github.com/notifications/unsubscribe-auth/A2YFNXUQMYCPUL6VQJJOAEDYVOGXPAVCNFSM6AAAAABCM43N4GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRTGAZTQMZSGY . You are receiving this because you were mentioned.Message ID: @.***>

sbugosen commented 9 months ago

Find attached the results you were inquiring about. The main difference is our step size in pressure.

alamo_experiment.csv alamo_validation.csv

Robbybp commented 9 months ago

I use a step change in pressure of 70000, while you are using a step of 71428. This small difference might be causing that discrepancy in indices 53 and 55

That makes sense! Thanks for pointing this out. I just chose an even spacing between the upper and lower bounds I saw in your parameter sweep files. I don't have a strong preference on exactly which pressure values we use, as long as the results we are getting are consistent and reproducible.

Robbybp commented 9 months ago

I'm currently re-running the validation with MA57 to make sure the results look okay.

3 validations fail when I use MA57. I'm currently re-running with Ipopt 3.14 (via CyIpopt) and MA27. Then I will try running with the original pressure values.

Robbybp commented 9 months ago

Developing these analyses is a bit cumbersome because solve_strongly_connected_components takes so long. Here's a breakdown of where it's spending the most time:

Identifier      ncalls   cumtime   percall      %
-------------------------------------------------
root                 1    44.159    44.159  100.0
     --------------------------------------------
     block         547    12.353     0.023   28.0
     calc-var      530    23.566     0.044   53.4
     igraph          1     7.675     7.675   17.4
     solve          16     0.558     0.035    1.3
     other         n/a     0.006       n/a    0.0
     ============================================
=================================================

"block" is time spent constructing temporary blocks for the solver, "calc-var" is time spent in calculate_variable_from_constraint, "igraph" is time spent constructing the IncidenceGraphInterface, and "solve" is time spent solving with Ipopt (including the nl file write).

I'm pretty sure this can all be done faster, as solving the full problem after solve_strongly_connected_components only takes 0.36 s.

Robbybp commented 9 months ago

With some very simple improvements (switching igraph to use IncidenceMethod.ampl_repn and not re-constructing the graph when block triangularizing), we can make a dent:

Identifier                        ncalls   cumtime   percall      %
-------------------------------------------------------------------
root                                   1    32.212    32.212  100.0
     --------------------------------------------------------------
     calc-var                        530    23.002     0.043   71.4
     generate-scc                    547     8.333     0.015   25.9
                 --------------------------------------------------
                 block-triang          1     0.034     0.034    0.4
                 generate-block      547     8.297     0.015   99.6
                 other               n/a     0.002       n/a    0.0
                 ==================================================
     igraph                            1     0.343     0.343    1.1
     solve                            16     0.528     0.033    1.6
     other                           n/a     0.006       n/a    0.0
     ==============================================================
===================================================================

Here, generate-scc is block above. We can probably improve the calc-var overhead by switching to use an ASL-based solver even for the 1x1 blocks.

By not creating blocks for the 1x1 solves, we can reduce a lot of the block-generation overhead:

Identifier                           ncalls   cumtime   percall      %
----------------------------------------------------------------------
root                                      1    26.318    26.318  100.0
     -----------------------------------------------------------------
     block-triang                         1     0.033     0.033    0.1
     calc-var                           530    22.861     0.043   86.9
     igraph                               1     0.344     0.344    1.3
     solve                               16     0.511     0.032    1.9
     subsystem-blocks                     1     2.565     2.565    9.7
                     -------------------------------------------------
                     block               16     0.000     0.000    0.0
                     external-fcns       16     1.928     0.120   75.2
                     identify-vars       16     0.635     0.040   24.8
                     reference           32     0.001     0.000    0.1
                     other              n/a     0.000       n/a    0.0
                     =================================================
     other                              n/a     0.005       n/a    0.0
     =================================================================
======================================================================

Here, subsystem-blocks is roughly equivalent to generate-block above.

With a quick refactor of the external function identification visitor to not repeat work for multiple instances of the same named expression, we can improve performance a bit:

Identifier                           ncalls   cumtime   percall      %
----------------------------------------------------------------------
root                                      1    24.738    24.738  100.0
     -----------------------------------------------------------------
     block-triang                         1     0.033     0.033    0.1
     calc-var                           530    23.144     0.044   93.6
     igraph                               1     0.342     0.342    1.4
     solve                               16     0.516     0.032    2.1
     subsystem-blocks                     1     0.700     0.700    2.8
                     -------------------------------------------------
                     block               16     0.000     0.000    0.1
                     external-fcns       16     0.068     0.004    9.8
                     identify-vars       16     0.629     0.039   89.9
                     reference           32     0.002     0.000    0.2
                     other              n/a     0.000       n/a    0.0
                     =================================================
     other                              n/a     0.005       n/a    0.0
     =================================================================
======================================================================

Now it's just calculate_variable_from_constraint. I think that, to avoid this bottleneck, we need to implement a BT solver that only uses a single PyomoNLP (a single nl file). My experience is that this is also not performant, as it leads to quadratic scaling when evaluating constraints (and Jacobians) in a loop. I will try to implement this and test it when I have some time.

These performance improvements are in my scc-performance branch of Pyomo.

Multiple-NLP implementation

Instead of calculate_variable_from_constraint, use SciPy Newton solvers, which use ASL for function/derivative evaluation.

Use the SccImplicitFunctionSolver, which constructs an NLP for each diagonal block of the block triangularization

Identifier                              ncalls   cumtime   percall      %
-------------------------------------------------------------------------
root                                         1     8.710     8.710  100.0
 --------------------------------------------------------------------
 __init__                                1     8.262     8.262   94.8
               ------------------------------------------------------
               NlpSolver                 1     0.016     0.016    0.2
               PyomoNLP                  1     1.072     1.072   13.0
               partition                 1     0.389     0.389    4.7
               other                   n/a     6.784       n/a   82.1
               ======================================================
 igraph                                  1     0.339     0.339    3.9
 set_parameters                          1     0.088     0.088    1.0
               ------------------------------------------------------
               solve                   546     0.085     0.000   96.4
                        ---------------------------------------------
                        solve_nlp      546     0.084     0.000   99.0
                        other          n/a     0.001       n/a    1.0
                        =============================================
               other                   n/a     0.003       n/a    3.6
               ======================================================
 other                                 n/a     0.021       n/a    0.2
 ====================================================================
=========================================================================

This uses a completely different implementation of the BT solver, so the timing categories are not really comparable to those above. However, this performance is pretty good. We're bottlenecked by initialization of data structures, which could be eliminated if we could just use a single NLP. I think this would introduce a new bottleneck with our current ASL interface, but it is worth testing.

sbugosen commented 9 months ago

I ran the ALAMO sweep with Ipopt 3.14.14 ma27, and the validation with Ipopt 3.13.2 ma27. I got three failed instances (index 5,22,55), although the objective provided by these 3 failed instances is very close to the real objective. Let me know if you want me to try specific solvers. alamo-sweep-validation.csv alamo-sweep.csv

Robbybp / surrogate-vs-implicit

Refactor ALAMO flowsheet to re-use more work from the full-space #13

Multiple-NLP implementation