Clarifications/improvements for FOCAPD paper

Robbybp commented 9 months ago

Based on Sungho's comments.

Was ALAMO objective worse outside of training bounds?
What happened when the various methods failed to converge?
What is the full set of model equations. We should be able to produce this with the new Pyomo Latex writer.

@sbugosen do you remember any other comments that Sungho made that we could potentially address to improve the paper?

Robbybp commented 9 months ago

I'm not saying we need to address all of these for FOCAPD, but they're good to think about for future work.

sbugosen commented 9 months ago

Was ALAMO objective worse outside of training bounds?

-No, the maximum error obtained in the objective function was around 0.87%. The results were accurate even outside training bounds.

What happened when the various methods failed to converge?

In the case of the Neural Network, the surrogate couldn't predict accurately outside of training bounds.
In the case of the Full Space, the formulation was too sensitive to initialization. Our starting point had a high primal and dual infeasibility, and this model couldn't overcome this obstacle. When finding the constraint residuals, the highest ones corresponded to the energy balances in the reformer recuperator and the autothermal reformer, while some instances had the largest residuals in the natural gas expander.
In the case of the Implicit Function, your formulation was robust enough to overcome high primal and dual infeasibilities at the start. This formulation is less sensitive to initialization. The failed instances corresponded to high conversion values. The explanation to this was the evaluation error in the log of the entropy equation. When using the latest version of Cyipopt, we don't get evaluation errors anymore, but we get max. iterations reached. I noticed this last month, but I haven't explored yet the explanation for this. I will look into it, definitely before going to Sandia. su What is the full set of model equations. We should be able to produce this with the new Pyomo Latex writer.

I wasn't aware of this new functionality. I will also look into it.

@sbugosen do you remember any other comments that Sungho made that we could potentially address to improve the paper?

The comment regarding how we can compare the two surrogates. I had this in mind during the summer/fall 2023, that's why I have a table of R2 values comparing both surrogates. For a conference paper I believe this is enough, but for a full paper, I would have done something more rigorous. Maybe a statistical test to see if there is a significant difference between the results of the two surrogates. Alternatively (or additionally), I would have identified the range of each input that is being passed to the surrogates in every instance, and I would have made sure (by tweaking hyperparams in both surrogates) that in that range the surrogates are giving (almost) the same prediction. For example, instead of testing accuracy for the entire range of input temperatures (600-900 K), I would have identified that the surrogate only takes an input range of 720 - 730 K. Then, I would make sure that the surrogates give the same prediction in that easier, more manageable range, while also maintaining a reasonable accuracy accross the entire range of 600 - 900 K.

Robbybp commented 9 months ago

In the case of the Neural Network, the surrogate couldn't predict accurately outside of training bounds

I'm not sure this is a reason for non-convergence. If the surrogate can't predict accurately, we expect worse results when validating with the full-space model, but we still expect the solver to converge.

In the case of the Full Space, the formulation was too sensitive to initialization. Our starting point had a high primal and dual infeasibility, and this model couldn't overcome this obstacle.

The other models presumably start with approximately the same primal and dual infeasibility, though.

When finding the constraint residuals, the highest ones corresponded to the energy balances in the reformer recuperator and the autothermal reformer, while some instances had the largest residuals in the natural gas expander.

This is good information to have, and is about as good as we can hope for as a "reason for non-convergence" at this point.

The failed instances corresponded to high conversion values. The explanation to this was the evaluation error in the log of the entropy equation.

This is also a good explanation. It would be nice, at some point, to classify the failures in the implicit function method and see if they all have this evaluation error (we can do this later).

For a conference paper I believe this is enough, but for a full paper, I would have done something more rigorous. Maybe a statistical test to see if there is a significant difference between the results of the two surrogates.

I think a simple comparison between the prediction accuracy of both surrogates is sufficient.

Alternatively (or additionally), I would have identified the range of each input that is being passed to the surrogates in every instance

You mean you would check the range of inputs that actually get used in the surrogate models during the optimization? I think this is good and useful information to have, but I don't think we need to have stricter accuracy requirements for the range of inputs that actually get used.

sbugosen commented 9 months ago

I'm not sure this is a reason for non-convergence. If the surrogate can't predict accurately, we expect worse results when validating with the full-space model, but we still expect the solver to converge.

I would argue that we don't expect the NN-flowsheet to converge if the neural network is calculating for us values that don't make sense. Just as an example, for an input value of X=0.97, the neural network may be calculating a reactor outlet molar flow and outlet temperature that doesn't obey the mass and energy balance equations of the entire flowsheet, thus rendering the entire problem infeasible. I would have to see what values the NN calculates in those failed instances and compare them to the values calculated by ALAMO. Do you agree?

The other models presumably start with approximately the same primal and dual infeasibility, though.

Yes, every model starts with a very high primal and dual infeasibility (and I believe the ALAMO formulation starts even with higher infeasibilities than full space). But the implicit function, and ALAMO, were able to overcome this more easily than the other formulations. This is expected. In your CCE paper you explain the reason for why the implicit function improves convergence reliability, and I think the reason for ALAMO being so good is because it exploits sparsity.

Also, as seen in the standard deviation of solve time, the implicit function and ALAMO formulations were almost "immune" to the initialization values. For each of the successful instances, the solve time is almost the same. For full space, different parameter combinations caused very different solve times.

You mean you would check the range of inputs that actually get used in the surrogate models during the optimization? I think this is good and useful information to have, but I don't think we need to have stricter accuracy requirements for the range of inputs that actually get used.

Exactly, and make sure that in that specific range (which would be small), both surrogates have almost the same accuracy. Achieving this is more manageable than trying to make the surrogates have the same accuracy for the entire training range.

Robbybp commented 4 months ago

Questions/comments from FOCAPD

People seem to be interested in the full-space/reduced-space comparison. This would be useful to include.
Some people thought that reduced-space should be more reliable. Using a "feasibility argument", I'm inclined to agree. However, using a convexity argument, it is much less clear that reduced-space should be "better behaved".
Some people brought up scaling of the NN surrogate. I know we scale for training, but is the surrogate we include in the Pyomo model in the scaled space?
WaterTAP folks are doing something very similar — using ExternalGreyBoxModel to embed a Gibbs reactor simulation (using a specialized solver) into optimization problems, but using an approximation for the Hessian. They say this converges very reliably, which makes me think that, if we fixed the "bug" in our evaluation error handling, we could improve the implicit function's reliability even more.

Robbybp / surrogate-vs-implicit

Clarifications/improvements for FOCAPD paper #15

Questions/comments from FOCAPD