Robbybp / surrogate-vs-implicit

Comparing surrogate models and implicit function formulations for chemical process models

Correction of surrogate generation scripts #21

Closed sbugosen closed 7 months ago

sbugosen commented 7 months ago

@Robbybp

To run the atr_alamo_surrogate_generation.py file:

python atr_alamo_surrogate_generation.py --fname=data_atr.csv --surrogate_fname=alamo_surrogate_atr.json

I'll correct the nn_tuning_training.py script next.
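For reference, a minimal sketch of what a surrogate-generation script along these lines might look like, assuming the usual IDAES AlamoTrainer workflow; the column names and ALAMO basis options below are placeholders, not necessarily what the actual script uses:

```python
# Hypothetical sketch (not the actual script): train an ALAMO surrogate from a CSV
# of flowsheet samples and save it to JSON using the IDAES AlamoTrainer workflow.
import pandas as pd
from idaes.core.surrogate.alamopy import AlamoTrainer

data = pd.read_csv("data_atr.csv")
input_labels = ["Conversion", "Fin_H2O", "Fin_CH4", "Tin_CH4"]   # placeholder names
output_labels = [c for c in data.columns if c not in input_labels]

trainer = AlamoTrainer(
    input_labels=input_labels,
    output_labels=output_labels,
    training_dataframe=data,
)
# Basis-function choices are illustrative; the real script may configure ALAMO differently.
trainer.config.constant = True
trainer.config.linfcns = True
trainer.config.multi2power = [1, 2]

success, alamo_surrogate, message = trainer.train_surrogate()
alamo_surrogate.save_to_file("alamo_surrogate_atr.json", overwrite=True)
```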

sbugosen commented 7 months ago

@Robbybp

To run the nn_tuning_training.py file:

python nn_tuning_training.py --fname=data_atr.csv --surrogate_fname=keras_surrogate_high_rel --tune=False

If --tune=False, the script trains a single NN with 4 layers of 30 neurons each and tanh activations. If --tune=True, it runs the whole tuning loop.
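For context, a rough sketch of the fixed (no-tuning) architecture described above, assuming a plain Keras MLP; the input/output dimensions are placeholders, and the layer width follows the description here (later corrected in this thread to 32):

```python
import tensorflow as tf

def build_fixed_nn(n_inputs, n_outputs, neurons=30):
    """Sketch of the fixed architecture: 4 hidden tanh layers, linear output layer."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(n_inputs,)))
    for _ in range(4):                                   # 4 hidden layers
        model.add(tf.keras.layers.Dense(neurons, activation="tanh"))
    model.add(tf.keras.layers.Dense(n_outputs))          # linear outputs for regression
    model.compile(optimizer="adam", loss="mse")
    return model
```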

sbugosen commented 7 months ago

I made a few minor changes, but otherwise this looks great. I'm running the NN tuning right now, then tomorrow I'll validate that the surrogate models I'm generating give me the same convergence results in the optimization problem. Thanks for making these easy to use.

Just realized I made a mistake. I apologize. The neural network we are using has 32 neurons in each layer. Not 30. So the sweep should be n_nodes_per_layer_values = np.arange(20,33,1).tolist(). And the argument on the function should be neurons = 32.
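For illustration, the sweep list this correction implies, with a hedged sketch of how the tuning loop might iterate over it (the loop structure is assumed, not taken from the script):

```python
import numpy as np

# Candidate layer widths for the tuning sweep: 20 through 32 neurons per layer.
n_nodes_per_layer_values = np.arange(20, 33, 1).tolist()   # [20, 21, ..., 32]

for neurons in n_nodes_per_layer_values:
    # Hypothetical inner step: build and train a 4-layer tanh network of this width
    # (e.g. with the build_fixed_nn sketch above), record its validation error,
    # and keep the best configuration.
    pass
```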

Robbybp commented 7 months ago

> The neural network we are using has 32 neurons in each layer. Not 30. So the sweep should be n_nodes_per_layer_values = np.arange(20,33,1).tolist(). And the argument on the function should be neurons = 32.

Can you push these changes?

sbugosen commented 7 months ago

Yes, right now.

Robbybp commented 7 months ago

How do these surrogates look when you use them in the optimization parameter sweep? For me, the ALAMO surrogate I get with fresh data generation and surrogate training is less reliable than our previous surrogate. I get the following convergence pattern:

[figure: convergence pattern of the optimization sweep with the freshly trained ALAMO surrogate]
sbugosen commented 7 months ago

Can you push the new ALAMO .json file, just so I can see what it looks like?

Also, do you get the same ALAMO surrogate when you train on the data that we already had?

Robbybp commented 7 months ago

Here is the convergence pattern I get with the NN surrogate, trained on the same data:

[figure: convergence pattern with the NN surrogate]

and here is the ALAMO surrogate: alamo_surrogate_atr.json. Just looking at the git diff, this seems to be a significantly different surrogate than the one we produced before (although the validation error of the optimization results still looks good).

sbugosen commented 7 months ago

> Here is the convergence pattern I get with the NN surrogate, trained on the same data: [figure]
>
> and here is the ALAMO surrogate: alamo_surrogate_atr.json. Just looking at the git diff, this seems to be a significantly different surrogate than the one we produced before (although the validation error of the optimization results still looks good).

Yes, they are extremely different. The most important difference I see is that I generated the original data over the conversion range (0.80, 0.95), while the newly generated data covers the range (0.80, 0.96). This is the data that I generated last year, over the conversion range 0.80 to 0.949 precisely; if you use this data, you will get the same surrogate that we've been using: data_atr (3).csv
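A quick way to confirm this kind of range discrepancy between the two data sets (the "Conversion" column name is a placeholder; the actual CSV header may differ):

```python
import pandas as pd

# Compare the conversion range covered by the old and new training data.
old = pd.read_csv("data_atr (3).csv")
new = pd.read_csv("data_atr.csv")
print("old conversion range:", old["Conversion"].min(), "to", old["Conversion"].max())
print("new conversion range:", new["Conversion"].min(), "to", new["Conversion"].max())
```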

Robbybp commented 7 months ago

What do you think about sampling on a regular grid in our input space instead of using uniform random samples?

sbugosen commented 7 months ago

> What do you think about sampling on a regular grid in our input space instead of using uniform random samples?

I believe doing this will be very time consuming if we want to generate representative data for our system. The issue with sampling on a regular grid is the number of samples that we will need. We have these inputs:

conversion_range = (0.80, 0.96)
flow_mol_h2o_range = (200, 350)
flow_mol_gas_range = (600, 900)
temp_gas_range = (600, 900)

If we wanted to do grid sampling, my initial, "intuitive" approach to try to generate representative data would be:

- Sample conversion values (0.80, 0.83, 0.86, 0.89, 0.92, 0.96): 6 samples
- Sample steam flow values (200, 210, 220, 230, ...): 16 samples
- Sample CH4 flow values (600, 620, 640, 660, ...): 18 samples
- Sample CH4 temp values (600, 620, 640, 660, ...): 18 samples

This would mean solving the simple reactor flowsheet 31,104 times instead of just 600. I believe it's tricky, with grid sampling, to strike a good balance between obtaining representative data and not spending too much time generating it.
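As a sanity check on the combinatorics (the per-input sample counts are the ones listed above):

```python
import math

# Per-input sample counts for the hypothetical grid described above.
n_samples = {"conversion": 6, "flow_mol_h2o": 16, "flow_mol_gas": 18, "temp_gas": 18}
total = math.prod(n_samples.values())
print(total)   # 31104 flowsheet solves for the full grid
```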

Robbybp commented 7 months ago

I just tried sampling over a regular grid with 5 points per input for a total of 625 possible samples. Of these, 569 converge to give me training data points. When I train ALAMO and NN surrogates, the results are comparable to what we currently see: ALAMO solves all 64 instances, NN solves 48 instances. This seems viable to me, especially given the advantage of deterministic sampling.
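For reference, a minimal sketch of how such a 5-point-per-input regular grid could be built (ranges taken from the discussion above; variable names are placeholders):

```python
import numpy as np
from itertools import product

# 5 evenly spaced points per input over the ranges discussed above.
conversion = np.linspace(0.80, 0.96, 5)
flow_mol_h2o = np.linspace(200, 350, 5)
flow_mol_gas = np.linspace(600, 900, 5)
temp_gas = np.linspace(600, 900, 5)

grid = list(product(conversion, flow_mol_h2o, flow_mol_gas, temp_gas))
print(len(grid))   # 625 candidate samples; each is one flowsheet solve attempt
```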

sbugosen commented 7 months ago

Interesting, I didn't expect the surrogates to provide good results with such a coarse grid. That further justifies that the Gibbs reactor is not that hard to approximate after all.

sbugosen commented 7 months ago

Out of curiosity, can we also obtain deterministic samples by setting a seed when doing uniform sampling?

Robbybp commented 7 months ago

> That further justifies that the Gibbs reactor is not that hard to approximate after all.

My take is more that 600 samples is quite sufficient, although I think this is a reasonable claim given the simplicity of the ALAMO surrogate.

> Out of curiosity, can we also obtain deterministic samples by setting a seed when doing uniform sampling?

Yes, but I see no advantage to doing this over using a regular grid (let me know if there is one).
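For completeness, a hedged sketch of seeded uniform sampling, which makes the random samples reproducible (ranges from the discussion above; 600 samples as in the current data set, and the seed value is arbitrary):

```python
import numpy as np

# Seeded uniform sampling: the same seed always reproduces the same 600 samples.
rng = np.random.default_rng(seed=42)
n_samples = 600
samples = np.column_stack([
    rng.uniform(0.80, 0.96, n_samples),   # conversion
    rng.uniform(200, 350, n_samples),     # flow_mol_h2o
    rng.uniform(600, 900, n_samples),     # flow_mol_gas
    rng.uniform(600, 900, n_samples),     # temp_gas
])
```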

sbugosen commented 7 months ago

>> That further justifies that the Gibbs reactor is not that hard to approximate after all.
>
> My take is more that 600 samples is quite sufficient, although I think this is a reasonable claim given the simplicity of the ALAMO surrogate.
>
>> Out of curiosity, can we also obtain deterministic samples by setting a seed when doing uniform sampling?
>
> Yes, but I see no advantage to doing this over using a regular grid (let me know if there is one).

For our case, I don't see any advantage of using uniform sampling over grid sampling. For more complex unit operations that have more inputs, grid sampling will be too expensive, and in those cases uniform sampling will probably be a better choice.

Also, I imagine uniform sampling is probably better for unit operations that involve phase changes, because grid sampling will have a harder time capturing samples near the phase boundaries, where small changes in the inputs cause large changes in the properties (and therefore in the outputs of the system). If we do work with processes involving phase changes, this is something we would have to ponder a bit more.

Robbybp commented 7 months ago

Just pushed the versions of the surrogates I used to produce the most recent version of the paper. I'm not quite sold on including surrogates in the repo, but it may be useful for reproducibility, since at least some parts of the training process (the train-test split and NN training) are non-deterministic. So I'll leave them in for now.
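A hedged sketch of how those two non-deterministic steps could be pinned down if we ever wanted fully reproducible training (the scikit-learn and Keras calls are standard, but whether our scripts expose these knobs is an assumption):

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

data = pd.read_csv("data_atr.csv")

# Pin the two sources of randomness mentioned above.
tf.keras.utils.set_random_seed(0)   # seeds the Python, NumPy, and TensorFlow RNGs
train_df, test_df = train_test_split(data, test_size=0.2, random_state=0)
```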

Robbybp commented 7 months ago

FYI, pending any additional comments, I plan to merge this PR and open-source the repo on Monday.