AU-BCE-EE / tric-fil-mod

A trickling filter model to simulate air treatment, written in Python
GNU General Public License v3.0
0 stars 0 forks source link

Automatic optimization #27

Open AnneMortensen opened 2 months ago

AnneMortensen commented 2 months ago

Attempt to make a piece of code that can adjust multiple paramters for the model (eg. rate constant k and k2 and a mass transfer parameter) to fit a set of experimental data. If possible fit to multiple data sets as well.

AnneMortensen commented 2 months ago

I have made an attempt to do an automatic optimization using curve_fit. The script is the one called optimization_test, in Applications/Lab experiments/New structure/Tasks/1.0 pH6 Kga. The challenges are:

1) one of the parameters (Kga) does not change much from the initial guess, no matter what the initial guess is.

2) The function does not handle sharp points well, and we have a point when H2S is turned off. Right now, we are only using the first 400 data points, corresponding to the 300s H2S is on. This would also be a problem if we wanted to optimize multiple experiments (eg at different pH) to the same k and Kga and k2 values. At least if that was done by concatenate, meaning that the data would be placed in the same dataframe after one another.

Can we come up with solutions for these challenges? Untill further notice, I will manually fit using loops for Kga and k and visual judgements, as Sasha sugessted.

AnneMortensen commented 2 months ago

Moved this file to Applications/New structure/Tasks/Automatic optimization

sashahafner commented 2 months ago

@AnneMortensen I am working through

https://github.com/AU-BCE-EE/tric-fil-mod/blob/main/applications/Lab%20experiments/New%20structure/Tasks/2.0%20Automatic%20optimization/optimization_test.py

and can run it to L 81 where there is a problem with this file path (even after correction the '/' together, '//'):

ex1 = pd.read_csv('../../Processed_data/experiment_'+first+'.'+second+'.inlet1.csv', sep = ',')
sashahafner commented 2 months ago

Are you able to run that line?

AnneMortensen commented 2 months ago

Yes, I have just retried it, and it works here, both with / and // Is there anything in the "Processed data" folder?

sashahafner commented 2 months ago

Oops, my mistake. Got it now.

sashahafner commented 2 months ago

@AnneMortensen is this a note?

https://github.com/AU-BCE-EE/tric-fil-mod/blob/f36b5bb2248e94c5410af622def4c86971c731f5/applications/Lab%20experiments/New%20structure/Tasks/2.0%20Automatic%20optimization/optimization_test.py#L185

AnneMortensen commented 2 months ago

It is there to "redefine" or copy the parameters to the "optimization" definition. I found that "optimization" had a hard time recognizing what eg. L was, if it was not written like that. Not entirely sure as to why it is necessary, but for me, it would not run without it.

AnneMortensen commented 2 months ago

I should probably write a comment explaining that. Will do that right away.

sashahafner commented 2 months ago

Oh, it must import them into the function environment.

sashahafner commented 2 months ago

Well that was a lot harder than I had expected! But this seems to work:

from scipy.optimize import least_squares

# Optimization function________________________________________________________________
#avoid the sharp point when the H2S is turned off

times = t2norm [cycle2:cycle2+400]*3600
c_meas = c_out [0:400]

def res_calc(x):
    L, por_g, por_l, v_g, v_l, nc, cg0, cl0, cgin, clin,henry, pKa, pH , temp, dens_l, v_res, times, c_meas # import parameters for the model into the optimization definition

    Kga = x[0]
    k = x[1]

    pred_opt= tfmod(L = L, por_g = por_g, por_l = por_l, v_g = v_g, v_l = v_l, nc = nc, cg0 = cg0, 
                  cl0 = cl0, cgin = cgin, clin = clin, Kga = Kga, k = k, k2 = 0.01, henry = henry, pKa = pKa, 
                  pH = (pH1 +pH2)/2, temp = temp, dens_l = dens_l, times = times, v_res = v_res, recirc = True, counter = True)
    c = pred_opt['gas_conc'][nc - 1,:]
    r = c - c_meas

    return r

sol = least_squares(res_calc, [0.05, 0.1], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = 10)

copt = sol['x']

# Models for plotting__________________________________________________________________________

With the data loaded by that script I get these results (x at bottom):

...
     message: '`xtol` termination condition is satisfied.'
        nfev: 49
        njev: 17
  optimality: 0.09044313712723084
      status: 3
     success: True
           x: array([0.00799937, 0.08416681])

At lest they are different from the starting values :).

And note that the model was called 49 times--more like what I would expect, compared to 3 or 5 that I saw with the earlier code.

You can try some variations, but the key seemed to be diff_step. I suspect the solver was taking small steps and could not see a difference in fit compared to noise from the numeric solver used for the model.

Also note that I changed some variable names, imported more variables into user-defined function, and also that that function needs to return the residuals now (I called them r).

I also changed the function name to something that seemed to fit its new role, but use whatever name you prefer.

What you had before looked good to me (curve_fit), and might also work if you change diff_step. It just calls least_squares by default. But the first time argument eventually didn't make much sense to me, and although the help file isn't very clear, I am not sure that function is meant for this type of problem. Actually I don't think least_squares is either, but we can make it work.

sashahafner commented 2 months ago

See if this works for you @AnneMortensen

AnneMortensen commented 2 months ago

Okay, I think the assumption, that there are many combinations that satisfy our demands for a fit is correct.
Dont know hav to make the fancy links, so take a look at Applications/Lab experiments/New structure/Tasks/2.0 Automatic optimization/Plots/Inputs. There, the tables are called IG[insert initial guess], and k_fit and Kga_fit is the results from x. Or simply look here: image

AnneMortensen commented 2 months ago

I see two different ranges that is accepted, k=0.1, Kga=0.008 and k=0.009, Kga=0.04, but I dont see a pattern in what initial guess results in what solution from this, and there is still some variation within each range, and I have only tried 4 initial guesses in this, recieving 4 different answers. Worth trying more initial guesses tomorrow, or do you have a way of getting around these variations in the answers? Have also tried to lower the xtol,ftol and gtol, but apparently there is some limit that at least one of them needs to be above.

AnneMortensen commented 2 months ago

The fact that there are two "answer ranges" is very promising for the fitting to the rest of the data. Will work more on this tomorrow.

AnneMortensen commented 2 months ago

So update from todays work (task 2.1 in tasks folder): I have made it fit to kl rather than Kga, and made it possible to fit to two datasets with different pH values at the same time. For small pH changes, it works nicely, but for the larger ones it has a hard time finding solutions. So by now I think we have narrowed down the options too much, making it very hard to come up with an initial guess that gives a result rather than runtime error (im assuming that the model is not converging correctly, see screenshot).
It is possible to change the initial guess, as well as the experiment number from the top of the script, and each time the initial guess is changed, it saves it as a new figure, thus it should be "easy" to change the initial guess, but it takes some time to run the script.

AnneMortensen commented 2 months ago

My log: image Error message when initial guess is "wrong" image image

AnneMortensen commented 2 months ago

Any ideas or comments, @sashahafner?

sashahafner commented 2 months ago

Any ideas or comments, @sashahafner?

Yes! First, nice progress! I think the failure problem is because the numeric tfmod is failing with negative or otherwise impossible parameters. Two possible solutions:

  1. Use log transformation (more in next comment)
  2. Set bounds
sashahafner commented 2 months ago

I would tend to prefer 1, but I really don't know which is better. For 1:

def res_calc(x):
    L, por_g, por_l, v_g, v_l, nc, cg0, cl0, cgin, clin,henry, pKa, pH , temp, dens_l, v_res, times, c_meas # import parameters for the model into the optimization definition

    Kga = 10**x[0]
    k = 10**x[1]

    pred_opt= tfmod(L = L, por_g = por_g, por_l = por_l, v_g = v_g, v_l = v_l, nc = nc, cg0 = cg0, 
                  cl0 = cl0, cgin = cgin, clin = clin, Kga = Kga, k = k, k2 = 0.01, henry = henry, pKa = pKa, 
                  pH = (pH1 +pH2)/2, temp = temp, dens_l = dens_l, times = times, v_res = v_res, recirc = True, counter = True)
    c = pred_opt['gas_conc'][nc - 1,:]
    r = c - c_meas

    return r

sol = least_squares(res_calc, [np.log10(0.05), np.log10(0.1)], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = 10)
sashahafner commented 2 months ago

Are you using curve_fit or least_squares?

sashahafner commented 2 months ago

These are the differences to go with log transformation. This prevents any negative MTC or rate constant:

Kga = 10**x[0]
    k = 10**x[1]

...

least_squares(... [np.log10(0.05), np.log10(0.1)],
AnneMortensen commented 2 months ago

That makes sense. Nice! I will try the log transformation next :-) And I use least_squares.

AnneMortensen commented 2 months ago

OK, so log transformation did not work as expected, even initial guesses that I knew worked, failed then. But I used the principle, and took the absolute values of kl, k and k2, and that works. Now we just get some very different solutions. Residuals would be nice, but for now, just look at the three sets of graphs here: Initial guess [0.1,0.1,0.1] Experiment 5 4optimized  kl, k, k2 ,  IG   0 1,0 1,0 1 Experiment 5 7optimized  kl, k, k2 ,  IG   0 1,0 1,0 1

Initial guess [2e-5,0.01,1e-5]

Experiment 5 4optimized  kl, k, k2 ,  IG   2e-05,0 01,1e-05

Experiment 5 7optimized  kl, k, k2 ,  IG   2e-05,0 01,1e-05

Initial guess [2e-6,0,0] Experiment 5 4optimized  kl, k, k2 ,  IG   2e-06,0,0 Experiment 5 7optimized  kl, k, k2 ,  IG   2e-06,0,0

The only difference is the initial guess. Plots made by this script: https://github.com/AU-BCE-EE/tric-fil-mod/blob/6475cc88366e1a4a608e8f0a0394907bc8174019/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/optimization_test%20-%20version3.py

Changes:

#Define wat is in each position in the answer array x

kl = abs(x[0])
k = abs(x[1])
k2 = abs(x[2])

...

#Finding the solution that gives the least combined deviation between model and experiments
sol = least_squares(res_calc, [IG_kl, IG_k, IG_k2], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = diff_step)

copt = sol['x']

kl_fit = abs(float(copt[0]))
k_fit = abs(float(copt[1]))
k2_fit = abs(float(copt[2]))
sashahafner commented 2 months ago

Nice work. Now I am looking at the right script, and I see how you included 2 experiments.

sashahafner commented 2 months ago

So in the last plot you showed above, is that a good fit or does the model actually match the inlet concentration? I am not sure from the legend.

sashahafner commented 2 months ago

I'm looking into log transformation now. Takes a while to run now doesn't it? But you might check out the bounds argument instead: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html

Also, note that the cost element from least_squares output gives us overall fit and so some indication of whether a solution makes sense (if we compare it between different solutions) even before looking at a plot.

I am sure our xtol and other argument values are not optimal. Sorry I can't provide much help on this--I guess the solution is trial-and-error along with whatever info comes from the help file.

sashahafner commented 2 months ago

@AnneMortensen I am going to push a version of that script that uses log transformation. It seems to work well, but if I am overlooking something you can revert the commit.

sashahafner commented 2 months ago

With e51fb3be5c618cfdfb11b8bcafa05fea65442e01 I get:

        cost: 0.02664191276669185
...
         x: array([-5.67381683, -1.98488671, -2.70398647])

I get very similar values with different initial guesses. But I don't have plots for evaluation yet (some stupid Python error). And I see kl is very low--likely bogus, oops.

AnneMortensen commented 2 months ago

So in the last plot you showed above, is that a good fit or does the model actually match the inlet concentration? I am not sure from the legend.

It fits the inlet. Changing the legends has been added to me "To do list" :-)

Will check out "Bounds" monday.

AnneMortensen commented 2 months ago

I can get your plots (they are saved in the "Plots" folder, not directly where the script is saved) Will push a version that saves them like you did in "demos". And as you thought, they are equal to 0 at all times Experiment 5 4optimized  kl, k, k2 ,  IG   0 001,0 001,0 001 Experiment 5 7optimized  kl, k, k2 ,  IG   0 001,0 001,0 001

sashahafner commented 2 months ago

Thanks @AnneMortensen . That's terrible optimization! Something is really strange when we get best estimates that perform worse than the initial guesses. Hmm. I'll try to dig into the output from the residuals function tomorrow.

AnneMortensen commented 2 months ago

So far today, I have tried the following: Setting the bounds to between 0 and 1, multiplying all concentrations by 10 000 to make differences bigger, and removing all noise by taking the moving average of 50 points. I dont see much improvement unfortunately. So I dont see a point in pushing the changes yet. Do you agree? Plots: IG: [0.0001,0.01,0.8] Experiment 5 4optimized, no noise, 10000,  IG   0 0001,0 01,0 8 Experiment 5 7optimized, no noise, 10000  IG   0 0001,0 01,0 8

IG: [0.0001, 0.00001,0.0008] Experiment 5 4optimized, no noise, 10000,  IG   0 0001,1e-05,0 0008

Experiment 5 7optimized, no noise, 10000  IG   0 0001,1e-05,0 0008 IG: [0.01, 0.00001,0.0008] Experiment 5 4optimized, no noise, 10000,  IG   0 01,1e-05,0 0008

Experiment 5 7optimized, no noise, 10000  IG   0 01,1e-05,0 0008 Legends are still not changed, so for clarity: Blue is measured outlet, orange is measured inlet, black is the optimized model and green is onda and k=k2=0

AnneMortensen commented 2 months ago

Ended pushing it in a separate folder, in case you want to see what I have done. https://github.com/AU-BCE-EE/tric-fil-mod/tree/7939666ba693c59ec0399528379298f6d28800cc/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/Failed%20Attempts%2029.4.24

AnneMortensen commented 2 months ago

Version of the log transformation that plots the initial guesses instead of the onda model for comparison https://github.com/AU-BCE-EE/tric-fil-mod/blob/172a479bd731266ad0146cf6c7be0c74798c71a5/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/optimization_test%20-%20version3.py