AnneMortensen opened 7 months ago
I have made an attempt at automatic optimization using curve_fit. The script is the one called optimization_test, in Applications/Lab experiments/New structure/Tasks/1.0 pH6 Kga. The challenges are:
1) One of the parameters (Kga) does not change much from the initial guess, no matter what the initial guess is.
2) The function does not handle sharp points well, and we have one at the moment H2S is turned off. Right now we are only using the first 400 data points, corresponding to the 300 s that H2S is on. This would also be a problem if we wanted to optimize multiple experiments (e.g. at different pH) to the same k, Kga, and k2 values, at least if that were done by concatenation, i.e., placing the data one after another in the same dataframe.
Can we come up with solutions for these challenges? Until further notice, I will fit manually using loops over Kga and k and visual judgement, as Sasha suggested.
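One possible workaround for the concatenation problem in 2), sketched here with made-up single-exponential "experiments" rather than the real tfmod model: run the model separately for each experiment and concatenate the residual vectors, so the solver fits shared parameters without any sharp join ever appearing in a single model run.

```python
import numpy as np
from scipy.optimize import least_squares

# Two toy "experiments" sharing one rate constant k (made-up data)
t1 = np.linspace(0.0, 1.0, 20)
t2 = np.linspace(0.0, 2.0, 25)
y1 = 1.0 * np.exp(-0.5 * t1)
y2 = 2.0 * np.exp(-0.5 * t2)

def resid(x):
    k = x[0]
    # Run the model once per experiment, then join the residuals;
    # the sharp boundary between datasets never enters a model run
    r1 = 1.0 * np.exp(-k * t1) - y1
    r2 = 2.0 * np.exp(-k * t2) - y2
    return np.concatenate([r1, r2])

sol = least_squares(resid, [1.0])
```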
Moved this file to Applications/New structure/Tasks/Automatic optimization
@AnneMortensen I am working through it and can run to line 81, where there is a problem with this file path (even after correcting the '/' to '//'):
```python
ex1 = pd.read_csv('../../Processed_data/experiment_' + first + '.' + second + '.inlet1.csv', sep = ',')
```
Are you able to run that line?
Yes, I have just retried it, and it works here, both with / and //. Is there anything in the "Processed data" folder?
Oops, my mistake. Got it now.
It is there to "redefine" or copy the parameters into the "optimization" definition. I found that "optimization" had a hard time recognizing what e.g. L was if it was not written like that. I'm not entirely sure why it is necessary, but for me it would not run without it.
I should probably write a comment explaining that. Will do that right away.
Oh, it must import them into the function environment.
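For what it's worth, a bare-name line inside a function does not actually copy anything into the function: Python closures already see outer-scope names, so the line mainly acts as an early NameError check. A minimal sketch (L and its value are made up):

```python
L = 0.5  # hypothetical outer-scope model parameter

def res_calc(x):
    L  # bare name: raises NameError here if L is undefined, but copies nothing
    return x * L  # the function body can read the outer L either way

print(res_calc(2.0))  # 1.0
```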
Well that was a lot harder than I had expected! But this seems to work:
```python
from scipy.optimize import least_squares

# Optimization function________________________________________________________________
# Avoid the sharp point when the H2S is turned off
times = t2norm[cycle2:cycle2 + 400] * 3600
c_meas = c_out[0:400]

def res_calc(x):
    L, por_g, por_l, v_g, v_l, nc, cg0, cl0, cgin, clin, henry, pKa, pH, temp, dens_l, v_res, times, c_meas  # import parameters for the model into the optimization definition
    Kga = x[0]
    k = x[1]
    pred_opt = tfmod(L = L, por_g = por_g, por_l = por_l, v_g = v_g, v_l = v_l, nc = nc, cg0 = cg0,
                     cl0 = cl0, cgin = cgin, clin = clin, Kga = Kga, k = k, k2 = 0.01, henry = henry, pKa = pKa,
                     pH = (pH1 + pH2) / 2, temp = temp, dens_l = dens_l, times = times, v_res = v_res,
                     recirc = True, counter = True)
    c = pred_opt['gas_conc'][nc - 1, :]
    r = c - c_meas
    return r

sol = least_squares(res_calc, [0.05, 0.1], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = 10)
copt = sol['x']

# Models for plotting__________________________________________________________________________
```
With the data loaded by that script I get these results (x at bottom):
```
...
message: '`xtol` termination condition is satisfied.'
nfev: 49
njev: 17
optimality: 0.09044313712723084
status: 3
success: True
x: array([0.00799937, 0.08416681])
```
At least they are different from the starting values :).
And note that the model was called 49 times--more like what I would expect, compared to the 3 or 5 that I saw with the earlier code.
You can try some variations, but the key seemed to be diff_step.
I suspect the solver was taking small steps and could not see a difference in fit compared to noise from the numeric solver used for the model.
Also note that I changed some variable names, imported more variables into the user-defined function, and that the function needs to return the residuals now (I called them r).
I also changed the function name to something that seemed to fit its new role, but use whatever name you prefer.
What you had before looked good to me (curve_fit), and it might also work if you change diff_step.
It just calls least_squares by default.
But the first time argument eventually didn't make much sense to me, and although the help file isn't very clear, I am not sure that function is meant for this type of problem.
Actually I don't think least_squares is either, but we can make it work.
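The diff_step point can be illustrated with a toy problem (made-up model, with a deterministic high-frequency wiggle standing in for numeric solver noise): with a tiny finite-difference step the solver mostly differentiates the noise, while a larger relative step averages over it and recovers the real slope.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 1.0, 50)
y = np.exp(-3.0 * t)  # synthetic "measured" data, true rate = 3

def resid(x):
    model = np.exp(-x[0] * t)
    wiggle = 1e-4 * np.sin(1e6 * x[0] * t)  # stand-in for numeric solver noise
    return model + wiggle - y

# A large relative step averages over the wiggle instead of
# differentiating it, so the finite-difference gradient is usable
sol = least_squares(resid, [1.0], diff_step=0.1)
```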
See if this works for you @AnneMortensen
Okay, I think the assumption that there are many parameter combinations satisfying our demands for a fit is correct.
I don't know how to make the fancy links, so take a look at Applications/Lab experiments/New structure/Tasks/2.0 Automatic optimization/Plots/Inputs. There, the tables are called IG[insert initial guess], and k_fit and Kga_fit are the results from x.
Or simply look here:
I see two different ranges that are accepted, k=0.1, Kga=0.008 and k=0.009, Kga=0.04, but I don't see a pattern in which initial guess leads to which solution, and there is still some variation within each range. I have only tried 4 initial guesses, receiving 4 different answers. It is worth trying more initial guesses tomorrow, or do you have a way of getting around these variations in the answers? I have also tried to lower xtol, ftol, and gtol, but apparently there is some limit that at least one of them needs to stay above.
The fact that there are two "answer ranges" is very promising for the fitting to the rest of the data. Will work more on this tomorrow.
So, an update from today's work (task 2.1 in the tasks folder): I have made it fit to kl rather than Kga, and made it possible to fit to two datasets with different pH values at the same time. For small pH changes it works nicely, but for the larger ones it has a hard time finding solutions. So by now I think we have narrowed down the options too much, making it very hard to come up with an initial guess that gives a result rather than a runtime error (I'm assuming the model is not converging correctly, see screenshot).
It is possible to change the initial guess, as well as the experiment number, from the top of the script, and each time the initial guess is changed it saves a new figure, so it should be "easy" to try initial guesses, though the script takes some time to run.
My log: Error message when initial guess is "wrong"
Any ideas or comments, @sashahafner?
Yes! First, nice progress! I think the failure problem is because the numeric tfmod is failing with negative or otherwise impossible parameters. Two possible solutions:
I would tend to prefer 1, but I really don't know which is better. For 1:
```python
def res_calc(x):
    L, por_g, por_l, v_g, v_l, nc, cg0, cl0, cgin, clin, henry, pKa, pH, temp, dens_l, v_res, times, c_meas  # import parameters for the model into the optimization definition
    Kga = 10**x[0]
    k = 10**x[1]
    pred_opt = tfmod(L = L, por_g = por_g, por_l = por_l, v_g = v_g, v_l = v_l, nc = nc, cg0 = cg0,
                     cl0 = cl0, cgin = cgin, clin = clin, Kga = Kga, k = k, k2 = 0.01, henry = henry, pKa = pKa,
                     pH = (pH1 + pH2) / 2, temp = temp, dens_l = dens_l, times = times, v_res = v_res,
                     recirc = True, counter = True)
    c = pred_opt['gas_conc'][nc - 1, :]
    r = c - c_meas
    return r

sol = least_squares(res_calc, [np.log10(0.05), np.log10(0.1)], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = 10)
```
Are you using curve_fit or least_squares?
These are the differences to go with log transformation. This prevents any negative MTC or rate constant:
```python
Kga = 10**x[0]
k = 10**x[1]
...
least_squares(... [np.log10(0.05), np.log10(0.1)],
```
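As a standalone sketch of the same reparameterization (a toy two-parameter model with made-up numbers, not tfmod): the solver works in log10 space, so whatever step it takes, the back-transformed parameters stay positive.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 5.0, 30)
y = 0.05 * np.exp(-0.1 * t)  # toy data; true Kga = 0.05, k = 0.1

def resid(x):
    Kga = 10.0 ** x[0]  # back-transform: always positive
    k = 10.0 ** x[1]
    return Kga * np.exp(-k * t) - y

# Initial guesses are passed in log10 space too
sol = least_squares(resid, [np.log10(0.01), np.log10(1.0)])
Kga_fit, k_fit = 10.0 ** sol.x
```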
That makes sense. Nice! I will try the log transformation next :-) And I use least_squares.
OK, so the log transformation did not work as expected; even initial guesses that I knew worked failed then. But I used the principle and took the absolute values of kl, k, and k2, and that works. Now we just get some very different solutions. Residuals would be nice, but for now just look at the three sets of graphs here: Initial guess [0.1,0.1,0.1]
Initial guess [2e-5,0.01,1e-5]
Initial guess [2e-6,0,0]
The only difference is the initial guess. Plots made by this script: https://github.com/AU-BCE-EE/tric-fil-mod/blob/6475cc88366e1a4a608e8f0a0394907bc8174019/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/optimization_test%20-%20version3.py
Changes:
```python
# Define what is in each position in the answer array x
kl = abs(x[0])
k = abs(x[1])
k2 = abs(x[2])
...
# Finding the solution that gives the least combined deviation between model and experiments
sol = least_squares(res_calc, [IG_kl, IG_k, IG_k2], xtol = 3e-16, ftol = 3e-16, gtol = 3e-16, diff_step = diff_step)
copt = sol['x']
kl_fit = abs(float(copt[0]))
k_fit = abs(float(copt[1]))
k2_fit = abs(float(copt[2]))
```
Nice work. Now I am looking at the right script, and I see how you included 2 experiments.
So in the last plot you showed above, is that a good fit or does the model actually match the inlet concentration? I am not sure from the legend.
I'm looking into log transformation now.
Takes a while to run now doesn't it?
But you might check out the bounds argument instead: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html
Also, note that the cost element from least_squares output gives us overall fit and so some indication of whether a solution makes sense (if we compare it between different solutions) even before looking at a plot.
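A minimal sketch of both points, on a made-up model rather than tfmod: bounds keep the trust-region solver from ever proposing negative parameters, and sol.cost (half the sum of squared residuals) gives a single number for comparing fits across initial guesses.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 5.0, 30)
y = 0.05 * np.exp(-0.1 * t)  # toy data with two positive parameters

def resid(x):
    return x[0] * np.exp(-x[1] * t) - y

# Lower bound 0 on both parameters; no abs() or log transform needed
sol = least_squares(resid, [0.01, 1.0], bounds=(0.0, np.inf))

# sol.cost = 0.5 * sum(residuals**2); near zero means a near-perfect fit
```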
I am sure our xtol and other argument values are not optimal.
Sorry I can't provide much help on this--I guess the solution is trial-and-error along with whatever info comes from the help file.
@AnneMortensen I am going to push a version of that script that uses log transformation. It seems to work well, but if I am overlooking something you can revert the commit.
With e51fb3be5c618cfdfb11b8bcafa05fea65442e01 I get:
```
cost: 0.02664191276669185
...
x: array([-5.67381683, -1.98488671, -2.70398647])
```
I get very similar values with different initial guesses. But I don't have plots for evaluation yet (some stupid Python error). And I see kl is very low--likely bogus, oops.
> So in the last plot you showed above, is that a good fit or does the model actually match the inlet concentration? I am not sure from the legend.
It fits the inlet. Changing the legends has been added to my "To do" list :-)
Will check out "bounds" on Monday.
I can get your plots (they are saved in the "Plots" folder, not directly where the script is saved). Will push a version that saves them like you did in "demos". And as you thought, they are equal to 0 at all times.
Thanks @AnneMortensen . That's terrible optimization! Something is really strange when we get best estimates that perform worse than the initial guesses. Hmm. I'll try to dig into the output from the residuals function tomorrow.
So far today, I have tried the following: setting the bounds to between 0 and 1, multiplying all concentrations by 10 000 to make differences bigger, and removing all noise by taking the moving average of 50 points. I don't see much improvement, unfortunately, so I don't see a point in pushing the changes yet. Do you agree? Plots: IG: [0.0001,0.01,0.8]
IG: [0.0001, 0.00001,0.0008]
IG: [0.01, 0.00001,0.0008]
Legends are still not changed, so for clarity: blue is the measured outlet, orange is the measured inlet, black is the optimized model, and green is the Onda model with k = k2 = 0.
Ended pushing it in a separate folder, in case you want to see what I have done. https://github.com/AU-BCE-EE/tric-fil-mod/tree/7939666ba693c59ec0399528379298f6d28800cc/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/Failed%20Attempts%2029.4.24
Version of the log transformation that plots the initial guesses instead of the onda model for comparison https://github.com/AU-BCE-EE/tric-fil-mod/blob/172a479bd731266ad0146cf6c7be0c74798c71a5/applications/Lab%20experiments/New%20structure/Tasks/2.1%20Automatic%20optimization%20for%20multiple%20data%20sets/optimization_test%20-%20version3.py
Attempt to make a piece of code that can adjust multiple parameters in the model (e.g. rate constants k and k2 and a mass transfer parameter) to fit a set of experimental data, and if possible fit to multiple data sets as well.