Closed Surluson closed 5 years ago
very nice and clean code!
i
pressure points:
[n_i - M * K * p_i / (1 + K * p_i)]^2
which is a non-linear optimization problem. You first linearized the Langmuir isotherm, which will not minimize the above. I haven't thought much about which is better (it is probably context dependent). Do you know the difference between the two methods? I imagine one will over-emphasize high or low pressure data points? This well-cited paper suggests that the linearized version (while easier) is not a good way to go: https://pubag.nal.usda.gov/download/5945/PDF The Optim.jl
package is good for non-linear optimization.I added a RMSE error between the points that were fitted to the slope for now. Error bounds for the slope would require us to assume some error for the datapoints no?
The RMSE is smallest if you include two points, so based on RMSE, you would always choose two points to estimate the Henry coefficient. There is a tradeoff here though: include more points might mean more confidence in the slope estimate, but also might be less confidence if you are outside of the Henry regime. So how to systematically choose the number of points to include in a Henry coefficient calculation?
The error estimate in the slope is based on Gaussian distributed noise, but the variance of that Gaussian is inferred from the data. See here: https://www.chem.utoronto.ca/coursenotes/analsci/stats/ErrRegr.html I'm not 100% sure the estimate of error in slope is the way to go, but certainly RMSE is not suitable on its own since that always leads to the conclusion to use only two points.
Code looks good except in practice you need a more automatic guess for K and M; I suspect rarely will it converge to a global min with your default guess for K and M; it will get stuck in local min often. In pyIAST, I use the last data pt times 1.1 as the saturation loading, and use the first data points to get a Henry coefficient, giving K = M * KH. But you already thought of a better way to get a good starting point! you can keep your linearized function for Langmuir fitting. Use those as starting params for the nonlinear fitting routine! :+1: Maybe it would be beautiful to keep one function with method=:nonlinear
default option passed, which calls method=:linear
for starting params.
@Surluson this might be stale, could you re-make a pull request now that we have Travis working?
Also it looks like the Langmuir guess is poor starting point; we can do better than having a default guess of M=1.0
; this probably will only rarely converge. If you guess M
as say 1.1 * maximum(df[:P])
then it will be more robust. Then you can estimate Henry coefficient from the first point, then get the Langmuir K
from that as a better estimate. (this is simpler, two lines of code, than the linear fitting I suggested above)
@Surluson checks failed.
model
can be henry
or langmuir
, then we hv one function fit_isotherm(df, pressure_col_name, n_col_name, model)
then we hv _guess(df, model)
that guesses the params specific to each model. the guess for the henry coeff can be the slope based on the first data pt. for the langmuir, slope of the first data pt, then 1.1 times max value of adsorbed gas in the data. So instead of using the llsq
function, we use the non-linear fitting routine for generality. the objective function is just different for every isotherm. this will allow us to incorprate all sorts of isotherm models. and it is as easy as adding if method == :mymodel
then adding a guess and writing the objective function and functional form for that model.It seems all tests failed because MultivariateStats
wasn't in the REQUIRE
file.
And I really like that idea, I'll make the changes asap
@Surluson to get this PR merged how about we simplify it and
n_pts
as the number of points to use or, more ambitiously, choose n_pts
automatically for Henry fits)I've made some edits to the code. As for the first point: This is a value I used after calculating the slope from the first 3 points. At the time, the code was also trying to find the perfect number of data points to use for the fitting, so this was also used to make sure exactly 3 data points were used for the fitting procedure.
Yeah sorry, those shouldn't be there. They've been removed
fit_isotherm
--> fit_adsorption_isotherm
_guess
is doing.please review my changes and approve, then we can merge the PR; it looks good to me!
Everything looks good to me :+1: You'll have to approve the changes though, because I made the pull request.
Added two functions (along with docstrings and tests):
extract_henry_coefficient
: Grabs a DataFrame, names of the Pressure column and Adsorption column, and the number of points it uses to extract the Henry coefficient. I used MultivariateStats for a linear least squared regression (llsq
). One issue I found was that thellsq
function doesn't like the columns in the DataFrames because they're usually a Union betweenFloat
s andMissing
s.fit_langmuir
: Here I linearizeN = (MKP)/(1 + KP)
and solve it withllsq
as well. I check to see if the first pressure point is 0, because after I linearize the I divide by the pressure (which blows up ifP=0
)