SimonEnsemble / PorousMaterials.jl

Julia package towards classical molecular modeling of nanoporous materials
GNU General Public License v3.0
51 stars 11 forks source link

Arni/extract henry coeff #100

Closed Surluson closed 5 years ago

Surluson commented 5 years ago

Added two functions (along with docstrings and tests): extract_henry_coefficient: Grabs a DataFrame, names of the Pressure column and Adsorption column, and the number of points it uses to extract the Henry coefficient. I used MultivariateStats for a linear least squared regression (llsq). One issue I found was that the llsq function doesn't like the columns in the DataFrames because they're usually a Union between Floats and Missings.

fit_langmuir: Here I linearize N = (MKP)/(1 + KP) and solve it with llsq as well. I check to see if the first pressure point is 0, because after I linearize the I divide by the pressure (which blows up if P=0)

SimonEnsemble commented 5 years ago

very nice and clean code!

Surluson commented 5 years ago

I added a RMSE error between the points that were fitted to the slope for now. Error bounds for the slope would require us to assume some error for the datapoints no?

SimonEnsemble commented 5 years ago

The RMSE is smallest if you include two points, so based on RMSE, you would always choose two points to estimate the Henry coefficient. There is a tradeoff here though: include more points might mean more confidence in the slope estimate, but also might be less confidence if you are outside of the Henry regime. So how to systematically choose the number of points to include in a Henry coefficient calculation?

The error estimate in the slope is based on Gaussian distributed noise, but the variance of that Gaussian is inferred from the data. See here: https://www.chem.utoronto.ca/coursenotes/analsci/stats/ErrRegr.html I'm not 100% sure the estimate of error in slope is the way to go, but certainly RMSE is not suitable on its own since that always leads to the conclusion to use only two points.

Code looks good except in practice you need a more automatic guess for K and M; I suspect rarely will it converge to a global min with your default guess for K and M; it will get stuck in local min often. In pyIAST, I use the last data pt times 1.1 as the saturation loading, and use the first data points to get a Henry coefficient, giving K = M * KH. But you already thought of a better way to get a good starting point! you can keep your linearized function for Langmuir fitting. Use those as starting params for the nonlinear fitting routine! :+1: Maybe it would be beautiful to keep one function with method=:nonlinear default option passed, which calls method=:linear for starting params.

SimonEnsemble commented 5 years ago

@Surluson this might be stale, could you re-make a pull request now that we have Travis working?

Also it looks like the Langmuir guess is poor starting point; we can do better than having a default guess of M=1.0; this probably will only rarely converge. If you guess M as say 1.1 * maximum(df[:P]) then it will be more robust. Then you can estimate Henry coefficient from the first point, then get the Langmuir K from that as a better estimate. (this is simpler, two lines of code, than the linear fitting I suggested above)

SimonEnsemble commented 5 years ago

@Surluson checks failed.

Surluson commented 5 years ago

It seems all tests failed because MultivariateStats wasn't in the REQUIRE file. And I really like that idea, I'll make the changes asap

SimonEnsemble commented 5 years ago

@Surluson to get this PR merged how about we simplify it and

Surluson commented 5 years ago

I've made some edits to the code. As for the first point: This is a value I used after calculating the slope from the first 3 points. At the time, the code was also trying to find the perfect number of data points to use for the fitting, so this was also used to make sure exactly 3 data points were used for the fitting procedure.

Surluson commented 5 years ago

Yeah sorry, those shouldn't be there. They've been removed

SimonEnsemble commented 5 years ago

please review my changes and approve, then we can merge the PR; it looks good to me!

Surluson commented 5 years ago

Everything looks good to me :+1: You'll have to approve the changes though, because I made the pull request.