Parameter Estimation for sle_solubility

samuel-zhang01 commented 8 months ago

Amazing devs of Clapeyron.jl,

Sorry to trouble you guys again. I'm exploring parameter estimation for PCSAFT of APIs. Due to the low vapour pressure of APIs, obtaining $p{sat}$ and $\rho{l,sat}$ experimentally is quite challanging. experimental sle_solubility data of APIs in different solutes though should be a good substitute to do parameterisation. Is there a built-in support for param estimation for segment, sigma, epsilon, epsilon_assoc, and bondvol?

# Here is an imaginary param estimation code that uses sle_solubility
# init
using Pkg
Pkg.activate("..")
using Clapeyron, BlackBoxOptim

# Defining a sle_solubility function
function EXPsolubility(model::CompositeModel,T)
    sol = sle_solubility(model,101325.,T,[1.,0.];solute=[solute1])
    return sol[2]
end

# Defining solute and solvent, will be fetched from the database later
solute1="nicotinamide2016"
solvent1="ethanol2016"

# Defining a composite model
model = CompositeModel([solvent1,solute1];fluid=PCSAFT,solid=SolidHfus,gas=PCSAFT,gas_userlocations=["data/sle_specific"],fluid_userlocations=["data/sle_specific"],solid_userlocations=["data/sle_specific"])

# Creating the Dict for estimation
toestimate = [
    Dict(
        :param => :epsilon_assoc,
        :lower => 1000.,
        :upper => 3000.,
        :guess => 2500.
    ),
    Dict(
        :param => :bondvol,
        :lower => 0.02,
        :upper => 0.04,
        :guess => 0.03
    )
];

# Estimation
estimator,objective,initial,upper,lower = Estimation(model,toestimate,["data/EXPsolubility.csv"]);

# BBOptim
nparams = length(initial)
bounds  = [(lower[i],upper[i]) for i in 1:nparams]
bounds
result = BlackBoxOptim.bboptimize(objective; 
        SearchRange = bounds, 
        NumDimensions = nparams,
        MaxSteps=200,
        PopulationSize = 1000,
        TraceMode=:verbose)

params = BlackBoxOptim.best_candidate(result);

here is the error code:

MethodError: no method matching sle_solubility(::CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, ::Float64)

Closest candidates are:
  sle_solubility(::CompositeModel, ::Any, !Matched::Any, !Matched::Any; solute)
   @ Clapeyron C:\Users\USER\.julia\packages\Clapeyron\Gx6LP\src\methods\property_solvers\multicomponent\solids\sle_solubility.jl:9

Stacktrace:
  [1] _broadcast_getindex_evalf
    @ .\broadcast.jl:683 [inlined]
  [2] _broadcast_getindex
    @ .\broadcast.jl:656 [inlined]
  [3] getindex
    @ .\broadcast.jl:610 [inlined]
  [4] copy
    @ .\broadcast.jl:912 [inlined]
  [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(sle_solubility), Tuple{Base.RefValue{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}}, Vector{Union{Missing, Float64}}}})
    @ Base.Broadcast .\broadcast.jl:873
  [6] objective_function(estimation::Estimation{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}}, guesses::Vector{Float64})
    @ Clapeyron C:\Users\USER\.julia\packages\Clapeyron\Gx6LP\src\estimation\estimation.jl:333
  [7] objective
    @ C:\Users\USER\.julia\packages\Clapeyron\Gx6LP\src\estimation\estimation.jl:137 [inlined]
  [8] fitness(x::Vector{Float64}, p::FunctionBasedProblem{Clapeyron.var"#objective#1115"{Estimation{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}}}, ScalarFitnessScheme{true}, ContinuousRectSearchSpace, Nothing})
    @ BlackBoxOptim C:\Users\USER\.julia\packages\BlackBoxOptim\I3lfp\src\problem.jl:61
  [9] setup_problem(func::Function, parameters::ParamsDictChain)
...
    @ BlackBoxOptim C:\Users\USER\.julia\packages\BlackBoxOptim\I3lfp\src\bboptimize.jl:92
 [13] bboptimize
    @ C:\Users\USER\.julia\packages\BlackBoxOptim\I3lfp\src\bboptimize.jl:91 [inlined]
 [14] top-level scope
    @ c:\Users\USER\Documents\Documents\GitHub Repos\PC-SAFT-Thermo-Properties-Tool\example-calculations\Parameter-Estimation\parameter_estimation test.ipynb:4

I've linked the databases used for my calculation.

Thank you so much for your help!

data.zip

longemen3000 commented 8 months ago

Seems a bug on our part, let me see what i can do

samuel-zhang01 commented 8 months ago

@longemen3000 you are a hero, thanks!

pw0908 commented 8 months ago

Hi Samuel, I took a look at your zip file. It looks like you labelled the method incorrectly in your EXPsolubility.csv file. The method used should be the name of the function you defined above:

Clapeyron Estimator,
[method = EXPsolubility],
T,out_x

This should fix the issue.

pw0908 commented 8 months ago

To address your question regarding estimating the other parameters, you can take a look at the example notebook. Since you are estimating parameters for a mixture, you'll need to specify the indices. Here is how I would code it up:

using Clapeyron, BlackBoxOptim

function EXPsolubility(model::CompositeModel,T)
    sol = sle_solubility(model,101325.,T,[1.,0.];solute=[solute1])
    return sol[2]
end

# Defining solute and solvent, will be fetched from the database later
solute1="nicotinamide2016"
solvent1="ethanol2016"

# Defining a composite model
model = CompositeModel([solvent1,solute1];fluid=PCSAFT,solid=SolidHfus,gas=PCSAFT,gas_userlocations=["data/sle_specific"],fluid_userlocations=["data/sle_specific"],solid_userlocations=["data/sle_specific"])

# Creating the Dict for estimation
toestimate = [
    Dict(
        :param => segment,
        :indices => 2,
        :lower => 1.,
        :upper => 5.,
        :guess => 2.5
    ),
    Dict(
        :param => :sigma,
        :indices => (2,2),
        :recombine => true,
        :lower => 2.8,
        :upper => 3.8,
        :guess => 3.3,
        :factor => 1e-10
    ),
    Dict(
        :param => epsilon,
        :indices => (2,2),
        :recombine => true,
        :lower => 250.,
        :upper => 500.,
        :guess => 350.
    ),
    Dict(
        :param => :epsilon_assoc,
        :indices => 4,
        :recombine => true,
        :lower => 1000.,
        :upper => 3000.,
        :guess => 2500.
    ),
    Dict(
        :param => :bondvol,
        :indices => 4,
        :recombine => true,
        :lower => 0.02,
        :upper => 0.04,
        :guess => 0.03
    )
];

# Estimation
estimator,objective,initial,upper,lower = Estimation(model,toestimate,["data/EXPsolubility.csv"]);

# BBOptim
nparams = length(initial)
bounds  = [(lower[i],upper[i]) for i in 1:nparams]
bounds
result = BlackBoxOptim.bboptimize(objective; 
        SearchRange = bounds, 
        NumDimensions = nparams,
        MaxSteps=200,
        PopulationSize = 1000,
        TraceMode=:verbose)

params = BlackBoxOptim.best_candidate(result);

I will quickly point out that fitting 5 parameters using 5 data points is very ill advised. I would recommend increasing the size of the dataset (wider temperature range or more solvents) or setting some initial values for the parameters and adjust a smaller subset of parameters (typically, people just fit segment, sigma and epsilon, setting epsilon_assoc and bondvol to be some constant).

pw0908 commented 8 months ago

@longemen3000 I realised while doing an initial run of this that, in the case where recombine=true, we should also recombine the cross association parameters. If Samuel were to just estimate the self-association parameters above, it wouldn't update the cross association parameter between the solvent and solute.

longemen3000 commented 8 months ago

Hmmm, that requires a patch then

samuel-zhang01 commented 8 months ago

@pw0908 Thanks for the very swift reply. You are right about the crossaccos values. Following the advice on param estimation, is there a way to set up the BBOptim to run for different solvents without having to rebuild toestimate each time? If not, how do I carry over the param estimation of one solvent to another? Thanks!

pw0908 commented 8 months ago

Hi @samuel-zhang01,

This will be a bit more challenging but should be doable within Clapeyron. The first thing youll need to do is define a model with the API you're trying to fit and the various solvents you intend to include:

model = CompositeModel(["solute","solvent1","solvent2",...]) # inlcude all the messy parameters here.

It is important to make sure the solute is the first component as this will mean that all the solute parameters will be index 1. To assemble your data, you will probably have separate spreadsheets for each solvent. Within Clapeyron, you can define which components are involved in which data set within the spreadsheet headers:

Clapeyron Estimator,,,
[method= EXPsolubility, species=solute solvent1],,,

This means that Clapeyron will automatically reduce your model within the estimation to a binary mixture where the EXPsolubility function you wrote should still work (where the composition is defined as [0.,1.] instead). You can then include all your datasets within the Estimation function.

To keep your life simple, I recommend you don't fit the association parameters (just set them to some reasonable constant). We haven't fixed the cross association problem yet so I don't think it'll be wise to do so either way.

Since I don't have the experimental data for this system, I can't write you a working example. But I do believe this should work.

samuel-zhang01 commented 8 months ago

Thank you so much @pw0908, I will give it a try

samuel-zhang-pfizer commented 8 months ago

Dear Devs of Clapeyron.jl,

I might need some more help regarding parameter estimation and the toestimate() function.

I am attempting to perform parameter estimation on lidocaine with experimental solubility points from acetone, ethanol, n-hexane, and n-heptane from this publication (around 4 temperatures for each solvent). I am able to run this script without errors but I have a few additional questions.

# using Pkg,Clapeyron
# Pkg.activate("..")
using Clapeyron, BlackBoxOptim
function EXPsolubility(model::CompositeModel,T)
    sol = sle_solubility(model,101325.,T,[1.,0.];solute=[solute1])
    return sol[2]
end

# Defining solute and solvent, will be fetched from the database later
solute1="lidocaine2013"
solvent1="n-hexane2013"
solvent2="n-heptane2013"
solvent3="ethanol2013"
solvent4="acetone2013"
databasePATH="data/sle_specific"
EXPdataPATH="data/ExpData"

# Defining a composite model
model = CompositeModel([solvent1,solvent2,solvent3,solvent4,solute1];
    fluid=PCSAFT,
    solid=SolidHfus,
    gas=PCSAFT,
    gas_userlocations=[databasePATH],
    fluid_userlocations=[databasePATH],
    solid_userlocations=[databasePATH]
    )

# Creating the Dict for estimation
toestimate = [
    Dict(
        :param => :segment,
        :indices => 5,
        :lower => 1., # 1, for segment term is a good start
        :upper => 9., # 
        :guess => 5.
    ),
    Dict(
        :param => :sigma,
        :indices => (5,5),
        :recombine => false,
        :lower => 1., # start from 1
        :upper => 5., # around 5, never more than 5
        :guess => 3.,
        :factor => 1e-10
    ),
    Dict(
        :param => :epsilon,
        :indices => (5,5),
        :recombine => false,
        :lower => 100., # start at around 100, 
        :upper => 500., # up to 500-600
        :guess => 155.  # around half way of lower bond, 150
    ),
];

estimator,objective,initial,upper,lower = Estimation(model,toestimate,[EXPdataPATH]);

nparams = length(initial)
bounds  = [(lower[i],upper[i]) for i in 1:nparams]

result = BlackBoxOptim.bboptimize(objective; 
                SearchRange = bounds, 
                NumDimensions = nparams,
                # MaxSteps=5,
                MaxTime = 3600*18,
                Method = :adaptive_de_rand_1_bin_radiuslimited,
                PopulationSize = 1000,
                TraceMode=:compact)
        params = BlackBoxOptim.best_candidate(result);

Questions:

For associating fluids, does the :recombine option apply the unlike association calculations automatically using the Wolbach and Sandler cross association rule as shown below?
```
...
:recombine => true,
...
```
$$\epsilon^{A{i}B{j}}=\frac{\epsilon^{A{i}B{i}}+\epsilon^{A{j}B{j}}}{2}$$

$$\kappa^{A{i}B{j}}=\sqrt{\kappa^{A{i}B{i}}\kappa^{A{j}B{j}}}\left[\frac{2\sigma{ii}\sigma{jj}}{\sigma{ii}+\sigma{jj}}\right]^3$$ [ref] From your example calculation Jupyternotebooks, ive realised that you have used $\epsilon{kl}=\frac{\sqrt{\sigma{kk}^3\sigma{ll}^3}}{\sqrt{\sigma{kl}^3}} \sqrt{\epsilon{kk}\epsilon{ll}}$ for associating :recombine for GC methods. Would this also be applied to PCSAFT? Could the user self-define the combining rules?

When assembling the compositemodel, I've realised I have to put the solute in the end of the array model = CompositeModel([solvent1, ..., solute1]). Otherwise this error will occur. This error exists even when I'm trying to define a normal compositemodel.

ERROR: LinearAlgebra.SingularException(1)
Stacktrace:
[1] \(D::LinearAlgebra.Diagonal{Float64, Vector{Float64}}, B::Vector{Float64})
@ LinearAlgebra /USER/julia-1.9.3/share/julia/stdlib/v1.9/LinearAlgebra/src/diagonal.jl:409
[2] \(A::Matrix{Float64}, B::Vector{Float64})
@ LinearAlgebra /USER/julia-1.9.3/share/julia/stdlib/v1.9/LinearAlgebra/src/generic.jl:1107
[3] default_newton_linsolve(d::Vector{Float64}, B::Matrix{Float64}, g::Vector{Float64})
@ NLSolvers ~/.julia/packages/NLSolvers/lRxce/src/quasinewton/approximations/newton.jl:15
[4] solve(problem::NLSolvers.NEqProblem{NLSolvers.VectorObjective{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Clapeyron.Solvers.var"#j!#1"{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, ForwardDiff.JacobianConfig{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1, Tuple{Vector{ForwardDiff.Dual{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1}}}}, Vector{Float64}}, Clapeyron.Solvers.var"#fj!#2"{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, ForwardDiff.JacobianConfig{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1, Tuple{Vector{ForwardDiff.Dual{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{Clapeyron.var"#f!#381"{CompositeModel{PCSAFT{BasicIdeal, Float64}, PCSAFT{BasicIdeal, Float64}, SolidHfus, PCSAFT{BasicIdeal, Float64}, Nothing}, Float64, Float64, Vector{Float64}, Vector{Float64}, Vector{Bool}}, Float64}, Float64, 1}}}}}, Clapeyron.Solvers.var"#jv!#3"}, Nothing, NLSolvers.Euclidean{Tuple{0}}, NLSolvers.InPlace}, x::Vector{Float64}, method::NLSolvers.LineSearch{NLSolvers.Newton{NLSolvers.Direct, typeof(NLSolvers.default_newton_linsolve), Nothing, Nothing}, NLSolvers.Backtracking{Float64, Int64, NLSolvers.FixedInterp, NamedTuple{(:lower, :upper), Tuple{Int64, Float64}}}, NLSolvers.InitialScaling{NLSolvers.ShannoPhua}}, options::NLSolvers.NEqOptions{Float64, Int64, Nothing}, state::NamedTuple{(:z, :d, :Fx, :Jx), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}, Matrix{Float64}}})
@ NLSolvers ~/.julia/packages/NLSolvers/lRxce/src/nlsolve/linesearch/newton.jl:71
[5] solve
...
[25] bboptimize(functionOrProblem::Function, parameters::Dict{Symbol, Any}; kwargs::Base.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:SearchRange, :NumDimensions, :MaxTime, :Method, :TraceMode), Tuple{Vector{Tuple{Float64, Float64}}, Int64, Int64, Symbol, Symbol}}})
@ BlackBoxOptim ~/.julia/packages/BlackBoxOptim/I3lfp/src/bboptimize.jl:92
[26] bboptimize
@ ~/.julia/packages/BlackBoxOptim/I3lfp/src/bboptimize.jl:91 [inlined]
[27] top-level scope
@ ~/PATH_TO-SCRIPT:8

My last question is regarding temperature dependent k_ij pair interaction term. In the past, I have tried to include a workaround that changes the k value from the PCSAFT_unlike database file every time a specific temperature is called for calculation using kC and kT, the temperature dependent binary interaction constants. But this method will significantly slow down the optimisation process as writing and reading from .csv files has the I/O overhead. Is there a way to directly access the composite model and change its k values from code? Something like model.params.species.k.values = 0.1234.

Thanks again for helping!

Below are the experimental data used for my fittings. Parameter-Estimation.zip

pw0908 commented 8 months ago

Ill take a look. One thing I immediately notice: the paper you refer to for the lidocaine data uses electrolyte PC-SAFT along with reactive equilibrium. This is not accounted for at all in base PC-SAFT. Are you just trying to create a simplified version of the model? Without the effect of pH?

samuel-zhang01 commented 8 months ago

Ill take a look. One thing I immediately notice: the paper you refer to for the lidocaine data uses electrolyte PC-SAFT along with reactive equilibrium. This is not accounted for at all in base PC-SAFT. Are you just trying to create a simplified version of the model? Without the effect of pH?

Hi Pierre, thanks for the quick reply! Yes you are right, i'm ignoring the ePCSAFT dependency on pH and only trying to reproduce pure PCSAFT parameters

samuel-zhang-pfizer commented 7 months ago

ExpData.zip I've realised i forgot to convert solubility data from the mg/g to mol/mol, reuploading...

pw0908 commented 7 months ago

Hi Samuel, sorry for the late response. I am now back from conference and have been able to take a look at your script and made some modifications in the attached zip file: example.zip

Answering your questions directly:

That combining rule within the example notebook refers to the combining rule between dispersive parameters, not associative. We have not yet updated Clapeyron to automatically update association combining rules during parameter estimation.
The reason you're getting that error is because the vector you input in sle_solubility is [1,0] which tells the code that the first component is the solvent (in your case, the lidocaine when you put it as the first component). I've now fixed this in the zip file I've attached.
As of yet, we haven't implemented temperature dependent combining rules. But, when you run the script I've attached, you'll find that the parameters end up being quite close to those obtained in literature, which seems to imply that the combining rule doesn't affect much?

Some brief comments about things I changed in the scripts:

Since we want Clapeyron to generate the cross-association parameters, I had to construct the PC-SAFT model first and then assemble it into CompositeModel.
I've switch the optimiser to Metaheuristics.jl since, in my experience, it's faster than BlackBoxOptim.jl. As long as you install it beforehand, everything should run fine.

ClapeyronThermo / Clapeyron.jl

Parameter Estimation for sle_solubility #224