SciML / EasyModelAnalysis.jl

High level functions for analyzing the output of simulations
MIT License
81 stars 14 forks source link

Simple fitting example #197

Closed alex-s-gardner closed 1 year ago

alex-s-gardner commented 1 year ago

I'd like to start working in the SciML echo system... it's incredibly impressive.

One thing that I would like to do, that I expect many others entering the echo system might also start at, is to optimize the coefficients (parameters) of a model to best fit observations. I have been looking through the EasyModelAnalysis.jl, Optimizations.jl and ModelingToolkit.jl documentation and I'm struggling to wrap my head around the echo system to solve my problem... I'm not the fastest learner but I'm certainly not the slowest. I think most of the barrier is in the nomenclature. I think a simple-relatable example would help greatly and I'd be happy to make a PR if I can get some guidance on how to do the task:

I have observations of a seasonal process that can be well explained using a "tilted" sine wave with offset and trend:

eqn(t,p) =  p[1] .+ p[2] * t .+ p[3] * (1 / p[5]) * atan.(p[5] * sin.(2.0 * pi * (t .+ p[4])) ./ (1 .- p[5] * cos.(2.0 * pi * (t .+ p[4]))))

where t is time and p are the model coefficients. And I can create synthetic observations:

# time at which observations are made
t = collect(0:0.001:10) 

# model coefficients  that I'd like to invert for given t and y
p = [0.0, 0.0, 1, 0.2, 0.9] 

# synthetic observations (model + noise)
y = eqn(t, p) .+ rand(length(t));

We can now visualize the observations:

using Plots
scatter(t, y)
Screenshot 2023-07-21 at 3 42 41 PM

Now, how can I use EasyModelAnalysis.jl to invert for the coefficients p that best fit the observations?

ChrisRackauckas commented 1 year ago

Hey, the fitting methods in EasyModelAnalysis were designed for ODE models. We could add a simple curvefit function. For now you can just directly use Optimization.jl. This would look like:

using Optimization, OptimizationPolyalgorithms, ForwardDiff

eqn(t,p) =  p[1] .+ p[2] * t .+ p[3] * (1 / p[5]) * atan.(p[5] * sin.(2.0 * pi * (t .+ p[4])) ./ (1 .- p[5] * cos.(2.0 * pi * (t .+ p[4]))))
function cost(p,_)
    predicted = map(t->eqn(t, p), t_data)
    sum(abs2,predicted .- data) / length(data)
end
optf = OptimizationFunction(cost, Optimization.AutoForwardDiff())

prob = OptimizationProblem(optf, pguess)
sol = solve(prob, PolyOpt())
sol.u # this is the best `p`
alex-s-gardner commented 1 year ago

Thanks a ton Chris... the EasyModelAnalysis read me has "Please find the parameters that best fit the model to the data." But I guess this was intended to only pertain ODEs?

alex-s-gardner commented 1 year ago

Thanks again... just as an FYI, I implemented the version above that you suggested then benchmarked against LsgFit.jl and find that for this example LsgFit was 5X faster out of the box. No judgement of Optimization.jl as I suspect under the hood much of the same code is being shared between packages.

## Use optimization.jl
using Optimization, OptimizationPolyalgorithms, ForwardDiff
p0 = [0.1, 0.1, 1, 0.1, 0.1]

function cost(p0, _)
    sum(abs2, eqn.(t, Ref(p0)) .- y) / length(y)
end

using BenchmarkTools
@btime begin
    optf = OptimizationFunction($cost, Optimization.AutoForwardDiff())

    prob = OptimizationProblem(optf, $p0)
    sol = solve(prob, PolyOpt())
end
 949.181 ms (32553 allocations: 194.14 MiB)
# use LsqFit
using LsqFit
@btime fit = curve_fit($eqn, $t, $y, $p0)
145.470 ms (4050 allocations: 150.14 MiB)
ChrisRackauckas commented 1 year ago

LsqFit is fast but generally inaccurate, which is why I generally avoid it. But if it works for a problem then it will be efficient.

alex-s-gardner commented 1 year ago

@ChrisRackauckas thanks... hope to dive deeper into the echo system in the coming months.