DanielVandH / ProfileLikelihood.jl

Methods for profile likelihood analysis.
MIT License
18 stars 3 forks source link

Thoughts on 1.0 #95

Open DanielVandH opened 9 months ago

DanielVandH commented 9 months ago

The package needs a bit of love to get it to 1.0. No guarantees on when all of the below can be accomplished, but it should be written down. In addition to the below, the code just needs to be simplified so much - I wrote this package right around when I first started getting serious about Julia, so there are some weird things in the code.

Simplifying `ProfileLikelihoodSolution

I think the result of profile should return some ProfileLikelihoodSolution (or whatever) type object (like there is now), but this object should not have so much contained in it. Instead, users can query it to compute e.g. confidence intervals. Currently, I have

Base.@kwdef struct ProfileLikelihoodSolution{I,V,LP,LS,Spl,CT,CF,OM}
    parameter_values::Dict{I,V}
    profile_values::Dict{I,V}
    likelihood_problem::LP
    likelihood_solution::LS
    splines::Dict{I,Spl}
    confidence_intervals::Dict{I,ConfidenceInterval{CT,CF}}
    other_mles::OM
end

I think we can remove splines and confidence_intervals, leaving

Base.@kwdef struct ProfileLikelihoodSolution{I,V,LP,LS,OM}
    parameter_values::Dict{I,V}
    profile_values::Dict{I,V}
    likelihood_problem::LP
    likelihood_solution::LS
    other_mles::OM
end

I think the types should also not be so generic, simplifying the design a bit. The confidence intervals and splines can then be computed using some other function like get_confidence_interval(sol, ...), which wouldn't be a breaking change necessarily since that is already the interface (which I have failed to properly define, though). Similarly for splines.

Simplifying individual steps

There should also be some Profiler struct (or some better name) which is used to take individual steps when profiling. This would make it easier to debug, and to also solve problems like #91 much easier. This would still be a bit restrictive though, since what if eventually we want to implement a method that returns only the confidence limits rather than the complete profile. I won't worry about this last point, though, since that's not on the horizon - let's worry about that when (or if) it gets there.

Simplifying the main functions mle and profile

profile takes quite a lot of keyword arguments. I think I can simplify it down a lot and just enforce some specific defaults. I worry it might be a bit overwhelming the way it is currently.

Simplifying how LikelihoodProblem is defined

Currently, LikelihoodProblems are constructed with an awkward mix of ProfileLikelihood.jl specific kwargs and Optimization.jl kwargs (via f_kwargs and prob_kwargs). Maybe I should just require that users define their own OptimizationProblem within Optimization.jl? Perhaps a LikelihoodFunction is needed so that I can use dispatch on OptimizationProblem to make the appropriate conversion.

Simplifying how likelihoods from ODEs are defined

The method for constructing a LikelihoodProblem for a differential equation is a bit wack. I designed it before I properly understood the SciML interface - I should just be implementing an init method or something.

DanielVandH commented 9 months ago

I guess it could also be good to come back to #70 and #71 at some point.

TorkelE commented 9 months ago

Some thoughts:

Working with OpimtizationProblems directly

I agree that it would make sense to support the user defining OptimizationProblems using Optimization. Technically these do not need to be maximum likelihood problems (but the same principle applies, although we typically wish to find minimums, not maximums).

Generally, there are lots of packages that are good at generating various types of cost function/optimization problems. Examples includes DiffEqParamEstim.jl and PEtab.jl (the latter of which now also supports the creation of SciMLOptimization-type OptimizationProblems).

I think it is not bad to have the functionality of generating such problems here as well. However, I imagine loads of people who'd use this package already have a way of generating their cost functions/likelihood problems/optimization problems. If ProfileLiklihood.jl had good support for using these problems directly, I think it would be convenient for many.

Basically, there are a lot of packages for creating and maximising likelihoods (and do it well), however, there are few (none working properly) for computing (likelihood) profiles. I think it would make more sense to focus on the latter (creating the package for this) than also trying to compete on the former front as well (having this functionality is not directly bad though, but I would not focus on it).

Plotting with Plots.jl

Would it make sense to create an extension for using Plots.jl for plotting? Currently, Plots.jl is what is used for plotting within the SciML ecosystem (which includes Optimization). I think Makie.jl is more potent (and might even surpass Plots generally, who know). However (especially for movie Julia users), just running Plots is generally easier to work with and understand. It would be easy to create a dispatch for Plots plot function, and then you can run plot(profile) and get something useful.

Documentation intro page

Would it make sense to have the intro page be a bit "math-lighter" (when I first opened the package and saw the intro, my instinct was to just close it again)? I think it depends on what community you are from, I work a lot with systems biologists and chemists who are less comfortable when math gets heavy. The introduction is basically a ton of equations. For simply introducing the package, maybe have a simple example and describe what you are dealing with, and then put a math background later on? This depends on who you want to appeal to though.

Running documentation code when it is built

In e.g. Catalyst we enclose doc code in blocks like this:

```@example ex1
length = 2.0
width = 4.0
area = length*width
`` # Should be triple, but that messes up .md formatting.

Then, when the docs are built, this code is run and the output is automatically computed. This has two advantages:

Other comments

DanielVandH commented 9 months ago

Thanks very much for the input @TorkelE! I'll try to keep it in mind once I eventually get into the rework. Regarding Plots.jl: I've never used Plots.jl, so I probably won't be the one to make that extension, but I do agree that it'd be nice to have.

TorkelE commented 9 months ago

If we get there, and if you don't min having one, I would be happy to write the Plots extension.

sebapersson commented 9 months ago

Some additional thoughts (I maintain PEtab.jl which allows users to set up likelihood parameter estimation problems for ODE models, and currently a key functionality lacking in the modeling pipeline is practical identifiability analysis, and this package looks great for this purpose!)

Working with OpimtizationProblems directly

I agree with Torkel, would be great to support. But also agree it would also be nice to in addition to have the interface where people themselves can provide likelihood, gradient of the likelihood (autodiff directly on the objective function is not always the most efficient approach), bounds etc... themselves.

Plotting with Plots.jl

Fully agree with Torkel here.

Documentation intro page

Again agree with Torkel, I think the math details could be moved into the documentation, as it is great to have for those who are interested.

Adaptive step-size in profile solution

I do not exactly know how the profiles are computed in the package, but, generally when computing the profile each step is expansive as for each step an optimization problem must be solved. Therefore some packages like pyPESTO provide options for adaptive stepping as default (I think via regerssion). For larger ODE-models this would be great for reducing runtime. I could potentially help with this down the line.

Other comments

Here I agree with Torkel.

DanielVandH commented 9 months ago

@sebapersson:

But also agree it would also be nice to in addition to have the interface where people themselves can provide likelihood, gradient of the likelihood (autodiff directly on the objective function is not always the most efficient approach), bounds etc... themselves.

Would this be covered by just making the user provide the OptimizationProblem themselves? This will contain all this information (with OptimizationFunction containing the gradient that can be user-provided in Optimization.jl).

I do not exactly know how the profiles are computed in the package, but, generally when computing the profile each step is expansive as for each step an optimization problem must be solved.

Currently it is all just equally spaced. I did try and get some adaptive stepping working but never got it working. Some help here would be very good - indeed, simple stepping is incredibly slow for large ODEs...

sebapersson commented 9 months ago

Providing a custom gradient is covered by providing OptimizationProblem - as when creating it (or rather the OptimizationFunction) the user can set the gradient (for example in PEtab we do this when converting PEtabODEProblem to OptimizationProblem). And, if users instead just want to provide the likelihood, gradient can be an optional argument (with default nothing making OptimizationProblem use autodiff).

Yes, would be happy to help with the adaptive part once the code gets closer to 1.0 :)

DanielVandH commented 1 month ago

Sorry for the lack of progress here @sebapersson and @TorkelE. Still not sure I'll ever be able to find much time to get to these changes. I don't really have much use for this package anymore unfortunately.

If either of you still had a need for this package, and are still interested in some of these changes, I'm happy to give you permissions and help with any changes / questions. Things might get stale otherwise. No worries if not but figured I'd give you a heads up rather than disappearing from the discussion entirely even longer 😅