JuliaApproxInference / LikelihoodfreeInference.jl

Likelihood-Free Inference for Julia.

Towards a full-fledged, powerful ABC package #5

Open · jbrea opened this issue 4 years ago

jbrea commented 4 years ago

Introduction

There are currently a few different and unrelated packages for Approximate Bayesian Computation and Likelihood-Free Inference in julia. As mentioned on discourse (https://discourse.julialang.org/t/ann-rfc-kissabc-jl-sequential-montecarlo-approximate-bayesian-computation/40668/16), it may be nice to coordinate ABC efforts in julia a bit; at least I would enjoy this :smile:. In the following I try to give a brief overview of the current state. I spent only limited time reviewing the packages, so apologies if I missed something, and please correct any mistakes! After that, I make a few propositions.

Current State

ApproximateBayesianComputing.jl (https://github.com/eford/ApproximateBayesianComputing.jl) @eford

Methods

  • ABC-PMC (Beaumont et al. 2002)

API

model(params) = ...
setup = method_plan(model, compute_summary_statistics, metric, prior; kwargs...)
result = run_abc(setup, data; kwargs...)

Features

  • Additional distributions: GaussianMixtureModelCommonCovar, GaussianMixtureModelCommonCovarTruncated, GaussianMixtureModelCommonCovarSubset, GaussianMixtureModelCommonCovarDiagonal, MultiUniform, LinearTransformedBeta, GenericCompositeContinuousDist
  • parallel evaluation (?)
  • Gaussian process emulation

GPABC.jl (https://github.com/tanhevg/GpABC.jl) @tanhevg

Methods

  • Rejection ABC
  • ABC-SMC (Toni et al. 2009)
  • Emulated ABC-Rejection
  • Emulated ABC-SMC
  • ABC model selection (Toni et al. 2010)

API

model(params) = ...
result = method(data, model, prior, args...; kwargs...)

Features

ApproxBayes.jl (https://github.com/marcjwilliams1/ApproxBayes.jl) @marcjwilliams1

Methods

  • Rejection ABC
  • ABC-SMC (Toni et al. 2009)
  • ABC model selection (Toni et al. 2010)

API

model(params, constants, targetdata) = ...
setup = method(model, args..., prior)
result = runabc(setup, data; kwargs...)

Features

  • Plotting recipes
  • Composite prior
  • Custom distance (ksdist)
  • multi-threading

KissABC.jl (https://github.com/francescoalemanno/KissABC.jl) @francescoalemanno

Methods

  • Rejection ABC
  • ABC-SMC (Drovandi et al. 2011)
  • ABC-DE (Turner and Sederberg 2012)
  • Kernelized ABC-DE

API

model(params, constants) = ...
setup = ABCPlan(prior, model, data, metric)
result = method(setup, kwargs...)

Features

  • Factored Distribution
  • parallel evaluation (multi-threading)

LikelihoodfreeInference.jl (https://github.com/jbrea/LikelihoodfreeInference.jl) (myself)

Methods

  • PMC-ABC (Beaumont et al. 2002)
  • Adaptive SMC (Del Moral et al. 2012)
  • K2-ABC (Park et al. 2016)
  • Kernel ABC (Fukumizu et al. 2013)
  • Approximate maximum a posteriori estimation:
    • Kernel Recursive ABC (Kajihara et al. 2018)
    • Point estimators inspired by Bertl et al. 2017 (Kernel), Jiang et al. 2018 (KL-Divergence), Briol et al. 2019 (Maximum Mean Discrepancy), Székely and Rizzo (Energy Distance)

API

model(params) = ...
setup = method(prior = ..., kwargs...)
result = run!(setup, model, data; kwargs...)

Features

  • Additional distributions: MultivariateUniform, TruncatedMultivariateNormal
  • extensions of corrplot and histogram

Propositions

There is a little bit of overlap between the packages, but overall they seem fairly complementary. However, from a user perspective I think it would be awesome if there were a common API, so that one could easily switch between the different packages. In particular, I imagine one common way to define priors, models, metrics and fitting.

ABCBase.jl: a common API and some basic utilities

My proposition here is to jointly write a very lightweight ABCBase.jl package that serves as a primary dependency of the ABC packages. See for example DiffEqBase.jl or ReinforcementLearningBase.jl for how this is done in other ecosystems. I would include the following in ABCBase.jl:

Ingredients

  • everything related to prior distributions
  • everything related to summary statistics
  • everything related to metrics
  • testing (and possibly assertion) utilities
  • well-written documentation of the common API
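
For the prior and metric ingredients, much of this could presumably be thin wrappers around what Distributions.jl and Distances.jl already provide. A minimal sketch (illustrative only; none of this is existing code from the packages above):

using Distributions, Distances

# a prior over (μ, σ) built from existing univariate distributions
prior = product_distribution([Normal(0, 1), Uniform(0.5, 2.0)])

# a metric comparing simulated and observed summary statistics
my_metric(s_sim, s_obs) = euclidean(s_sim, s_obs)

θ = rand(prior)                        # draw a parameter vector from the prior
my_metric([0.1, 1.0], [0.2, 0.9])      # distance between two summary vectors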

API

My proposition for the API is the following (I am biased, of course, and very open to discussion!).

In addition to everything related to priors, summary stats and metrics, ABCBase.jl exports a function fit! with the following signature

fit!(setup, model, data; verbosity = 0, callback = () -> nothing, rng = Random.GLOBAL_RNG)

Every ABC package that relies on ABCBase.jl extends this fit! function, e.g.

ABCBase.fit!(method::RejectionABC, model, data; kwargs...) = "blabla"
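
To make this a bit more concrete, here is a rough sketch of both sides. Everything below (RejectionABC, its fields, the rejection loop) is purely illustrative and not existing code from any of the packages; for a self-contained example the generic function is defined locally, whereas in practice the package would import ABCBase: fit! and add methods to it.

using Random

# --- what ABCBase.jl would define (sketch): an empty generic function to be extended ---
function fit! end

# --- what a hypothetical package could add (sketch): a setup type and a fit! method ---
Base.@kwdef struct RejectionABC{P, M}
    prior::P
    metric::M = (a, b) -> sqrt(sum(abs2, a .- b))   # default: Euclidean distance on summaries
    nparticles::Int = 1000
    ϵ::Float64 = 0.1                                # acceptance threshold
end

function fit!(setup::RejectionABC, model, data;
              verbosity = 0, callback = () -> nothing, rng = Random.GLOBAL_RNG)
    accepted = []                                   # accepted parameter draws
    while length(accepted) < setup.nparticles
        θ = rand(rng, setup.prior)                  # propose from the prior
        setup.metric(model(θ), data) < setup.ϵ && push!(accepted, θ)
        callback()
    end
    verbosity > 0 && @info "accepted $(setup.nparticles) particles"
    return accepted
end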

The user provides models as callable objects (functions or functors) that take a single argument. Constants are best handled with closures, and extraction of summary statistics is done inside the model. For example:

model(params) = "do something with params"

my_complex_model(params, constants) = "do something with params and constants"
model(params) = let constants = "blabla" my_complex_model(params, constants) end

my_raw_model(params) = "returns some raw data"
model(params) = extract_summary_statistics(my_raw_model(params))

struct MyFunctorModel
    options
end
(m::MyFunctorModel)(params) = "do something with m and params"
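
As a concrete (and purely hypothetical) illustration, a toy Gaussian model whose summary statistics are the sample mean and standard deviation could look like this:

using Distributions, Statistics, Random

# summary statistics of a data set
extract_summary_statistics(x) = [mean(x), std(x)]

# raw simulator: n draws from a Gaussian with parameters (μ, σ), σ assumed positive
function gaussian_raw_model(params; n = 100, rng = Random.GLOBAL_RNG)
    μ, σ = params
    rand(rng, Normal(μ, σ), n)
end

# the model handed to the ABC method returns summary statistics directly
model(params) = extract_summary_statistics(gaussian_raw_model(params))

model([0.5, 1.2])    # 2-element summary vector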

ABC methods/plans/setups are specified in the form

setup = method(metric = my_metric, kwargs...)
setup = method(prior = my_prior, kwargs...) # if method has a prior
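
Putting the pieces together, end-to-end usage could then look as follows (again only a sketch: RejectionABC and the Gaussian toy model are the hypothetical examples from above, not code from an existing package):

using Distributions

prior = product_distribution([Normal(0, 1), Uniform(0.5, 2.0)])   # prior over (μ, σ)
data  = model([0.3, 1.0])                                         # stand-in for observed summaries

setup     = RejectionABC(prior = prior, nparticles = 200, ϵ = 0.5)
posterior = fit!(setup, model, data; verbosity = 1)                # vector of accepted parameters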

One master package to access all methods

Similar in spirit to DifferentialEquations.jl, we could create one package that aggregates all the packages and gives unified access. The dependency graph would look something like

            ABCBase.jl
                |
     -----------------------
    |           |          |
 ABCPkg1     ABCPkg2      etc.
    |           |          |
    ------------------------
                |
              ABC.jl

This package would do nothing but reexport the setups/methods defined in the different packages together with the fit! function. Its name should of course be discussed.
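
Such an umbrella package could be little more than a few lines, for example using the existing Reexport.jl package (the package names below are placeholders):

module ABC

using Reexport

@reexport using ABCBase    # fit!, priors, metrics, summary statistics
@reexport using ABCPkg1    # placeholder names for the individual method packages
@reexport using ABCPkg2

end # module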

ABCProblems.jl

I think it would be nice to have a package with typical ABC benchmark problems, like the stochastic Lotka-Volterra problem, the blowfly problem, etc. Maybe we could collect them in a package ABCProblems.jl.
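
To illustrate what an entry in such a package could provide, here is a sketch of a stochastic Lotka-Volterra simulator (Gillespie algorithm); the function names, parameterisation and summary statistics are just one possible choice, not an established interface:

using Random

# Stochastic Lotka-Volterra via the Gillespie algorithm.
# Reactions: prey birth (rate θ1*x), predation (rate θ2*x*y), predator death (rate θ3*y).
function lotka_volterra(θ; x0 = 50, y0 = 100, tmax = 30.0, rng = Random.GLOBAL_RNG)
    θ1, θ2, θ3 = θ
    x, y, t = x0, y0, 0.0
    ts, xs, ys = [t], [x], [y]
    while t < tmax
        rates = (θ1 * x, θ2 * x * y, θ3 * y)
        total = sum(rates)
        total == 0 && break                      # both populations extinct
        t += randexp(rng) / total                # time to the next reaction
        r = rand(rng) * total
        if r < rates[1]
            x += 1                               # prey birth
        elseif r < rates[1] + rates[2]
            x -= 1; y += 1                       # predation
        else
            y -= 1                               # predator death
        end
        push!(ts, t); push!(xs, x); push!(ys, y)
    end
    return ts, xs, ys
end

# one possible summary statistic: both populations sampled on a regular time grid
function lv_summary(ts, xs, ys; grid = 0:1.0:30.0)
    idx(t) = searchsortedlast(ts, t)
    vcat([xs[idx(t)] for t in grid], [ys[idx(t)] for t in grid])
end

ts, xs, ys = lotka_volterra([1.0, 0.005, 0.6])
s_obs = lv_summary(ts, xs, ys)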

New methods to be implemented

Here is an incomplete list of methods that I would love to see implemented in julia. Together with a collection of benchmark problems, this would give us a nice toolbox for benchmarking the new methods we do research on.

Conclusions and Questions

Who would be up for such a collaborative effort? How do you like my proposition for ABCBase.jl? What would you change? Shall we create ABCBase.jl, ABCProblems.jl and ABC.jl? Or something similar with different names?

francescoalemanno commented 4 years ago

I'm also in favour of unifying efforts. A first step could be whipping up ABCBase.jl and opening an organization to eventually contain the whole family of packages (if that can be done for free).

jbrea commented 4 years ago

That's a good idea. Yes, there are free organizations. Shall I create one with the name ABCJulia or JuliaApproxBayes? (JuliaABC is taken, unfortunately.)

francescoalemanno commented 4 years ago

JuliaApproxBayes sounds nice; what about JuliaApproxInference?

jbrea commented 4 years ago

Done (https://github.com/JuliaApproxInference) :smile: And since I liked the name, I just added the (currently empty) ApproxInferenceBase.jl package.

williams1 commented 4 years ago

You want @marcjwilliams1 not @williams1. That's not my project.

eford commented 4 years ago

I agree that a unified interface would be nice for users. Thanks for offering to work on this.

I'd be happy to merge in non-breaking pull requests to make ApproximateBayesianComputing.jl compatible with a new interface. And/or I'd be happy for any code from our repo to be borrowed or adapted into another package.

As researchers, it's not realistic for our group to contribute substantial time to tinkering with interfaces or reworking codes due to interface changes. If there need to be minor breaking changes, then I'd ask that someone work through the inevitable hiccups and let me know once they expect things have stabilized. At that point, I could look into how much would be required for us to update our codes that depend on it and consider merging them in.

A few minor points:

IMHO, most of our additional distributions in ApproximateBayesianComputing.jl (e.g., GaussianMixtureModelCommonCovar*, MultiUniform, LinearTransformedBeta) would probably be better off housed in either Distributions.jl or some other package that provides bonus distributions. (Back when we wrote our code, packages like Distributions.jl were still changing too fast to spend time making it fit their mold.) It would be great if they were useful to others.

I would have thought that lots of things like Distributions and Distance functions would be best provided by existing packages. I'm not sure I see the benefit of having everything go through ABCBase.jl. In my experience, more dependencies means a higher rate of depending on code that isn't maintained/updated, leading to our team wasting time to rewrite around the old dependency. Additionally, I would prefer not to have to deal with hassles of more complicated namespaces. So my vote is to minimize unnecessary dependencies.

Personally, I don't like the choice of "fit!" for running a simulation. In ABC we typically perform a simulation to construct an approximation to the target distribution, rather than performing a fit. Given how many other packages use "fit!" in ways that seem inappropriate to me, I'm guessing that this may be one of those cases where folks coming from CS/ML backgrounds use terminology differently than the stats community? If so, then I won't put up a fight.

Good luck.

jbrea commented 4 years ago

Thanks @eford for your feedback!

I think it is a very good idea to move as much as possible to Distributions and Distances. Let's open some PRs there! (see also here).

I hesitated when suggesting fit! originally, for the same reasons you brought up. In my package I currently use run!. Would you be fine with run! instead of fit!?

fipelle commented 1 year ago

Not sure how active this thread is, but I have started working with similar methods and I am trying to implement ABC via Turing. Of course, it lacks a number of the tools available in the packages mentioned above - which is where there may still be room for those packages - but it is quite handy and maintained. Are you perhaps considering looking into that too?

eford commented 1 year ago

I hesitated when suggesting fit! originally, for the same reasons you brought up. In my package I currently use run!. Would you be fine with run! instead of fit!?

Yes, I think it makes sense to run! an ABC simulation. (Sorry for not noticing this for 2 years!)

eford commented 1 year ago

Not sure how active this thread is, but I have started working with similar methods and I am trying to implement ABC via Turing. Of course, it lacks a series of tools available in the packages mentioned above - which is where the space for those packages may be - but it is quite handy and maintained. Are you perhaps considering looking into that too?

I very much appreciate Turing's PPL and easy integration with MCMC samplers.
I agree that it "should" be practical to reusing Turing's PPL within an ABC context.
But I'm curious what your motivation is. Is it mostly for making easy comparisons or for pedagogical purposes? Or something else? I would have thought that the main benefit of ABC would be in contexts where we have a detailed forward model, but it's not practical to express that in a PPL.

jbrea commented 1 year ago

@fipelle thanks for reviving this thread. My plans were actually not to use Turing's PPL but rather its "backend" AbstractMCMC.jl, in a similar way as is done in KissABC. I haven't found the time yet to continue on this, but I would really love to have a good ecosystem that allows us to compare the different approximate inference methods based on MCMC, kernels, optimal transport, etc., both for point estimation and posterior estimation.
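
For reference, the AbstractMCMC.jl interface essentially asks a package to subtype AbstractModel and AbstractSampler and to implement step. A very rough sketch of how an ABC sampler might hook into it (illustrative only; this is not how KissABC actually implements it):

using AbstractMCMC, Random

struct ABCModel{M, D, F} <: AbstractMCMC.AbstractModel
    simulate::M     # params -> summary statistics
    data::D         # observed summary statistics
    metric::F
end

struct ABCRejectionSampler{P} <: AbstractMCMC.AbstractSampler
    prior::P
    ϵ::Float64
end

# draw from the prior until a proposal is accepted; each accepted draw is one "sample"
function propose(rng, model::ABCModel, sampler::ABCRejectionSampler)
    while true
        θ = rand(rng, sampler.prior)
        model.metric(model.simulate(θ), model.data) < sampler.ϵ && return θ
    end
end

# initial step: return (sample, state)
function AbstractMCMC.step(rng::Random.AbstractRNG, model::ABCModel,
                           sampler::ABCRejectionSampler; kwargs...)
    θ = propose(rng, model, sampler)
    return θ, θ
end

# subsequent steps
function AbstractMCMC.step(rng::Random.AbstractRNG, model::ABCModel,
                           sampler::ABCRejectionSampler, state; kwargs...)
    θ = propose(rng, model, sampler)
    return θ, θ
end

With this in place, something like sample(ABCModel(model, data, metric), ABCRejectionSampler(prior, 0.1), 1000) should return a vector of accepted parameters.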

fipelle commented 1 year ago

Replies

But I'm curious what your motivation is. Is it mostly for making easy comparisons or for pedagogical purposes? Or something else?

@eford I would love to have access to the samplers and - more broadly - its backend. They now have an SMC implementation too, which may be a nice starting point for more advanced ABC applications. In my case, it is cheaper to evaluate the summary statistics than the likelihood due to data-related issues.

My plans were actually not to use Turing's PPL but rather its "backend" AbstractMCMC.jl, in a similar way as is done in KissABC

@jbrea this is also a good idea. For now, I am using Turing PPL directly and approaching the problem by defining the faux distribution:

using Distributions, Random;

struct UnknownContinuousDistribution <: ContinuousUnivariateDistribution
    summary_statistics_value::Real
end

# Julia cannot sample from an unknown distribution
Distributions.rand(rng::AbstractRNG, d::UnknownContinuousDistribution) = nothing;

# While the pdf is also unknown, a good summary statistics should be able to proxy it to some extent - the latter is computed externally within a @model macro, and stored in `d`
Distributions.logpdf(d::UnknownContinuousDistribution, x::Real) = d.summary_statistics_value;

I then compute the summary statistics within a Turing model and, for my data, write

y ~ UnknownContinuousDistribution(summary_statistics_value)

where summary_statistics_value is to be maximised. Of course, this only works when conditioning on the data, since you cannot sample pseudo-random numbers from an unknown distribution. I am not sure, though, how to use Turing in situations where online learning is key, as highlighted here. Ideally, it would be nice to implement something like this.
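
To illustrate (this is only my sketch of the approach, with a hypothetical toy simulator and summary statistic, and it reuses the UnknownContinuousDistribution defined above):

using Turing, Distributions, Statistics, Random

# hypothetical toy simulator and summary statistic
simulate(θ, n; rng = Random.GLOBAL_RNG) = rand(rng, Normal(θ, 1.0), n)
summary_stat(x) = mean(x)

@model function abc_model(obs_summary, n = 100)
    θ ~ Normal(0, 10)                              # prior
    sim_summary = summary_stat(simulate(θ, n))     # forward simulation inside the model
    # the negative distance between summaries acts as a surrogate log-density
    obs_summary ~ UnknownContinuousDistribution(-abs(sim_summary - obs_summary))
end

y_data = simulate(1.5, 100)                        # pretend observed data
chain  = sample(abc_model(summary_stat(y_data)), MH(), 2_000)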

Development

If you like this approach, one way forward could be polishing the faux distribution, creating a discrete equivalent, and writing up a series of shortcuts to simplify online learning. The packages in this environment could have specialised approaches for dealing with online learning, proposals, and ways of computing summary statistics - perhaps something similar to what @jbrea suggested above, such as ABC.jl, ABCBase.jl, ABCSummaryStatistics.jl and ABCSequential.jl. In Python there is also ABCpy, which is quite broad and may have functions worth implementing.

jbrea commented 1 year ago

@fipelle Do you already have some code publicly available where you use this approach? It would be interesting to see it "in action".

Please let me know if you want to move a package to this org or if you want to discuss API questions.

fipelle commented 1 year ago

@jbrea I will release a small package in the next few days with a few working examples, so that you can see it in action.

fipelle commented 1 year ago

I forgot to mention that I would love to discuss APIs, especially for SMC. I have seen different implementations in Julia and I am not sure which one is the most up to date (with respect to Julia). I will take a look at KissABC to see how they have implemented AbstractMCMC.jl. It would be great if you could write or point me to a compact example of SMC usage - either through Turing.jl, AbstractMCMC.jl or AdvancedPS.jl - with online learning in mind.

jbrea commented 1 year ago

@fipelle sorry for the delay. Unfortunately, I don't know of a compact example of SMC usage with online learning in mind. However, did you already have a look at this discussion? It is a bit outdated, but I think the key ideas still apply today and may be relevant.