Eawag-SIAM / SimulatedAnnealingABC.jl

Approximate Bayesian Computation algorithm based on simulated annealing
GNU General Public License v3.0

Wrong documentation of `f_dist` and alg types #23

Open scheidan opened 1 month ago

scheidan commented 1 month ago

Currently the documentation about the properties of `f_dist` is wrong: `:single_eps` does handle multiple distances.

We should clarify the difference between these approaches:

  1. Single dist with manual aggregation: f_dist(theta) = aggregate( dist_1(S_1(f(theta)_1), S_1(Yobs_1)), ..., dist_K(S_K(f(theta)_K), S_K(Yobs_K)) ). This is the "traditional" ABC way. Should we use :single_eps or :multi_eps?
  2. Multi dist and :single_eps: f_dist(theta) = [ dist_1(S_1(f(theta)_1), S_1(Yobs_1)), ..., dist_K(S_K(f(theta)_K), S_K(Yobs_K)) ]. Does this algorithm weight the different statistics intelligently, or simply based on the prior (via the cdf transformation)?
  3. Multi dist and :multi_eps: f_dist(theta) = [ dist_1(S_1(f(theta)_1), S_1(Yobs_1)), ..., dist_K(S_K(f(theta)_K), S_K(Yobs_K)) ]. This does weight the statistics individually.
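
To make the difference between the return types concrete, here is a minimal sketch of the scalar (approach 1) and vector (approaches 2 and 3) forms of `f_dist`. The toy model, the observed data, the choice of statistics, and the names `model`, `distances`, `f_dist_aggregated` and `f_dist_vector` are all hypothetical and not part of the package. The Euclidean aggregation is only one possible choice.

```julia
using Random, Statistics

Random.seed!(1)

# Hypothetical observed data: K = 2 outputs (assumption, for illustration only)
y_obs = (randn(100) .+ 1.0, 2.0 .* randn(100))

# Hypothetical stochastic model f(θ) returning K = 2 outputs
model(θ) = (randn(100) .+ θ[1], θ[2] .* randn(100))

# One summary statistic S_k per output
S = (mean, std)

# Per-statistic distances dist_k(S_k(f(θ)_k), S_k(Yobs_k)) for k = 1..K
function distances(θ)
    y_sim = model(θ)
    return [abs(S[k](y_sim[k]) - S[k](y_obs[k])) for k in 1:length(S)]
end

# Approach 1: manual aggregation, returns a single non-negative scalar
f_dist_aggregated(θ) = sqrt(sum(abs2, distances(θ)))

# Approaches 2 and 3: return all K distances as a vector;
# the algorithm type then decides how they are combined
f_dist_vector(θ) = distances(θ)
```

The two functions differ only in whether the aggregation happens in user code or is left to the sampler.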

What should we recommend? Maybe 2) if we have a fairly informative prior, and 3) otherwise?

The example probably needs an update too.

scheidan commented 1 month ago

Just for documentation, this is the old and wrong description:

## Details on `f_dist`

Given the observation ``D`` and a stochastic model ``f(θ)`` that provides a random sample from the likelihood ``p(D|θ)``, the user-provided function `f_dist` must be defined in one of the following ways, depending on the algorithm used.

###  `algorithm = :single_eps`

In this case `sabc` expects that `f_dist` returns a positive scalar defined as:

```f_dist(θ) = d(f(θ), D)```

where ``d()`` is a distance function. In practice we often want to compute this distance between summary statistics. If ``s()`` computes one or multiple statistics of the data, `f_dist` becomes:

```f_dist(θ) = d(s(f(θ)), s(D))```

Note, in this situation it is up to the user to ensure that the distance function ``d`` weights the different summary statistics meaningfully.
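
One possible way to do such a weighting is to scale each statistic by its spread under pilot simulations. The following is only a sketch under stated assumptions: the model `simulate`, the reference parameter `θ_ref`, and the use of the standard deviation as scale are hypothetical choices, not something prescribed by `sabc`.

```julia
using Random, Statistics

Random.seed!(42)

# Hypothetical model and observed data (assumptions, for illustration only)
simulate(θ) = θ[1] .+ θ[2] .* randn(200)
D = simulate([0.0, 10.0])

# s() returns several statistics that live on very different scales
s(y) = [mean(y), var(y)]

# Estimate the spread of each statistic from pilot runs at a reference θ
# (θ_ref is an assumed, user-chosen value)
θ_ref = [0.0, 10.0]
pilot = reduce(hcat, [s(simulate(θ_ref)) for _ in 1:200])  # 2 × 200 matrix
scales = vec(std(pilot; dims = 2))

# Scaled Euclidean distance: each statistic contributes on a comparable scale
d(s_sim, s_obs) = sqrt(sum(abs2, (s_sim .- s_obs) ./ scales))

# Scalar distance as expected in this section
f_dist(θ) = d(s(simulate(θ)), s(D))
```

Without the division by `scales`, the variance term would dominate the distance in this toy setup simply because it is numerically larger.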

###  `algorithm = :multi_eps`

With the multi epsilon algorithm we can leave the weighting to the sampling algorithm. This introduces a separate tolerance for each statistic. If we have `K` summary statistics, `f_dist` must return `K` distances:

```f_dist(θ) = [ d_i(S_i(f(θ)), S_i(D)) for i in 1:K ]```

scheidan commented 1 month ago

We should add something like this:

| | 1 statistic | n statistics |
|---|---|---|
| single epsilon | good (same as old SABC) | good |
| multi epsilon | don't use | good |