A few questions about pyABC

Gabriel-p commented 11 months ago

Sorry to post this as an issue, but I have a few questions about how to use pyABC properly.

What is the proper way of extracting summary statistics (i.e.: mean, stddev, median, percentiles) for my model parameters once the sampler is finished?

Should I use the last iteration stored? Something like this perhaps:

history = abc.run(max_nr_populations=max_nr_populations,  max_walltime=dt.timedelta(hours=0, minutes=max_mins))
max_r = history.max_t
# Select the last iteration
df, w = history.get_distribution(t=max_r)
# Extract mean and STDDEV for the 'param_0' model parameter
med, stddev = np.mean(df['param_0]), np.std(df['param_0])

I could not find a way to extract these values using a pyabc method.

Is there a way to stop the process if the acceptance rate gets too low?

In my tests I usually get a decreasing acceptance rate that makes the sampler get stuck for a long while trying to find models below the current eps. Can I tell the sampler to break out if the acceptance rate drops below a given value, for example 0.05?

Should I worry about very low ESS values?

Sometimes I find that the last ESS value (i.e.: the value for the final t) is very low, even reaching single digits. Is this something that I should be worried about?

arrjon commented 11 months ago

What is the proper way of extracting summary statistics (i.e.: mean, stddev, median, percentiles) for my model parameters once the sampler is finished?

You can work with the samples of the last population by calling history.get_distribution(), which will return the samples and weights of your last population. Then there are the pyabc.weighted_statistics which you can use to compute means, medians etc since simply computing the statistic of your samples does not respect the weights of each sample.

Is there a way to stop the process if the acceptance rate gets too low?

You can define multiple stopping criterions when calling abc.run(), in particular min_acceptance_rate, but also

minimum_epsilon: Stop if epsilon is smaller than minimum epsilon specified here.
            Defaults in general to 0.0, and to 1.0 for a Temperature epsilon.
max_nr_populations:
            Maximum allowed number of populations.
            Stop if this number is reached.
min_acceptance_rate:
            Minimal allowed acceptance rate. Sampling stops if a population
            has a lower rate.
max_total_nr_simulations:
            Maximum allowed total number of evaluations.
max_walltime:
            Maximum allowed walltime since start of the run() method
min_eps_diff:
            Minimum epsilon difference in two sequential generations.

Should I worry about very low ESS values?

ESS stands for effective sample size. If your effective sample size is very low, then you do not have a good approximation of your posterior since you do not have many effective samples from it. You should think about changing some hyperparameters like population_size, check if your summary_statistics are informative enough and if the distance fits the kind of statistics (there are adaptive distances available, for example, if you have summary statistics of different magnitudes) or check the kernel for the transitions (ideally should look similar to your posterior, i.e., if you expect something which is highly not normal try a local transition kernel).

I hope this helps! Feel free to ask further questions if something is unclear!

stephanmg commented 11 months ago

Thanks @arrjon for handling this. If these details are not yet in the documentation, I guess it would be good to add it... either to a FAQ section or where it would fit naturally.

Gabriel-p commented 11 months ago

Thank you very much for your detailed answer @arrjon! Regarding the transitions, is there an example I can read to see how it can be implemented?

I might be coming back to ask you a few questions as I start using the package more, is that alright?

arrjon commented 11 months ago

Yes, sure! In #623 I posted a minimal example using a local transition kernel. You can find the implemented kernels here documentation.

Gabriel-p commented 11 months ago

Hi @arrjon I have a question if you don't mind.

With regular Bayesian inference and tools like emcee I can assess the convergence of the process using a few methods like plotting and examining chain mixing, Gelman-Rubin statistic, and similar tools. Is there an equivalent method with pyABC? Or some number I should pay attention to (other than the distance)?

stephanmg commented 11 months ago

Hi @Gabriel-p,

I think it would be better if you could please use the pyABC Github forum here: https://github.com/ICB-DCM/pyABC/discussions - that way we can compile knowledge for everybody (closed issues seem not to be too appropriate).

Kind regards, Stephan

ICB-DCM / pyABC

A few questions about pyABC #624