Design robust calibration tests for samplers

aemcmc currently does not implement calibration tests for its samplers (i.e. it does not check that the samplers generate samples from the correct distribution), while aehmc's tests are very flaky and can be made to pass or fail with different RNG seeds. While this is still a very open area of research, making sure that the samplers that aemcmc build generate correct samples is critical.

This blog post summarizes the situation fairly well and links to the relevant literature.

My current understanding from the literature is that existing solutions are hypothesis tests which test hypotheses that we don't expect to be true. There seems to be (unsurprisingly) little hope that we can devise tests that require absolutely no supervision. However, there are a few things we could do to automate some of Gelman's suggestions in the post above, and the general idea would be to have some rough tests in CI to avoid obvious miscalibration (check that the chain has moved on a simple problem, check the monte carlo error, etc. )and give the users some tools to evaluate the calibration of the sampler on their use case.

aesara-devs / aemcmc

Design robust calibration tests for samplers #30