Goal
This package includes tests of stochastic code, primarily MCMC for Bayesian inference via Stan. It is challenging to write integration tests for such code. For example, if the code produces a $p$-value, what is the right test threshold? And what can we do about the fact that the code, even when working correctly, will fail the test some proportion of the time?
The goal here is to mitigate this problem by, at a minimum, making the test suite reproducible by setting seeds. I don't know that this fully solves the problem, but perhaps it's a step in the right direction. As an example of it not fully solving the problem, say we substitute in a new sampler or feature that changes how the RNG is used. Then our tests might fail even though the feature is not at fault; it just got unlucky on the stochastic tests.
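To make the false-failure problem concrete, here is a minimal sketch in base R. It is not code from this package: the helper `run_check`, the 0.05 threshold, and the sample size are illustrative assumptions. It shows that a correct check still falls below the threshold in roughly 5% of runs, and that fixing the seed makes any single run repeatable.

```r
# Hypothetical check (an assumption for illustration): the data really are
# N(0, 1), so the null hypothesis is true and the p-value is uniform on [0, 1].
run_check <- function(n = 30) {
  x <- rnorm(n)
  t.test(x, mu = 0)$p.value
}

# Across many runs of a correct check, about 5% fall below a 0.05 threshold.
# (The seed here only makes this estimate itself repeatable.)
set.seed(1)
p_values <- replicate(2000, run_check())
false_failure_rate <- mean(p_values < 0.05)

# With the same seed, a single run gives the same p-value every time.
set.seed(123)
first <- run_check()
set.seed(123)
second <- run_check()
identical(first, second)
```

Seeding removes the run-to-run flakiness but not the underlying question of which threshold is right: a seeded test that passes today can still fail after any change that alters how random numbers are consumed.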
Required features
Add the required seeds to any stochastic tests. This may include seeds in the .R file, in the call to test_that, in calls to brms...
Out of scope
A stretch goal would be a more sophisticated, thought-through approach to testing stochastic code.
Related documents
I would assume other packages do this. I could look for examples and add them here.
We can set temporary seeds inside test_that rather than script-wide, which is probably the better bet. We can also pass a seed directly to cmdstanr if we wish (otherwise I think it inherits the R seed).
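One possible way to scope a seed to a single test is sketched below, assuming the withr and testthat packages are available. The function `noisy_mean`, the seed value, and the tolerance are hypothetical stand-ins, not part of this package.

```r
library(testthat)

# Hypothetical stand-in for a stochastic routine; not part of this package.
noisy_mean <- function(n = 100) mean(rnorm(n))

test_that("noisy_mean is close to zero", {
  # The seed applies only within this test block; the previous RNG state
  # is restored when the block exits.
  withr::local_seed(2024)
  expect_lt(abs(noisy_mean()), 0.5)
})

# with_seed() scopes the seed to a single expression instead.
a <- withr::with_seed(2024, noisy_mean())
b <- withr::with_seed(2024, noisy_mean())
identical(a, b)
```

Scoping the seed this way keeps each test independent of the order in which tests run. For the model fits themselves, both brms::brm() and cmdstanr's $sample() method accept a seed argument, so the seed can also be passed explicitly there.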