simanalyse is an R package to facilitate model comparisons and simulation studies.
To install the latest development version from GitHub
#install.packages("remotes")
remotes::install_github("audrey-b/simanalyse")
Simulate 100 datasets using the sims package.
library(simanalyse)
set.seed(10L)
params <- list(sigma = 2)
constants <- list(mu = 0)
code <- "for(i in 1:10){
y[i] ~ dnorm(mu, 1/sigma^2)}"
sims <- sims::sims_simulate(code,
parameters = params,
constants = constants,
nsims = 100,
silent = TRUE)
For example, the first dataset is
sims[1]
#> $y
#> [1] 1.50009615 -1.04870803 -0.09224249 -1.55116546 -1.26316050 2.48549233
#> [7] 0.42241980 2.70828118 0.21842600 -0.78109651
#>
#> $mu
#> [1] 0
#>
#> an nlists object of an nlist object with 2 numeric elements
Conduct a Bayesian analysis of each dataset. Let’s analyze each dataset in “report” mode. By default, this mode runs 3 chains until the following two criteria are met: convergence based on r.hat <1.1 and a minimum effective sample size of 400. The first half of the chains is discarded as burnin. Chains are thinned to 4000 iterations each to preserve disk and memory usage. See ?sma_set_mode for other choices of modes and customization.
prior <- "sigma ~ dunif(0, 6)"
results <- sma_analyse(sims = sims,
code = code,
code.add = prior,
mode = sma_set_mode("report"))
For example, the posterior distribution obtained from the analysis of the first dataset is pictured below.
plot(results[[1]])
Derive posterior samples for new parameters.
results.derived <- sma_derive(results, "var=sigma^2", monitor="var")
For example, the derived posterior distribution obtained for the first dataset is pictured below.
plot(results.derived[[1]])
The same transformation must be applied to the true parameter values for eventually evaluating the performance (e.g. bias) of the method for those new parameters,
params.derived <- sma_derive(params, "var=sigma^2", monitor="var")
print(params.derived)
#> $var
#> [1] 4
#>
#> an nlist object with 1 numeric element
Evaluate the performance of the model using the 100 analyses:
sma_evaluate(results.derived, parameters=params.derived)
#> term bias cpQuantile mse
#> 1 var 1.76198 0.92 9.410909
Several more performance measures are available and can be specified using the measures argument (see ?sma_evaluate for details). You may also create custom performance measures. The example below shows how to reproduce the results above with custom code.
sma_evaluate(results.derived,
measures = "",
parameters = params.derived,
custom_funs = list(estimator = mean,
cp.low = function(x) quantile(x, 0.025),
cp.upp = function(x) quantile(x, 0.975)),
custom_expr_b = "bias = estimator - parameters
mse = (estimator - parameters)^2
cpQuantile = ifelse((parameters >= cp.low) & (parameters <= cp.upp), 1, 0)")
#> term bias cpQuantile mse
#> 1 var 1.76198 0.92 9.410909
When running lengthy simulation studies, it is often preferable to save all the results to disk. You may do this by using the functions ending in files. By default, results are saved in your working directory unless the path argument is specified.
set.seed(10L)
sims::sims_simulate(code,
parameters = params,
constants = constants,
nsims = 100,
save = TRUE,
exists = NA)
sma_analyse_files(code = code,
code.add = prior,
mode = sma_set_mode("report"))
sma_derive_files(code="var=sigma^2", monitor="var")
sma_evaluate_files()
You may read specific files using the function sma_read_files. For example,
sma_read_files(sma_evaluate_files)
#> term bias cpQuantile mse
#> 1 var 1.76198 0.92 9.410909
Note that this result is identical to the result we obtained without saving to files, using the same seed. The result is also identical when using parallelization, described below.
Parallelization is achieved using the future package.
To use all available cores on a local machine simply execute the following code before calling the package’s functions.
library(future)
plan(multisession)
Alternatively, if you wish to use a cluster computing environment (e.g. Compute Canada) with a workload manager (e.g. Slurm, Torque/PBS), simply execute the following code before calling the package’s functions.
library(future)
library(future.batchtools)
plan(batchtools_slurm, # for Slurm; see help(batchtools_template) for other options
workers = 100, # maximum number of cores to be used at any time
template = "batchtools.slurm.tmpl") # path to configuration file
For example, the following configuration file may be used with the Slurm manager:
#!/bin/sh
#SBATCH --time=00:5:00 ## walltime in hh:mm:ss (per dataset)
#SBATCH --ntasks=1 ## number of cores (per dataset)
#SBATCH --mem-per-cpu=1000 ## min memory per core in MB
## Export value of DEBUGME environment var to slave
export DEBUGME=<%= Sys.getenv("DEBUGME") %>
module load nixpkgs/16.09 gcc/7.3.0 r/4.0.2 ## load R
module load jags/4.3.0 ## load JAGS
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
The configuration file will be called for each dataset. Thus in this example, one new core (ntasks=1) will be requested for each dataset and the 100 datasets will be analyzed in parallel on a total of 100 cores (workers=100).
Please report any issues.
Pull requests are always welcome.
Please note that the simanalyse project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.