Closed: thomvolker closed this PR 3 years ago.
@stefvanbuuren Can we prioritise this PR?
Thanks for the PR.
There's a lot of duplicated code. I will look into the possibility of integrating this functionality as an extra argument to the regular `pool()` function.
I completely agree that this PR is mostly duplicate code. The reason for still writing a separate function was to protect uninformed users from applying the wrong pooling rules. Still, an additional argument is probably more elegant.
mice 3.13.15 adds a new `rule` argument to `pool()` and `pool.scalar()` and redefines `pool.syn()` and `pool.scalar.syn()` as wrappers. This removes almost all duplication and is extendable as other pooling rules come along.
Use `pool.syn()` and `pool.scalar.syn()` in code for synthetic data, and reserve `pool()` and `pool.scalar()` for missing-data applications.
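To make the wrapper design concrete, here is a minimal Python sketch (not mice's R code) of one scalar pooling function with a `rule` switch plus a thin synthetic-data wrapper. The names `pool_scalar`, `pool_scalar_syn`, and `Pooled` are hypothetical analogues of `pool.scalar()` and `pool.scalar.syn()`, and the rule labels are illustrative. The Rubin branch uses the classic large-sample degrees of freedom, not the Barnard-Rubin small-sample adjustment.

```python
import math
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class Pooled:
    """Hypothetical container for pooled scalar results."""
    qbar: float  # pooled point estimate (mean of the m estimates)
    ubar: float  # average within-dataset variance
    b: float     # between-dataset variance (m - 1 denominator)
    t: float     # total variance of qbar under the chosen rule
    df: float    # degrees of freedom for t-based intervals


def pool_scalar(Q, U, rule="rubin1987"):
    """Pool m scalar estimates Q with variances U (hypothetical sketch)."""
    m = len(Q)
    if m < 2 or len(U) != m:
        raise ValueError("need at least two estimates and one variance per estimate")
    qbar = mean(Q)
    ubar = mean(U)
    if ubar <= 0:
        raise ValueError("within-dataset variances must be positive")
    b = variance(Q)  # sample variance, m - 1 denominator

    if rule == "rubin1987":
        # Rubin (1987): multiply imputed missing data
        t = ubar + (1 + 1 / m) * b
        r = (1 + 1 / m) * b / ubar  # relative increase in variance
        df = math.inf if r == 0 else (m - 1) * (1 + 1 / r) ** 2
    elif rule == "reiter2003":
        # Reiter (2003): synthetic versions of completely observed data
        t = ubar + b / m
        df = math.inf if b == 0 else (m - 1) * (1 + ubar / (b / m)) ** 2
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return Pooled(qbar, ubar, b, t, df)


def pool_scalar_syn(Q, U):
    """Thin wrapper: synthetic-data pooling is pool_scalar with a fixed rule."""
    return pool_scalar(Q, U, rule="reiter2003")


if __name__ == "__main__":
    Q = [1.0, 2.0, 3.0]  # made-up estimates from m = 3 datasets
    U = [0.5, 0.5, 0.5]  # made-up within-dataset variances
    print(pool_scalar(Q, U))      # missing-data rule
    print(pool_scalar_syn(Q, U))  # synthetic-data rule
```

Because the synthetic-data entry point only fixes `rule`, a new pooling rule means adding one branch rather than another copy of the function, while users who call the wrapper still cannot pick the wrong rule by accident.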
Nice indeed to separate the workflows for `pool()` and `pool.syn()`.
As discussed with @gerkovink, the `pool.syn()` and `pool.scalar.syn()` pooling functions apply the rules developed by Reiter (2003) to combine analyses on multiply imputed synthetic datasets. Note that these rules apply only to synthetic versions of completely observed datasets. If the data to synthesize contain missing values, different pooling rules apply, and these require a two-step approach to imputation (first impute the missing values, then synthesize each of the m imputed datasets). Developing a one-step approach is left for future research.