Open stefvanbuuren opened 1 year ago
Ideas for further development:
- `blocks` by `nest` (character vector with length `ncol(data)` with block names. The default is `colnames(data)`)
- `sampler.univ()`. Add examples that exploit `formulas` to add interactions, nested variables, by-processing and other advanced models
- `predictorMatrix` and `formulas` specification
- `Rd` tags to `roxygen2` tags.

Commits 5c6bee2 and 755c23a generalise the classic behaviour of the `predictorMatrix` to blocks.
It works as follows:

- `mice()` uses the `nimp()` function to calculate the number of imputations needed for a given block of variables;
- if that number for block `j` is zero, the following happens:
  1) `mice()` sets `method[j] <- ""`
  2) `mice()` sets `predictorMatrix[v, ] <- 0` for all variables `v` in block `j`
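The two steps above can be sketched in base R without calling `mice()` itself. The toy data, the block layout and the `n_to_impute` counts are hypothetical stand-ins for what `nimp()` would compute; this illustrates the rule and is not the package's implementation.

```r
# Hypothetical toy data: `age` is complete, `bmi` and `chl` have missing values.
data <- data.frame(
  age = c(1, 2, 3, 4),
  bmi = c(22.1, NA, 25.3, NA),
  chl = c(180, 190, NA, 200)
)

# Hypothetical blocks: one block per variable.
blocks <- list(age = "age", bmi = "bmi", chl = "chl")

# Stand-in for nimp(): number of values to impute in each block.
n_to_impute <- sapply(blocks, function(vars) {
  sum(is.na(data[, vars, drop = FALSE]))
})

method <- c(age = "pmm", bmi = "pmm", chl = "pmm")
predictorMatrix <- 1 - diag(ncol(data))
dimnames(predictorMatrix) <- list(names(data), names(data))

for (j in names(blocks)) {
  if (n_to_impute[[j]] == 0) {
    method[j] <- ""                      # step 1: no imputation method for block j
    predictorMatrix[blocks[[j]], ] <- 0  # step 2: zero the rows of all variables in block j
  }
}
```

In this sketch only the `age` block has nothing to impute, so only its method and its row change.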
This PR also removes the error message "mice detected constant and/or collinear variables. No predictors were left after their removal." Imputations will be generated without predictors by the intercept-only imputation model (not recommended in general).
WARNING: Setting `predictorMatrix[v, ] <- 0` does not prevent imputation of variable `v`. To prevent imputation of `v`, specify the appropriate entry of `method` as `""`.
Commit c2da03c cleans up the internal function `edit.setup()`. It returns the proper `formulas` of the reduced model, but it is not quite right for `meth`, `vis` and `post`. Added FIXME.
Prevention of `NA` propagation by removing incomplete predictors. This version detects when a predictor contains missing values that are not imputed. In order to prevent NA propagation, `mice()` takes the following actions: 1) removes incomplete predictor(s) from the RHS, 2) adds incomplete predictor(s) to formulas (`var ~ 1`) and block components, sets `method[var] = ""`, and sets the `predictorMatrix` column and row to zero.
The `predictorMatrix` input can be a square submatrix of the full `predictorMatrix`. `mice()` will augment `predictorMatrix` to the full matrix and always return a p * p named matrix corresponding to the p columns in the data. The inactive variables will have zero columns and rows.
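A minimal base R sketch of this augmentation, assuming the submatrix carries identical row and column names. The helper `augment_predictor_matrix()` is hypothetical and only mimics the described behaviour; it is not a function from mice.

```r
# Hypothetical helper, not part of mice: expand a named square submatrix
# to a full p x p predictor matrix over all variables in the data.
augment_predictor_matrix <- function(sub, vars) {
  stopifnot(
    nrow(sub) == ncol(sub),
    !is.null(rownames(sub)),
    identical(rownames(sub), colnames(sub)),
    all(rownames(sub) %in% vars)
  )
  # Start from an all-zero matrix so unmentioned variables are inactive.
  full <- matrix(0, nrow = length(vars), ncol = length(vars),
                 dimnames = list(vars, vars))
  full[rownames(sub), colnames(sub)] <- sub
  full
}

vars <- c("age", "bmi", "hyp", "chl")
sub <- matrix(c(0, 1,
                1, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("bmi", "chl"), c("bmi", "chl")))
full <- augment_predictor_matrix(sub, vars)
```

Here `full` is a 4 * 4 named matrix in which `age` and `hyp` have zero rows and columns, while the `bmi` and `chl` entries are copied from `sub`.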
The `predictorMatrix` input may be unnamed if its size is p * p. For any other size, an unnamed matrix generates an error.
- `remove.rhs.variables()`
- `validate.mids()` check at exit that errors if `rownames(predictorMatrix)` differ from `colnames(data)`. Some more output tests need to be added.
- `predictorMatrix`
- `predictorMatrix` has fewer rows than length of blocks
Exit checks added:

- `rownames(predictorMatrix)` must match `colnames(data)`
- `formulas` and `blocks` must be equal
- `formulas` and `method` must be equal
- `method` vector cannot exceed number of variables
- `imp` and number of variables must be equal

TWO SEPARATE INTERFACES FOR MODEL SPECIFICATION: This version promotes two interfaces to specify imputation models: predictor (`predictorMatrix` + `parcel` + `method`) and formula (`formulas` + `method`). This version no longer accepts mixes of `predictorMatrix` and `formulas` arguments in the call to `mice()`.
NA-PROPAGATION PREVENTION. This version detects when a predictor contains missing values that are not imputed. In order to prevent NA propagation, `mice()` can follow two strategies: "Autoremove" (remove incomplete predictor(s) from the RHS, set `method` to `""`, adapt `predictorMatrix`, `formulas` and `blocks`, write to `loggedEvents`), or "Autoimpute" (impute the incomplete predictor and adapt `method`, `predictorMatrix`, `formulas`, and so on). "Autoremove" is implemented and is the current default. Use `mice(..., autoremove = FALSE)` to revert to the old behavior (NA propagation).
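The "Autoremove" strategy can be illustrated with a small base R sketch. The toy data and the `logged` vector (a stand-in for `loggedEvents`) are hypothetical; the code shows only the detection rule and the `predictorMatrix` adjustment, and it makes no claim about how `mice()` implements them internally.

```r
# Hypothetical toy setup: `hyp` has missing values but is not imputed.
data <- data.frame(
  age = c(1, 2, 3, 1),
  bmi = c(NA, 22.7, NA, 20.4),
  hyp = c(1, NA, 2, 1)
)
method <- c(age = "", bmi = "pmm", hyp = "")
predictorMatrix <- 1 - diag(ncol(data))
dimnames(predictorMatrix) <- list(names(data), names(data))

# A predictor propagates NA if it is incomplete and has no imputation method.
incomplete <- vapply(data, anyNA, logical(1))
offenders <- names(data)[incomplete & method == ""]

# "Autoremove": drop the offenders from every model and record the action.
logged <- character(0)
for (v in offenders) {
  predictorMatrix[, v] <- 0  # no longer used as a predictor (RHS)
  predictorMatrix[v, ] <- 0  # no predictors for the variable itself
  logged <- c(logged, paste("removed incomplete predictor", v))
}
```

In this sketch `hyp` is the only offender, so `bmi` is still predicted from `age` but no longer from `hyp`.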
SUBMODELS: The `predictorMatrix` input can be a square submatrix of the full `predictorMatrix` when its dimensions are named. `mice()` will augment the tiny `predictorMatrix` to the full matrix and always return a p * p named matrix corresponding to the p columns in the data. Unmentioned variables are not imputed, and the `predictorMatrix`, `formulas` and `method` are adapted accordingly.
DROP NON-SQUARE PREDICTOR MATRIX: Version 3.0 introduced non-square versions, but their interpretation turned out to be complex and ambiguous. For clarity, this update works with a predictor matrix that is square, with both dimensions identically named after the variables in the data. Variable groups are now specified through the `parcel` argument.
NEW PARCEL ARGUMENT. There is a new `parcel` argument that is easier to use. The print of the `mids` object shows `parcel` when it is different from the default. `parcel` can take over the role of `blocks` in specification. `blocks` is soft-deprecated, but still widely used within the program code.
NEW DOTS ARGUMENT. The `blots` argument is renamed to `dots`.
EXIT VALIDATION: Adds a new `validate.mids()` function that checks the `mids` object before exit.
Three proposed changes to new behaviour
1) NA-PROPAGATION. It is better to use NA-PROPAGATION by default. The reason is that the user becomes aware of a potential model specification problem (e.g. not imputing a variable used as a predictor). `mice()` should offer two easy ways to solve the problem: "autoremove" and "autoimpute". We prefer the NA-PROPAGATION default because it alerts the user, whereas the other two options would "magically" make the problem disappear (and thereby downgrade model specification hygiene).
2) The `formula` of a complete variable is now something like `age ~ 1`. It is better to use `age ~ 0`, to signal that for the dependent not even the intercept-only model is used.
3) The `formulas` argument returns an `environment` attached to each `formula`. This environment does not seem to be necessary in `mice()`, so it is cleaner to remove the `environment`.
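The difference between `age ~ 1` and `age ~ 0`, and the environment attached to a formula, can be inspected in base R. This is a small illustration of standard formula behaviour; the variable name `age` is only a placeholder, and the snippet makes no claim about what `mice()` does with these objects.

```r
f1 <- age ~ 1  # intercept-only model
f0 <- age ~ 0  # no terms at all, not even an intercept

attr(terms(f1), "intercept")  # 1
attr(terms(f0), "intercept")  # 0

# A formula carries the environment in which it was created.
has_env <- is.environment(environment(f1))

# Dropping the attribute leaves a formula without an environment.
f_stripped <- f1
attr(f_stripped, ".Environment") <- NULL
is.null(environment(f_stripped))
```

Model-fitting functions normally use the formula's environment to look up variables that are not found in the supplied data, so removing it is safe only when every variable is resolved from the data itself.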
- `predictorMatrix`
- `p2f()`, `p2c()`, `f2p()`, `n2b()`, `b2n()`
- `validate.blocks()`, `validate.predictorMatrix()`
- `edit.setup()` to `formulas` and `blots`
- `~ 1` for the empty predictor set instead of `~ 0`
- `method = ""` for variables that are not imputed (NOTE: DECISION REVERTED. SEE BELOW)
- `formulas` (instead of `blocks` or `predictorMatrix`)
- `typecodes()` in `sampler()` to reduce multiple `predictorMatrix` lines to one (support for multivariate imputation methods)
- `sampler.univ()`
- `predictorMatrix` and `formulas` specifications