My advice is to organise your data in wide format. Because the observations (rows) are then exchangeable, it will be a lot easier to impute the data. See http://stefvanbuuren.name/fimd/sec-longandwide.html and http://stefvanbuuren.name/fimd/sec-fdd.html. Hope this helps.
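A minimal sketch of the long-to-wide step in base R. The data frame and its column names (`id`, `tim`, `trt`, `valBas`, `val`) and all values are hypothetical, made up only to illustrate the reshaping:

```r
# Hypothetical long-format data: one row per patient (id) and visit (tim).
long <- data.frame(
  id     = rep(1:4, each = 2),
  tim    = rep(c("V1", "V2"), times = 4),
  trt    = rep(c("A", "B", "A", "B"), each = 2),
  valBas = rep(c(10, 12, 11, 9), each = 2),
  val    = c(10.5, NA, 12.1, 12.8, NA, 11.9, 9.4, 9.9)
)

# Reshape to wide: one row per patient, one "val" column per visit.
# Variables that are constant within a patient (trt, valBas) are kept once.
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "tim",
  v.names   = "val",
  direction = "wide"
)

print(names(wide))
```

In the wide layout each visit is an ordinary column (`val.V1`, `val.V2`), so a single-level method such as "pmm" can use the other visit and the baseline as predictors, and no cluster variable is needed.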
Thank you very much! Very helpful!
Dear @stefvanbuuren
I apologize for what is probably a trivial question, but I must have missed something and cannot set up a simple example, so I am asking for help.
The data: it's an example of a two-arm, two-timepoint clinical trial. I want to compare treatment A vs. B at timepoints V1 (visit 1) and V2.
It is a longitudinal study, so it can be expressed as a multi-level model. It has the
In a real multi-timepoint example there would be many more timepoints, the responses would be correlated within cluster (patient), and the variance might differ at each timepoint. But let's try this simplified example.
For simplicity, let's assume there is just a single response variable, "val" (value). "valBas" is the baseline, for which the model will be adjusted.
The data:
Now let's prepare the data and remove a few observations from the response:
Now I'm trying to set up the necessary parameters. I would like to use the PMM method. I cannot guarantee (I have no a priori knowledge about) the normality of the response's distribution, and PMM seems a safer option to the researchers I cooperate with.
I want to impute the missing values while accounting for the cluster (patient) and the timepoint (tim).
I'm not sure whether I should use "pmm" or "2lonly.pmm". Or maybe I should use "2l.pmm" from the miceadds package?
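For reference, a sketch of how the two-level methods in mice and miceadds read the predictor matrix: in the row of the variable being imputed, the cluster identifier is coded -2, fixed-effect predictors 1, and predictors that also get a random slope 2. Note that "2lonly.pmm" is meant for level-2 variables (those constant within a cluster), not for a response that varies by visit. The variable names `id` and `trt` below are assumptions (they are not given in the post), and the final call is shown only as a comment because the data are not available here:

```r
# Variable names assumed from the description: id (patient), tim (visit),
# trt (treatment arm), valBas (baseline), val (response with missing values).
vars <- c("id", "tim", "trt", "valBas", "val")

# All-zero predictor matrix: rows are imputation targets, columns predictors.
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

# Row for the incomplete response "val":
pred["val", "id"] <- -2                      # cluster identifier (should be integer-coded)
pred["val", c("tim", "trt", "valBas")] <- 1  # fixed effects only, i.e. random intercept model

# Only "val" is imputed; complete variables get the empty method "".
meth <- setNames(rep("", length(vars)), vars)
meth["val"] <- "2l.pmm"   # provided by miceadds, which must be attached

print(pred)

# With mice and miceadds attached and the long data in `dat` (hypothetical name):
# imp <- mice(dat, method = meth, predictorMatrix = pred, m = 5, seed = 1)
```

Setting `pred["val", "tim"] <- 2` would request a random slope for time as well, which with only two visits per patient is likely to be too much for the model to estimate.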
Now, when I'm trying to impute the missing values, it says:
With ordinary "pmm" it works fine, but does this method account for the clustering?
Let's try the "2l.pmm" method from the miceadds package.
It fails when the time variable is included as a random effect (too many parameters for correlated random slopes and intercepts)
but works if I skip the time and leave only the random intercepts:
PS: here there is only one variable with missing values, "val". I may use more of them, say "val1", "val2", "val3" (all numeric, with potentially different and not necessarily normal distributions) and maybe a binary (yes/no) "val4", all with potential gaps.
Would the setup accounting for ID (and optionally TIME) be very different when using them? I have seen very different setups of the predictor matrix... I want these additional variables to be imputed in a chain, one by one. They will then be analysed as well.