mjskay opened this issue 5 years ago
This is from the Steegen et al. paper (p. 710). As part of one of the vignettes, we could average the p-values for different parameters and perhaps add a confidence interval to that average.
> The multiverse analysis does not produce a single value summarizing the evidential value of the data, nor does it imply a threshold for an effect to reach to be declared robustly significant. Nevertheless, one might try to summarize the multiverse analysis more formally. One reasonable first step is to simply average the p values in the multiverse, in this case averaging all the numbers displayed in Figure 1 or 2. This mean value can be considered as the p value of a hypothetical preregistered study with conditions chosen at random among the possibilities in the multiverse and seems like a fair measurement in a setting where all of the possible data processing choices seem plausible (as in the example presented here, where the different options are drawn from other papers in the relevant literature).
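As a rough sketch of that first step (the data and variable names below are placeholders, not from the paper), we could average the per-universe p-values and bootstrap a confidence interval for the average by resampling universes. Note that the resampling treats universes as exchangeable, which is itself a modeling choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# One p-value per universe, e.g. collected from the multiverse grid (placeholder data)
p_values = np.array([0.03, 0.12, 0.04, 0.21, 0.08, 0.05])

mean_p = p_values.mean()

# Percentile bootstrap over universes for a CI on the mean p-value
boot_means = [
    rng.choice(p_values, size=len(p_values), replace=True).mean()
    for _ in range(10_000)
]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean p = {mean_p:.3f}, 95% bootstrap CI [{ci_low:.3f}, {ci_high:.3f}]")
```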
The quote about hierarchical models is (same page as above):
> In a more complete analysis, the multiverse of data sets could be crossed with the multiverse of models to further reveal the multiverse of statistical results ... this motivates encompassing analyses of multiple predictors, interactions, or outcomes in a hierarchical model so as to reduce problems of multiple comparisons (Gelman, Hill, & Yajima, 2012).
Constructing this type of vignette could be interesting.
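One way such a vignette might start (a sketch only, with hypothetical per-universe estimates and standard errors): treat each universe's estimate as a noisy observation of a universe-level effect and partially pool those effects toward a shared mean, meta-analysis style, which is the multiple-comparisons argument from Gelman, Hill, & Yajima (2012). Here using PyMC:

```python
import numpy as np
import pymc as pm

# Hypothetical per-universe effect estimates and standard errors
estimates = np.array([0.42, 0.15, 0.30, 0.51, 0.08])
std_errs = np.array([0.10, 0.12, 0.09, 0.15, 0.11])

with pm.Model() as multiverse_model:
    mu = pm.Normal("mu", 0, 1)        # overall (pooled) effect
    tau = pm.HalfNormal("tau", 0.5)   # between-universe variation
    # Per-universe effects, partially pooled toward mu
    theta = pm.Normal("theta", mu, tau, shape=len(estimates))
    pm.Normal("obs", theta, std_errs, observed=estimates)
    trace = pm.sample(2000, tune=1000, random_seed=1)
```

Partial pooling shrinks the per-universe effects toward the shared mean, so no single universe's estimate is over-interpreted on its own.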
I'm adding p-boxes (probability boxes) to this list: lower and upper bounds on a CDF. Perhaps the "envelope" method could be used to summarize the posterior distributions from the Bayesian models across a multiverse, for example (see this paper: https://hal.inria.fr/hal-01518666).
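A minimal sketch of what that envelope might look like, assuming we have posterior draws from each universe (the data below is a placeholder): evaluate each universe's empirical CDF on a common grid and take the pointwise min/max as the p-box bounds:

```python
import numpy as np

def ecdf(samples, grid):
    """Empirical CDF of `samples` evaluated at each point of `grid`."""
    return np.searchsorted(np.sort(samples), grid, side="right") / len(samples)

rng = np.random.default_rng(1)
# One array of posterior draws per universe (placeholder: shifted normals)
universes = [rng.normal(loc, 1.0, size=4000) for loc in (0.1, 0.3, 0.5)]

grid = np.linspace(-4, 5, 200)
cdfs = np.stack([ecdf(u, grid) for u in universes])

lower_env = cdfs.min(axis=0)  # lower bound of the p-box at each grid point
upper_env = cdfs.max(axis=0)  # upper bound of the p-box at each grid point
```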
And another computational approach to multiverse summaries I'm adding to the list: https://journals.sagepub.com/doi/10.1177/0049124115610347
(NB I haven't read this yet)
Also, if we're thinking about model comparison, we need to be careful to use valid metrics. My understanding is that you'd have to limit comparison to models fit on the same data (so no multiverses with outlier removal), and then you'd want a metric that allows valid comparison when variables have been transformed (like the scaled CRPS: https://t.co/JfOFMVi8Pz).
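For reference, a sample-based sketch (hypothetical data; the SCRPS formula below is my reading of the linked Bolin & Wallin paper, so double-check it against the paper before relying on it):

```python
import numpy as np

def crps(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    samples = np.asarray(samples)
    e_xy = np.abs(samples - y).mean()
    e_xx = np.abs(samples[:, None] - samples[None, :]).mean()
    return e_xy - 0.5 * e_xx

def scrps(samples, y):
    """Scaled CRPS (my reading of Bolin & Wallin): -E|X - y| / E|X - X'| - 0.5 * log E|X - X'|.
    Intended to be invariant to rescaling of the outcome (higher is better)."""
    samples = np.asarray(samples)
    e_xy = np.abs(samples - y).mean()
    e_xx = np.abs(samples[:, None] - samples[None, :]).mean()
    return -e_xy / e_xx - 0.5 * np.log(e_xx)

rng = np.random.default_rng(1)
draws = rng.normal(0.2, 1.0, size=1000)  # posterior predictive draws (placeholder)
print(crps(draws, 0.0), scrps(draws, 0.0))
```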
Some options so far:

- averaging the p-values across the multiverse (with a confidence interval on the average)
- a hierarchical model crossing the multiverse of data sets with the multiverse of models
- p-boxes / CDF envelopes over the per-universe distributions
- model comparison with a transformation-robust metric like the scaled CRPS
To be resolved: how does this square philosophically with the whole idea of a multiverse in the first place? We need to be very careful about this. If a multiverse is about acknowledging ontological uncertainty (and about having conversations about it through the literature), how does reducing it back down to a single estimate (or p-value) fit with that? @dragice thoughts?