Open nuferates opened 1 year ago
Indeed, it would be extremely useful if there is some help that can be displayed. This is not just relevant for the diagnostic plot, but it may also be useful for other things.
Focusing on the diagnostic plot in this issue, we have some plots from various presentations that show what the plot looks like when there are no deviations from normality, when we have a heavily tailed distribution, and when we have left- or right skewness:
diagnostics-normal.pdf diagnostics-heavy-tails.pdf diagnostics-left-skewed.pdf diagnostics-right-skewed.pdf
Those plots are generated with the following script:
# --------------------------------------
# Author: Andreas Alfons
# Erasmus Universiteit Rotterdam
# --------------------------------------
# load packages
library("robmed")
library("scales")
library("sn")
## control parameters for simulation
seed <- 20210617 # seed for the random number generator
n <- 200 # number of observations
K <- 1000 # number of simulation runs
R <- 5000 # number of bootstrap samples
# coefficients in mediation model
a <- b <- c <- 0.4
# error scales in mediation model
sigma_m <- sqrt(1 - a^2)
sigma_y <- sqrt(1 - b^2 - c^2 - 2*a*b*c)
## parameter combinations and fancy labels for plots
alpha <- c(0, 0, -Inf, Inf)
nu <- c(Inf, 2, Inf, Inf)
labels <- c("normal", "heavy-tails", "left-skewed", "right-skewed")
## control parameters for methods
level <- 0.95 # confidence level
lmrob_control <- reg_control(max_iterations = 5000) # MM-estimator
# function to compute mean of skew-t distribution
get_mean <- function(ksi = 0, omega = 1, alpha = 0, nu = Inf) {
if (is.infinite(alpha)) delta <- sign(alpha)
else delta <- alpha / sqrt(1 + alpha^2)
if (is.infinite(nu)) b_nu <- sqrt(2 / pi)
else b_nu <- sqrt(nu) * beta((nu-1)/2, 0.5) / (sqrt(pi) * gamma(0.5))
# compute mean
ksi + omega * delta * b_nu
}
## set seed of the random number generator for reproducibility
set.seed(seed)
## generate independent variable and error terms such that results are
## comparable across different values of the parameters
x <- rnorm(n)
# First uniformly distributed values are drawn, which are later transformed
# to the given error distribution by applying the inverse CDF.
u_m <- runif(n)
u_y <- runif(n)
## loop over parameter settings
for (i in 1:length(labels)) {
# transform errors
if (is.infinite(nu[i])) {
# normal or skew-normal distribution
if (is.finite(alpha[i]) && alpha[i] == 0) {
# normal errors
e_m <- qnorm(u_m)
e_y <- qnorm(u_y)
} else {
# skew-normal errors
e_m <- qsn(u_m, alpha = alpha[i])
e_y <- qsn(u_y, alpha = alpha[i])
}
} else {
# t or skew-t distribution
if (is.finite(alpha[i]) && alpha[i] == 0) {
# t-distributed errors
e_m <- qt(u_m, df = nu[i])
e_y <- qt(u_y, df = nu[i])
} else {
# skew-t distributed errors
e_m <- qst(u_m, alpha = alpha[i], nu = nu[i])
e_y <- qst(u_y, alpha = alpha[i], nu = nu[i])
}
}
# compute mean of error terms such that it can be subtracted
mu_m <- get_mean(omega = sigma_m, alpha = alpha[i], nu = nu[i])
mu_y <- get_mean(omega = sigma_y, alpha = alpha[i], nu = nu[i])
# transform error terms (scale and center)
e_m_transformed <- sigma_m * e_m - mu_m
e_y_transformed <- sigma_y * e_y - mu_y
# generate hypothesized mediator and dependent variable
m <- a * x + e_m_transformed
y <- b * m + c * x + e_y_transformed
simulated_data <- data.frame(x, y, m)
# fit mediation model
robust_fit <- fit_mediation(simulated_data, "x", "y", "m",
control = lmrob_control)
# create plot
p <- weight_plot(robust_fit) +
scale_color_manual("", values = c("black", "#00BFC4"))
# save plot to file
pdf(file = sprintf("figures/diagnostics-%s.pdf", labels[i]),
width = 6, height = 4.25)
print(p)
dev.off()
}
I don't want to attempt anything regarding AI generated interpretations of a given plot. Users might put that into papers without checking, which causes ethical issues regarding authorship and may reflect negatively on our software. I'm in favor of humans creating their own interpretations.
It would be our job to help them by having a clear explanation. Another suggestion would be to add confidence bands to make it clear if deviations from the expected line are significant, see this issue for package robmed
: https://github.com/aalfons/robmed/issues/30
@AuroreAA, do you have suggestions on how to implement such help functionality?
I can think about different solutions:
The first outcome of the ROBMED results is the diagnostic plot. We already explained how to make sense of it in our ORM piece, but I think we should not assume that readers will be interested enough to explore how it is used. So can we add a question mark icon next to the diagnostic plot title and when the users hover their mouse over they get to see a generic explanation of how it is interpreted? It could even be nicer if we can generate a specific interpretation for each graph underneath (can AI help?), but it might be well beyond the scope of this project.