malcolmbarrett opened 1 year ago
I wonder if weighted tables will help illuminate anything diagnostically
I think the mirrored histograms in the doc are a good start. Can you describe the weighted tables more?
Diagnostics would be useful as its own separate section
Looking at weight distributions should be helpful, as in the discrete case, with the usual checks: the spread of the weights and whether the mean weight is close to 1. I imagine that as we move to the worse simulation scenarios, the weight distributions become more spread out, with many more extreme weights at both ends.
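As a sketch, those checks could look something like this in base R (the `weight` vector here is toy data standing in for the stabilized weights from the simulation):

```r
# Toy stand-in for a stabilized weight vector with a few extreme values
set.seed(1)
weight <- c(rlnorm(9994, meanlog = 0, sdlog = 0.1), 15, 44, 50, 140, 148, 300)

summary(weight)                       # spread of the distribution
mean(weight)                          # should be close to 1 for stabilized weights
sum(weight > 10)                      # how many extreme weights?
sort(weight, decreasing = TRUE)[1:6]  # the largest contributors
```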
Looks like in the a=1, b=1 scenario, the weight distribution for n=10000 is:

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.04408 0.91304 0.95217 0.99805 1.03700 15.18397
```
But for a=10, b=10, the weight distribution is:

```
  Min. 1st Qu. Median   Mean 3rd Qu.     Max.
0.1156  0.1248 0.1493 0.4021  0.2277 300.2096
```
So of those 10000 people, one person has a weight of 300, while 7500 individuals have a combined contribution of probably ~1000.
The 6 largest weights are 300.20959, 208.61328, 148.19778, 139.65664, 50.33558, and 43.63453, such that those 6 individuals have a combined weight similar to the other 7500. Not to cross threads, but overlap weights should help in this case, no?
Re: weighted tables, here's an example from our workshop: https://causal-inference-r-workshop.netlify.app/07-pscore-diagnostics.html#25
A variation for this problem (based on #5):
```r
library(arrow)
library(dplyr)
library(survey)
library(gtsummary)

.df <- open_dataset("data/") |>
  filter(id == 1, n == 100, a == 1, b == 1, p == "0.5") |>
  collect()

# denominator model for the generalized propensity score
denominator_model <- lm(x ~ c, data = .df)

# density of the observed exposure under the denominator model
ps <- dnorm(
  .df$x,
  fitted(denominator_model),
  sigma(denominator_model)
)

# stabilized weights: marginal density / conditional density
weight <- dnorm(.df$x, mean(.df$x), sd(.df$x)) / ps

ate_w <- lm(y ~ x, weights = weight, data = .df)

svy_des <- svydesign(
  ids = ~1,
  data = .df,
  weights = weight
)

tbl_svysummary(
  svy_des,
  by = c,
  include = c(x, y, c)
) %>%
  add_difference(everything() ~ "smd")
```
| Characteristic | 0, N = 52¹ | 1, N = 48¹ | Difference² | 95% CI²,³ |
|---|---|---|---|---|
| x | 0.57 (-0.24, 1.28) | 0.72 (-0.51, 1.23) | 0.02 | -0.37, 0.42 |
| y | 0.19 (-0.33, 0.98) | 0.96 (0.33, 1.55) | -0.63 | -1.0, -0.23 |

¹ Median (IQR); ² Standardized Mean Difference; ³ CI = Confidence Interval
Created on 2022-11-28 with reprex v2.0.2
But I guess we should figure out a way to show weighted correlations with x, or something like that, instead.
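A sketch of what that could look like with base R's `stats::cov.wt()`; the toy `x`, `c_var`, and `weight` vectors below stand in for the exposure, confounder, and stabilized weights from the reprex:

```r
# Weighted correlation between a continuous exposure and confounder.
set.seed(1)
c_var <- rnorm(100)
x <- c_var + rnorm(100)
weight <- runif(100, 0.5, 1.5)

# Unweighted correlation for comparison
cor(x, c_var)

# cov.wt() normalizes the weights internally and,
# with cor = TRUE, also returns the correlation matrix
wcor <- cov.wt(cbind(x = x, c = c_var), wt = weight, cor = TRUE)$cor
wcor["x", "c"]
```

After successful weighting, the weighted correlation between exposure and confounder should be close to 0.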
So compared to a usual weighted table, the confounder now goes in the columns and the exposure in the rows? (Correct me if I'm wrong.) It looks useful to me.
Could we also flip the ECDF diagnostic on its head in a similar way for discrete confounders, and plot the ECDF of a continuous exposure across levels of a confounder, under weighted and unweighted conditions?
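That flipped diagnostic could be sketched by computing a weighted ECDF by hand (cumulative normalized weight over the sorted exposure); the helper name `wtd_ecdf` and the toy data are made up here:

```r
# Weighted ECDF of a continuous exposure: at each sorted value of x,
# the cumulative share of normalized weight at or below it.
wtd_ecdf <- function(x, w) {
  ord <- order(x)
  data.frame(x = x[ord], ecdf = cumsum(w[ord]) / sum(w))
}

set.seed(1)
x <- rnorm(200)                 # continuous exposure
group <- rbinom(200, 1, 0.5)    # level of a discrete confounder
weight <- runif(200, 0.5, 1.5)  # stand-in for stabilized weights

# Within one confounder level, compare unweighted vs weighted ECDFs;
# overlaying these across levels shows whether weighting aligns them.
unwt <- wtd_ecdf(x[group == 1], rep(1, sum(group == 1)))
wt <- wtd_ecdf(x[group == 1], weight[group == 1])
```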
Weighted correlations also sounds promising, especially for the continuous exposure, continuous confounder cases.
Are there any? Can we suggest some?