LucyMcGowan / writing-positivity-continous-ps


diagnostics for positivity #4

Open malcolmbarrett opened 1 year ago

malcolmbarrett commented 1 year ago

are there any? can we suggest some?

malcolmbarrett commented 1 year ago

I wonder if weighted tables will help illuminate anything diagnostically

bblette1 commented 1 year ago

I think the mirrored histograms in the doc are a good start. Can you describe the weighted tables more?

Diagnostics would be useful as their own separate section.

Looking at the weight distributions should be helpful, as in the discrete case, with the usual checks like the spread and whether the mean weight is 1. I imagine that as we move to the worse simulation scenarios, the weight distributions will become more spread out, with many more extreme weights at both ends.
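Something like this, say (a rough sketch; `weight` here stands for a vector of stabilized weights like the one computed later in this thread):

# sketch of the usual checks, assuming `weight` is a numeric vector of
# stabilized inverse probability weights
summary(weight)                         # overall spread
mean(weight)                            # should be close to 1 for stabilized weights
quantile(weight, c(0.95, 0.99, 0.999))  # how heavy is the right tail?
sort(weight, decreasing = TRUE)[1:6]    # the most extreme weights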

bblette1 commented 1 year ago

Looks like in the a=1, b=1 scenario, the weight distribution for n=10000 is:

   Min. 1st Qu.  Median    Mean 3rd Qu.     Max.
0.04408 0.91304 0.95217 0.99805 1.03700 15.18397

But for a=10, b=10, the weight distribution is:

  Min. 1st Qu. Median   Mean 3rd Qu.     Max.
0.1156  0.1248 0.1493 0.4021  0.2277 300.2096

So of those 10000 people, 1 person has a weight of 300, while 7500 individuals have a combined contribution of probably ~1000

The 6 largest weights are 300.20959, 208.61328, 148.19778, 139.65664, 50.33558, and 43.63453, such that those 6 individuals carry about as much combined weight as the other 7500. Not to cross threads, but overlap weights should help in this case, no?
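For concreteness, just summing the numbers above:

# combined weight of the 6 most extreme individuals
top6 <- c(300.20959, 208.61328, 148.19778, 139.65664, 50.33558, 43.63453)
sum(top6)  # ~890.6, comparable to the ~1000 combined weight of the other 7500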

malcolmbarrett commented 1 year ago

Re: weighted tables, here's an example from our workshop: https://causal-inference-r-workshop.netlify.app/07-pscore-diagnostics.html#25

A variation for this problem (based on #5):

library(arrow)
library(dplyr)
library(survey)
library(gtsummary)

# pull one simulated dataset out of the partitioned arrow store
.df <- open_dataset("data/") |> 
  filter(id == 1, n == 100, a == 1, b == 1, p == "0.5") |> 
  collect() 

# model the continuous exposure as a function of the confounder
denominator_model <- lm(
  x ~ c,
  data = .df
)

# conditional density of the observed exposure (the weight denominator)
ps <- dnorm(
  .df$x,
  fitted(denominator_model),
  sigma(denominator_model)
)

# stabilized inverse probability weights: marginal density / conditional density
weight <- dnorm(.df$x, mean(.df$x), sd(.df$x)) / ps

# weighted outcome model (not used in the table below)
ate_w <- lm(y ~ x, weights = weight, data = .df)

# survey design so gtsummary can build a weighted table
svy_des <- svydesign(
  ids = ~ 1,
  data = .df,
  weights = weight
)

# weighted table with the confounder as the columns and SMDs as the diagnostic
tbl_svysummary(
  svy_des, 
  by = c,
  include = c(x, y, c)
) %>% 
  add_difference(everything() ~ "smd")
Characteristic   0, N = 52¹           1, N = 48¹           Difference²   95% CI²,³
x                0.57 (-0.24, 1.28)   0.72 (-0.51, 1.23)    0.02         -0.37, 0.42
y                0.19 (-0.33, 0.98)   0.96 (0.33, 1.55)    -0.63         -1.0, -0.23

¹ Median (IQR)
² Standardized Mean Difference
³ CI = Confidence Interval

Created on 2022-11-28 with reprex v2.0.2

But I guess we should figure out a way to show weighted correlations with x or something instead.
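E.g., base R's cov.wt() computes a weighted correlation matrix; a rough sketch reusing `.df` and `weight` from the reprex above:

# unweighted exposure-confounder correlation for comparison
cor(.df$x, .df$c)

# weighted version via cov.wt(), which normalizes the weights internally
cov.wt(
  cbind(x = .df$x, c = .df$c),
  wt = weight,
  cor = TRUE
)$cor["x", "c"]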

bblette1 commented 1 year ago

So compared to a usual weighted table, the confounder is now in the columns and the exposure goes in the rows? (Correct me if I'm wrong.) It looks useful to me.

Could we also flip the ECDF diagnostic on its head in a similar way for discrete confounders, and plot the ECDF of a continuous exposure across levels of a confounder, under weighted and unweighted conditions?
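Something like this, maybe (a rough sketch with a binary confounder, reusing `.df` and `weight` from the reprex above; setting the weights to 1 gives the unweighted version for comparison):

library(ggplot2)

.df |>
  mutate(wt = weight) |>
  group_by(c) |>
  arrange(x, .by_group = TRUE) |>
  mutate(ecdf_w = cumsum(wt) / sum(wt)) |>  # weighted ECDF of x within each level of c
  ggplot(aes(x = x, y = ecdf_w, color = factor(c))) +
  geom_step() +
  labs(y = "weighted ECDF of x", color = "confounder c")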

Weighted correlations also sound promising, especially for the continuous exposure, continuous confounder cases.