American-Institutes-for-Research / WeMix

WeMix public repository
GNU General Public License v2.0
10 stars 1 forks source link

Propensity score weighting using WeMix #11

Closed yohei-h closed 1 year ago

yohei-h commented 1 year ago

Hi all,

I would like to know if WeMix::mix() is used correctly for the propensity score weighing analysis below (weight = wt, random effect = subject). Surprisingly, the results of lme4::lmer() and WeMix::mix() are different to some extent. The vignette says "When all weights above the individual level are 1, this is similar to a lmer and you should use lme4 because it is much faster.", so I'm worried my R codes are wrong...

library(lme4)
library(lmerTest)
library(tidyverse)
library(WeMix)
library(HSAUR3)
library(WeightIt)
library(cobalt)

data("BtheB", package = "HSAUR3")
df <- BtheB %>%
    as_tibble() %>%
    mutate(subject = seq_along(drug)) %>%
    gather(key = bdi, value = value, - drug, -length, -treatment, -bdi.pre, -subject) %>%
    mutate(time = rep(c(2, 3, 5, 8), each = 100)) %>%
    select(subject, time, bdi.pre, value, drug, length, treatment) 

#================================================================================
# Propensity score weighting
#================================================================================
w.out <- weightit(treatment ~ drug + length + bdi.pre, 
                  data = df,
                  method = 'ps',
                  estimand = 'ATO')  # overlap weight
df <- df %>% 
    mutate(wt = w.out$weights)

# lmer
model_lmer <- lmer(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   weights = wt,
                   data = df)
parameters::model_parameters(model_lmer)  

# WeMix
model_WeMix <- mix(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   data = df %>% 
                       mutate(wt2 = 1),
                   weights = c('wt', 'wt2')
                   )
s <- summary(model_WeMix)

# I'm not really confident about the 95%CI below
tibble(Estimate = s$coef[, 'Estimate'][2],
       CI_low = s$coef[, 'Estimate'][2] + 1.96*s$coef[, 'Std. Error'][2],
       CI_high = s$coef[, 'Estimate'][2] - 1.96*s$coef[, 'Std. Error'][2])

Thank you in advance, Yohei

pdbailey0 commented 1 year ago

lme4 violates the basic principle that if you set the weight to 2 or make two copies of a row you should get the same result. When I wrote that I trusted lme4 more. I now trust WeMix more because of this property. I hope that helps.

yohei-h commented 1 year ago

Thank you so much, Paul. I tried the weight of 1 and 2 in WeMix:: mix(). Do you mean the two results (model_WeMix1 and model_WeMix2) should be consistent? It looks like they are different.

library(lme4)
library(lmerTest)
library(tidyverse)
library(WeMix)
library(HSAUR3)
library(WeightIt)
library(cobalt)

data("BtheB", package = "HSAUR3")
df <- BtheB %>%
    as_tibble() %>%
    mutate(subject = seq_along(drug)) %>%
    gather(key = bdi, value = value, - drug, -length, -treatment, -bdi.pre, -subject) %>%
    mutate(time = rep(c(2, 3, 5, 8), each = 100)) %>%
    select(subject, time, bdi.pre, value, drug, length, treatment) 

#================================================================================
# Propensity score weighting
#================================================================================
w.out <- weightit(treatment ~ drug + length + bdi.pre, 
                  data = df,
                  method = 'ps',
                  estimand = 'ATO')
df <- df %>% 
    mutate(wt = w.out$weights)

# lmer
model_lmer <- lmer(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   weights = wt,
                   data = df)
summary(model_lmer)

# WeMix (wt2 =1)
model_WeMix1 <- mix(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   data = df %>% 
                       mutate(wt2 = 1),    
                   weights = c('wt', 'wt2')
                   )
summary(model_WeMix1)

# WeMix (wt2 =2)
model_WeMix2 <- mix(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   data = df %>% 
                       mutate(wt2 = 2),    
                   weights = c('wt', 'wt2')
                   )
summary(model_WeMix2)
pdbailey0 commented 1 year ago

We have several tests that confirm the property I'm taking about in the test folder.

Here is an example. note that you need to make the ss1 data set with code above that test.

pdbailey0 commented 1 year ago

I would also note that to get lmer to agree with WeMix you need to set REML to FALSE.

nevertheless,

model_lmer <- lmer(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   weights = wt, data = df, REML=FALSE)
summary(model_lmer)
df$wt2 <- 1
model_WeMix1 <- mix(value ~ treatment + drug + length + bdi.pre + (1|subject),
                   data = df,weights = c('wt', 'wt2'))
summary(model_WeMix1)

there is still a discrepancy.

What I can tell you is that WeMix treats the weights in the fashion I articulated above: it is the same to include two repetitions of an observation or to set the weight of a single repetitions to two. And this property should hold across all definitions of "weighting." See this very helpful Thomas Lumley post Ben Bolker shared with me when I asked about lmer weights.

The same property holds for groups. I can't tell you what properties lmer weights have, I didn't write that package. I asked the maintainers and didn't receive a satisfying answer. If you understand them better, please share that understanding.

yohei-h commented 1 year ago

Hi Paul,

Thank you for introducing the insightful link. So I am going to use WeMix rather than lmer moving forwards. I'm sorry I haven't got more knowledge about weight than you.