DoubleML / doubleml-for-r

DoubleML - Double Machine Learning in R
https://docs.doubleml.org
Other
126 stars 25 forks source link

Pass score function for IRM #126

Closed gleb-roma closed 2 years ago

gleb-roma commented 2 years ago

Follow-up to this https://github.com/DoubleML/doubleml-for-r/issues/124#issue-1011485836 but for the IRM model.

I want to modify the score function for IRM to allow weights. In the example provided in the manual, I only see how to pass g_hat and m_hat for the PLM. However, IRM requires passing g0_hat and g1_hat. How do I do it?

PLM from the manual:

# Here:
# y: dependent variable
# d: treatment variable
# g_hat: predicted values from regression of Y on X's
# m_hat: predicted values from regression of D on X's
# smpls: sample split under consideration, can be ignored in this example

score_manual = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d 
  psi_b = resid_d * resid_y 
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}
MalteKurz commented 2 years ago

The signature of the function is stated in the API documentation for score (https://docs.doubleml.org/r/stable/reference/DoubleMLIRM.html#arguments), it is function(y, d, g0_hat, g1_hat, m_hat, smpls). Besides that you should define a score which is Neyman orthogonal. So depending on the effect to be estimated (ATE or ATTE) the score components have to be chosen accordingly as described in the user guide https://docs.doubleml.org/stable/guide/scores.html#interactive-regression-model-irm. The corresponding source code can be found here: https://github.com/DoubleML/doubleml-for-r/blob/a10e38d5e70eacf85742976c5e92ca03abcf41c3/R/double_ml_irm.R#L268-L301

gleb-roma commented 2 years ago

That's great, thank you! I didn't see that documentation, was looking only at the pdf.

Btw, is there a reason why you need p_hat in the score function for ATTE in IRM? It seems that it is just a scaling constant that doesn't affect anything, e.g. psi is orthogonal with and without p_hat.

gleb-roma commented 2 years ago

Okay, now the actual issue. The fitting works with the user-specified score function. However, if I tune, it fails with a strange error:

Error in self$score == "ATE" : 
  comparison (1) is possible only for atomic and list types

The same code works if instead of my score function, I specify "ATTE". Also, if I don't tune but only do fit(), it also works just fine, and moreover, estimates the parameter correctly on simple synthetic data. If I tune with a specified score function, it fails.

Script:

library(DoubleML)
library(mlr3)
library(paradox)
library(mlr3tuning)
library(tidyverse)

load(url("https://github.com/gleb-roma/FE/raw/main/mult.impact.df.het.RData"))

# set logger to omit messages during tuning and fitting
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn") 

dml.data.obj <-  double_ml_data_from_data_frame(df = mult.impact.df.het %>% as.data.frame(),
                                                y_col = "y",
                                                d_cols = "Tr",
                                                x_cols = c("x1","x2")
)

# Initialize learners, grid parameters for tuning and the fit measures
par_grids <- list()
tune_msr <- list()

learner_g <- lrn("regr.ranger", num.threads = 60
                 , num.trees = 500
                 , min.node.size = 10
                 # , max.depth = 4
)
par_grids$ml_g <- paradox::ps( 
  max.depth=paradox::p_int(1,10)
)
tune_msr$ml_g <- msr("regr.rsq")

learner_m = lrn("classif.ranger", num.threads = 60
                , num.trees = 500,
                min.node.size = 2 
)
par_grids$ml_m <- paradox::ps(max.depth=paradox::p_int(1,10)) 
tune_msr$ml_m <- msr("classif.acc") # ce https://mlr3.mlr-org.com/reference/mlr_measures_classif.ce.html 

score_atte_mult <- function(y, d, g0_hat, g1_hat, m_hat, smpls) { # this is an orthogonal score function to estimate a proportional treatment effect
  u0_hat = y - g0_hat
  term.x <- m_hat * (1 - d) * u0_hat / (1 - m_hat)

  psi_b = d * u0_hat  -  term.x
  psi_a = -g0_hat*d - term.x  

  psis = list(psi_a = psi_a, psi_b = psi_b) 
  return(psis)
}

set.seed(31415)
dml.mod <- DoubleMLIRM$new(dml.data.obj,
                           ml_m = learner_m,
                           ml_g = learner_g,
                           score = score_atte_mult, # WORKS WITH "ATTE", FAILS WITH MY FUNCTION
                           n_folds = 5,
                           n_rep = 1) 

# Provide tune settings
tune_settings = list(terminator = trm("evals", n_evals = 100),
                     algorithm = tnr("grid_search", resolution = 11),
                     rsmp_tune = rsmp("cv", folds = 5),
                     measure = tune_msr
)
cat("Tuning...")
dml.mod$tune(param_set = par_grids, tune_settings = tune_settings); print("Done.") # FAILS HERE

# best hyper-parameters
# res$best.params <- dml.mod$params

# Estimation 
cat("Estimating the causal parameter...")
dml.mod$fit() # WORKS ALWAYS
dml.mod$summary()
MalteKurz commented 2 years ago

That's great, thank you! I didn't see that documentation, was looking only at the pdf.

Btw, is there a reason why you need p_hat in the score function for ATTE in IRM? It seems that it is just a scaling constant that doesn't affect anything, e.g. psi is orthogonal with and without p_hat.

The option score='ATTE' implements the estimation of the average treatment effects on the treated (ATTE) as stated by Chernozhukov et al. (2018) in equation (5.4). See Section 5.1 in the paper for regularity conditions, orthogonality, etc.

MalteKurz commented 2 years ago

Okay, now the actual issue. The fitting works with the user-specified score function. However, if I tune, it fails with a strange error:

Error in self$score == "ATE" : 
  comparison (1) is possible only for atomic and list types

The same code works if instead of my score function, I specify "ATTE". Also, if I don't tune but only do fit(), it also works just fine, and moreover, estimates the parameter correctly on simple synthetic data. If I tune with a specified score function, it fails.

Thanks for reporting this bug. There was a bug in an if-condition in the tuning code for the IRM model. I fixed it in #127. You can get the bug fix by installing the dev version from GitHub.

MalteKurz commented 2 years ago

That's great, thank you! I didn't see that documentation, was looking only at the pdf. Btw, is there a reason why you need p_hat in the score function for ATTE in IRM? It seems that it is just a scaling constant that doesn't affect anything, e.g. psi is orthogonal with and without p_hat.

The option score='ATTE' implements the estimation of the average treatment effects on the treated (ATTE) as stated by Chernozhukov et al. (2018) in equation (5.4). See Section 5.1 in the paper for regularity conditions, orthogonality, etc.

@gleb-roma I once more checked Section 5.1 in Chernozhukov et al. (2018). There it is pointed out that "Note also that because p is a constant, it does not affect the DML estimators theta_0 based on the score psi in (5.4), but having p simplifies the formula for the variance of theta_0.". So basically the parameter and its variances are not affected by the additional multiplicative constant 1/p but the formula for the variance is simplified a bit. For a detailed explanation see also the attached note atte_score.pdf.

gleb-roma commented 2 years ago

That's great, thank you! I didn't see that documentation, was looking only at the pdf. Btw, is there a reason why you need p_hat in the score function for ATTE in IRM? It seems that it is just a scaling constant that doesn't affect anything, e.g. psi is orthogonal with and without p_hat.

The option score='ATTE' implements the estimation of the average treatment effects on the treated (ATTE) as stated by Chernozhukov et al. (2018) in equation (5.4). See Section 5.1 in the paper for regularity conditions, orthogonality, etc.

@gleb-roma I once more checked Section 5.1 in Chernozhukov et al. (2018). There it is pointed out that "Note also that because p is a constant, it does not affect the DML estimators theta_0 based on the score psi in (5.4), but having p simplifies the formula for the variance of theta_0.". So basically the parameter and its variances are not affected by the additional multiplicative constant 1/p but the formula for the variance is simplified a bit. For a detailed explanation see also the attached note atte_score.pdf.

Great, thank you for confirming and for the most thorough answer!