DoubleML / doubleml-for-r

DoubleML - Double Machine Learning in R
https://docs.doubleml.org

Score function with weighting #124

Closed gleb-roma closed 2 years ago

gleb-roma commented 2 years ago

The PDF documentation (link) suggests that a user can specify the "score" parameter (at least for the PLR estimator) in the call to DoubleMLPLR$new(), "for example, to adjust the DML estimators in terms of a re-weighting".

This is exactly my situation. How can I pass the weights?

The template for the score function in the example doesn't have a weight parameter... I want to do something like this (the only change I made to the code is adding the Ws):

# Here:
# y: dependent variable
# d: treatment variable
# g_hat: predicted values from regression of Y on X's
# m_hat: predicted values from regression of D on X's
# smpls: sample split under consideration, can be ignored in this example

score_manual = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d * W   # HERE
  psi_b = resid_d * resid_y * W      # and HERE
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}
MalteKurz commented 2 years ago

Actually your proposal works more or less out of the box.

Currently there is no interface implemented to pass through additional parameters. However, in R you can always access global variables from within a function. Of course this should be used with care, but in the situation you describe it should be fine to define the weights in the outer scope / globally and then access them within your score function. The only thing I would recommend is to use a more specific variable name than W, to prevent it from being overwritten without you noticing. Here is some general information about R environments, scopes and local vs. global variables: https://www.datamentor.io/r-programming/environment-scope/.
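If you prefer to avoid a global variable entirely, a factory function that captures the weights in a closure achieves the same thing. This is a minimal sketch under the same score signature as above; make_weighted_score is a hypothetical helper name, not part of the DoubleML API:

```r
# Factory: binds the weight vector w at creation time via lexical scoping,
# so the returned score function carries its own copy of the weights.
make_weighted_score = function(w) {
  function(y, d, g_hat, m_hat, smpls) {
    resid_y = y - g_hat
    resid_d = d - m_hat
    list(psi_a = -1 * resid_d * resid_d * w,  # weighted psi_a
         psi_b = resid_d * resid_y * w)       # weighted psi_b
  }
}

# usage: weighted_score = make_weighted_score(my_weights)
```

This way the weights cannot be accidentally clobbered by an unrelated assignment in the global environment.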

In the sample code below, you can see how to define a score function with weighting (apart from the variable naming, it is the same as yours). You can then pass it as score to your DoubleMLPLR object. To verify that everything works as expected, I redo the analysis without weighting and show that you get the same result when you apply the weighting ex post. Still, I believe it is more convenient to work with a score function, as it makes sure that the scores etc. are always equipped with the weighting. Note that I just used some random weights in the sample code, since I did not have a reasonable example at hand and just wanted to demonstrate the functionality.

library(DoubleML)
library(mlr3)

n_obs = 500

# define the weights as global variable (can be accessed from within the function)
# weights_for_score = rep(1, n_obs) # equal weights would reproduce the unweighted fit
weights_for_score = runif(n_obs) # random weights for demonstration
weights_for_score = weights_for_score / sum(weights_for_score)

weighted_score = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d * weights_for_score
  psi_b = resid_d * resid_y * weights_for_score
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}

ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_g$clone()
obj_dml_data = make_plr_CCDDHNR2018(n_obs = n_obs, alpha = 0.5)

set.seed(3141)
dml_plr_weighted = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
                                   score = weighted_score)
dml_plr_weighted$fit(store_predictions = TRUE)
dml_plr_weighted$summary()

## check that it works as expected
# estimate a model without weighting
set.seed(3141)
dml_plr = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
                          score = "partialling out")
dml_plr$fit(store_predictions = TRUE)

# extract the (not weighted) scores
psi_a = dml_plr$psi_a
psi_b = dml_plr$psi_b

# weight the scores and re-estimate the parameter
psi_a_weighted = psi_a * weights_for_score
psi_b_weighted = psi_b * weights_for_score
theta_hat_w_weights = - mean(psi_b_weighted) / mean(psi_a_weighted)

# we get exactly the same result as with the `score = weighted_score`
print(theta_hat_w_weights == dml_plr_weighted$coef)
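The ex-post check works because the PLR score is linear in the parameter, psi(theta) = psi_a * theta + psi_b, and the estimator solves the (weighted) empirical moment condition mean(psi) = 0, which gives theta = -mean(psi_b) / mean(psi_a). A small base-R sketch with synthetic numbers (stand-ins for dml_plr$psi_a and dml_plr$psi_b, no DoubleML objects needed) illustrates this:

```r
set.seed(1)
psi_a <- -runif(100) - 0.5   # synthetic: psi_a is negative by construction
psi_b <- rnorm(100)          # synthetic psi_b values
w <- runif(100)
w <- w / sum(w)

# solving mean(w * (psi_a * theta + psi_b)) = 0 for theta:
theta <- -mean(w * psi_b) / mean(w * psi_a)

# the weighted moment condition holds at theta (up to floating-point error)
mean(w * (psi_a * theta + psi_b))
```

Because weighting the stored scores and re-solving this moment condition is exactly what the weighted score function does internally, both routes yield the same coefficient.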
gleb-roma commented 2 years ago

This is great, thank you!