Closed gleb-roma closed 2 years ago
Actually your proposal works more or less out of the box.
Currently there is no interface implemented to pass additional parameters through. However, in R you can always access global variables from within a function. Of course, this should be used with care, but I think in the described situation it should be fine to define the weights in the outer scope / globally and then access them within your score function. The only thing I would recommend is to use a more specific variable name than W
to prevent it from being overwritten without you noticing. Here is some general information about R environments, scopes, and local vs. global variables: https://www.datamentor.io/r-programming/environment-scope/.
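As a minimal sketch of this scoping behavior (the names here are purely illustrative): a function body can refer to a variable that is not one of its arguments, and R will look it up in the enclosing environment, ultimately the global environment, at call time.

```r
my_weights = c(0.2, 0.8)  # defined in the global environment

apply_weights = function(x) {
  # 'my_weights' is not an argument; R resolves it in the
  # enclosing (here: global) environment when the function runs
  x * my_weights
}

apply_weights(c(10, 10))  # returns c(2, 8)
```

This is the same mechanism the score function below relies on to see weights_for_score.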
In the sample code below, you can see how to define a score function with weighting (apart from the variable naming, it is the same as yours). You can then pass it as score
to your DoubleMLPLR
object. To verify that everything works as expected, I redo the analysis without weighting and show that you get the same result when you apply the weighting ex post. Still, I believe it is more convenient to work with a score function, as it ensures that the scores etc. are always equipped with the weighting. Note that I just used some random weights in the sample code since I did not have a reasonable example at hand and just wanted to demonstrate the functionality.
library(DoubleML)
library(mlr3)
n_obs = 500
# define the weights as global variable (can be accessed from within the function)
weights_for_score = rep(1, n_obs) # equal weights, no effect
weights_for_score = runif(n_obs) # random weights
weights_for_score = weights_for_score/sum(weights_for_score)
weighted_score = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d * weights_for_score
  psi_b = resid_d * resid_y * weights_for_score
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}
ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_g$clone()
obj_dml_data = make_plr_CCDDHNR2018(n_obs = n_obs, alpha = 0.5)
set.seed(3141)
dml_plr_weighted = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
                                   score = weighted_score)
dml_plr_weighted$fit(store_predictions = TRUE)
dml_plr_weighted$summary()
## check that it works as expected
# estimate an object without weighting
set.seed(3141)
dml_plr = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
                          score = "partialling out")
dml_plr$fit(store_predictions = TRUE)
# extract the (not weighted) scores
psi_a = dml_plr$psi_a
psi_b = dml_plr$psi_b
# weight the scores and re-estimate the parameter
psi_a_weighted = psi_a * weights_for_score
psi_b_weighted = psi_b * weights_for_score
theta_hat_w_weights = - mean(psi_b_weighted) / mean(psi_a_weighted)
# we get exactly the same result as with the `score = weighted_score`
print(theta_hat_w_weights == dml_plr_weighted$coef)
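As an aside, if you prefer to avoid the global variable entirely, one alternative (a sketch, not part of the code above) is a function factory: a function that takes the weights and returns a score function with the weights captured in its closure. The name make_weighted_score is hypothetical.

```r
# Hypothetical function factory: bakes the weights into the returned
# score function via a closure, so no global variable is needed
make_weighted_score = function(w) {
  function(y, d, g_hat, m_hat, smpls) {
    resid_y = y - g_hat
    resid_d = d - m_hat
    # same score as above, but 'w' comes from the enclosing closure
    list(psi_a = -1 * resid_d * resid_d * w,
         psi_b = resid_d * resid_y * w)
  }
}

# usage (assuming the objects from the example above):
# weighted_score = make_weighted_score(weights_for_score)
# dml_plr_weighted = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
#                                    score = weighted_score)
```

This keeps the weights attached to the score function itself, so renaming or overwriting a global no longer affects the estimation.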
This is great, thank you!
The pdf documentation (link) suggests that a user can specify the "score" parameter (at least for PLM estimator) in the call of DoubleMLPLR$new(), "for example, to adjust the DML estimators in terms of a re-weighting".
This is exactly my situation. How can I pass the weights?
The template for the score function in the example doesn't have a weight parameter... I want to do something like this (the only change I made to the code is adding Ws):