DoubleML / doubleml-for-r

DoubleML - Double Machine Learning in R
https://docs.doubleml.org

Missing exception handling for infinite / missing predictions #136

Open MalteKurz opened 2 years ago

MalteKurz commented 2 years ago

There is no exception handling in place for the case where a learner produces infinite or missing predictions. The estimates silently become NA, with no warning or exception.

See for example:

library(DoubleML)

g = function(x) {
  res = sin(x)^2
  return(res)
}

m = function(x, nu = 0, gamma = 1) {
  xx = sinh(gamma) / (cosh(gamma) - cos(x - nu))
  res = 0.5 / pi * xx
  return(res)
}

dgp1_irmiv = function(theta, N, k) {

  b = 1 / (1:k)
  sigma = clusterGeneration::genPositiveDefMat(k, "unifcorrmat")$Sigma

  X = mvtnorm::rmvnorm(N, sigma = sigma)
  G = g(as.vector(X %*% b))
  M = m(as.vector(X %*% b))

  pr_z = 1 / (1 + exp(-(1) * X[, 1] * b[5] + X[, 2] * b[2] + rnorm(N)))
  z = rbinom(N, 1, pr_z)

  U = rnorm(N)
  pr = 1 / (1 + exp(-(1) * (0.5 * z + X[, 1] * (-0.5) + X[, 2] * 0.25 - 0.5 * U + rnorm(N))))
  d = rbinom(N, 1, pr)
  err = rnorm(N)

  y = theta * d + G + 4 * U + err

  data = data.frame(y, d, z, X)

  return(data)
}

set.seed(1282)
df = dgp1_irmiv(0.5, 1000, 20)
Xnames = names(df)[names(df) %in% c("y", "d", "z") == FALSE]
dml_data = double_ml_data_from_data_frame(df,
                                          y_col = "y",
                                          d_cols = "d", x_cols = Xnames, z_col = "z")

ml_g = mlr3::lrn("regr.rpart", cp = 0.01, minsplit = 20)
ml_m = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)
ml_r = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)

set.seed(3141)
double_mliivm_obj = DoubleMLIIVM$new(
  data = dml_data,
  n_folds = 5,
  ml_g = ml_g,
  ml_m = ml_m,
  ml_r = ml_r,
  dml_procedure = "dml2",
  trimming_threshold = 0,
  score = "LATE")
double_mliivm_obj$fit()
print(double_mliivm_obj$coef)
print(double_mliivm_obj$se)

It gets even more confusing if one then calls the method bootstrap(), which raises the following exception:

double_mliivm_obj$bootstrap()
Error in double_mliivm_obj$bootstrap(): Apply fit() before bootstrap().

This message does not point to the root cause, and following its advice to apply fit() will not fix the issue.

I propose to implement a check for finite predictions similar to the check in the Python package: https://github.com/DoubleML/doubleml-for-py/blob/b3cbdb572fce435c18ec67ca323645900fc901b5/doubleml/_utils.py#L204-L208
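
A minimal sketch of what such a check could look like in R, in the spirit of the linked Python check. The function name `check_finite_predictions` and its arguments are hypothetical and not part of the DoubleML R API; where exactly it would be called inside the package is left open.

```r
# Hypothetical helper (assumption, not part of the DoubleML R API):
# stop with an informative error if any prediction is NA, NaN, Inf or -Inf.
check_finite_predictions = function(preds, learner_name, param_name) {
  if (!all(is.finite(preds))) {
    stop(sprintf(
      "Predictions from learner %s for %s are not finite.",
      learner_name, param_name), call. = FALSE)
  }
  invisible(preds)
}

# Finite predictions pass through unchanged.
check_finite_predictions(c(0.2, 0.7), "regr.rpart", "ml_g")

# Missing or infinite predictions raise an error; capture its message here.
finite_check_msg = tryCatch(
  check_finite_predictions(c(0.2, NA, Inf), "regr.rpart", "ml_g"),
  error = function(e) conditionMessage(e))
print(finite_check_msg)
```

Raising an error directly after the nuisance predictions are computed would surface the problem at its source instead of letting NA estimates reach coef, se and bootstrap().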

MalteKurz commented 2 years ago

The actual root cause in the example above is not a non-finite prediction but a propensity score estimate of 1.
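
To illustrate why a propensity score estimate of exactly 1 can end in NA estimates, here is a small sketch. It assumes the score contains an inverse-propensity term of the form (1 - z) * (y - g0) / (1 - m); the function `ipw_term` and the toy numbers are made up for illustration and are not the package's score implementation. Note that the example above sets trimming_threshold = 0, so the propensity estimates are presumably not clipped away from 0 and 1.

```r
# Simplified inverse-propensity term (illustrative stand-in only,
# not the package's actual score implementation).
ipw_term = function(y, z, g0_hat, m_hat) {
  (1 - z) * (y - g0_hat) / (1 - m_hat)
}

# Toy data (made up for illustration).
y = c(1.0, 2.0, 3.0)
z = c(0, 1, 1)
g0_hat = c(0.5, 1.5, 2.5)

# Propensity estimates strictly inside (0, 1): every term is finite.
ipw_ok = ipw_term(y, z, g0_hat, m_hat = c(0.3, 0.6, 0.9))

# One estimate equal to 1: the third term becomes 0 / 0, which is NaN.
ipw_bad = ipw_term(y, z, g0_hat, m_hat = c(0.3, 0.6, 1.0))

print(ipw_ok)
print(ipw_bad)
# A single NaN propagates into any average taken over the observations.
print(mean(ipw_bad))
```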

MalteKurz commented 2 years ago

Estimated probabilities / propensity scores may need special attention, i.e., a check that they are (strictly) in the interval (0,1). See also: https://github.com/DoubleML/doubleml-for-py/issues/129
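
A possible shape for such a check, as a sketch only. The name `check_propensity_scores` is hypothetical and not part of the DoubleML R API, and whether the package should raise an error or only a warning here is a design decision this sketch does not settle.

```r
# Hypothetical helper (assumption, not part of the DoubleML R API):
# require estimated probabilities to be finite and strictly inside (0, 1).
check_propensity_scores = function(m_hat, learner_name) {
  if (!all(is.finite(m_hat))) {
    stop(sprintf(
      "Propensity predictions from learner %s are not finite.",
      learner_name), call. = FALSE)
  }
  if (any(m_hat <= 0 | m_hat >= 1)) {
    stop(sprintf(
      "Propensity predictions from learner %s are not strictly in (0, 1).",
      learner_name), call. = FALSE)
  }
  invisible(m_hat)
}

# Estimates strictly inside the interval pass through unchanged.
check_propensity_scores(c(0.1, 0.5, 0.9), "classif.rpart")

# An estimate of exactly 1 is rejected; capture the error message here.
propensity_check_msg = tryCatch(
  check_propensity_scores(c(0.1, 0.5, 1.0), "classif.rpart"),
  error = function(e) conditionMessage(e))
print(propensity_check_msg)
```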