DoubleML - Double Machine Learning in R

Missing exception handling for infinite / missing predictions #136

Open MalteKurz opened 2 years ago

MalteKurz commented 2 years ago

There is no exception handling in place for the case where a learner produces infinite or missing predictions. The estimates silently become NA, without any warning or exception.

See for example:


library(DoubleML)

g = function(x) {
  res = sin(x)^2
  return(res)
}

m = function(x, nu = 0, gamma = 1) {
  xx = sinh(gamma) / (cosh(gamma) - cos(x - nu))
  res = 0.5 / pi * xx
  return(res)
}

dgp1_irmiv = function(theta, N, k) {

  b = 1 / (1:k)
  sigma = clusterGeneration::genPositiveDefMat(k, "unifcorrmat")$Sigma

  X = mvtnorm::rmvnorm(N, sigma = sigma)
  G = g(as.vector(X %*% b))
  M = m(as.vector(X %*% b))

  pr_z = 1 / (1 + exp(-(1) * X[, 1] * b[5] + X[, 2] * b[2] + rnorm(N)))
  z = rbinom(N, 1, pr_z)

  U = rnorm(N)
  pr = 1 / (1 + exp(-(1) * (0.5 * z + X[, 1] * (-0.5) + X[, 2] * 0.25 - 0.5 * U + rnorm(N))))
  d = rbinom(N, 1, pr)
  err = rnorm(N)

  y = theta * d + G + 4 * U + err

  data = data.frame(y, d, z, X)

  return(data)
}


df = dgp1_irmiv(0.5, 1000, 20)
Xnames = names(df)[names(df) %in% c("y", "d", "z") == FALSE]
dml_data = double_ml_data_from_data_frame(df,
                                          y_col = "y",
                                          d_cols = "d", x_cols = Xnames, z_col = "z")

ml_g = mlr3::lrn("regr.rpart", cp = 0.01, minsplit = 20)
ml_m = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)
ml_r = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)

double_mliivm_obj = DoubleMLIIVM$new(
  data = dml_data,
  n_folds = 5,
  ml_g = ml_g,
  ml_m = ml_m,
  ml_r = ml_r,
  dml_procedure = "dml2",
  trimming_threshold = 0,
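  score = "LATE")

# Fit the model; the coefficient estimate comes back as NA without any warning
double_mliivm_obj$fit()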

It gets even more confusing if one subsequently calls the method bootstrap(). This raises the exception

Error in double_mliivm_obj$bootstrap(): Apply fit() before bootstrap().

which is obviously not the root cause; the advice to apply fit() will not fix the issue either.

I propose to implement a check for finite predictions similar to the check in the Python package:
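The Python check itself is not reproduced here. As a rough illustration of what a comparable check could look like on the R side, here is a minimal sketch. The helper name `check_finite_predictions` and its arguments are hypothetical and not part of DoubleML; it only assumes that the predictions are available as a numeric vector.

```r
# Hypothetical helper (not part of DoubleML): stop early if a learner
# returns NA, NaN, Inf or -Inf predictions.
check_finite_predictions = function(preds, learner_name) {
  if (!all(is.finite(preds))) {
    n_bad = sum(!is.finite(preds))
    stop(sprintf(
      "Predictions from learner '%s' are not finite (%d of %d values).",
      learner_name, n_bad, length(preds)))
  }
  invisible(preds)
}

# Finite predictions pass silently
check_finite_predictions(c(0.1, 0.7, 0.3), "ml_g")

# Missing or infinite predictions raise an informative error
res = tryCatch(
  check_finite_predictions(c(0.1, NA, Inf), "ml_m"),
  error = function(e) conditionMessage(e))
print(res)
```

Raising the error right after the nuisance predictions are computed would point at the offending learner instead of surfacing later as an NA estimate.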

MalteKurz commented 2 years ago

The actual root cause in the example above is not a non-finite prediction but a propensity score estimate of 1.
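To illustrate why an estimate of exactly 1 is enough to produce missing values, here is a minimal sketch with made-up numbers. It assumes the score contains inverse-probability terms of the form `z * u / m_hat` and `(1 - z) * u / (1 - m_hat)`, as is usual for LATE-type scores; the variable names are illustrative and not taken from the package internals.

```r
# Made-up values: z is the instrument, m_hat the estimated propensity
# score P(Z = 1 | X), u an outcome residual.
z     = c(1, 0, 1)
m_hat = c(0.6, 0.4, 1.0)  # the last estimate is exactly 1
u     = c(0.3, -0.2, 0.5)

# For the third observation, (1 - z) * u / (1 - m_hat) evaluates 0 / 0,
# which is NaN in R.
term = z * u / m_hat - (1 - z) * u / (1 - m_hat)
print(term)

# A single NaN propagates through the average that forms the estimate.
print(mean(term))
```

With `z = 0` and `m_hat = 1` the same term would instead be infinite, so in both cases the resulting estimate is no longer finite.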

Estimated probabilities / propensity scores may need special attention, i.e., a check that they are (strictly) in the interval (0,1). See also:
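A possible shape for such a check, as a minimal sketch: the helper name `check_propensity_scores` and the tolerance `eps` are assumptions made for illustration and are not part of DoubleML. Whether a warning or an exception is more appropriate is left open here; the sketch warns.

```r
# Hypothetical helper (not part of DoubleML): flag estimated propensity
# scores that are not strictly inside (0, 1), up to a small tolerance eps.
check_propensity_scores = function(m_hat, learner_name, eps = 1e-12) {
  if (!all(is.finite(m_hat))) {
    stop(sprintf(
      "Propensity predictions from learner '%s' are not finite.",
      learner_name))
  }
  outside = (m_hat < eps) | (m_hat > 1 - eps)
  if (any(outside)) {
    warning(sprintf(
      "Propensity predictions from learner '%s' are close to 0 or 1 for %d of %d observations.",
      learner_name, sum(outside), length(m_hat)))
  }
  invisible(outside)
}

m_hat = c(0.2, 1, 0.7, 0)

# Capture the warning message instead of printing it as a warning
msg = tryCatch(
  check_propensity_scores(m_hat, "ml_m"),
  warning = function(w) conditionMessage(w))
print(msg)

# Indices of the offending observations
flagged = suppressWarnings(check_propensity_scores(m_hat, "ml_m"))
print(which(flagged))
```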