NSAPH-Software / CausalGPS

Matching on generalized propensity scores with continuous exposures
https://NSAPH-Software.github.io/CausalGPS/

Errors in GPS matching when large number of observations or small amount of trimming #145

Closed m-qin closed 2 years ago

m-qin commented 2 years ago

I'm running generate_pseudo_pop() on a random subset of my full dataset and get the following error and warning when (a) I use a large subset of the data or (b) I trim less via the "trim_quantiles" parameter. I would appreciate any advice!

Error in xgboost::xgb.DMatrix(data = X, label = Y, weight = obsWeights) : [16:20:52] amalgamation/../src/data/data.cc:1163: Check failed: valid: Input data contains inf or nan

In addition: Warning message: In FUN(X[[i]], ...) : Error in algorithm m_xgboost_internal The Algorithm will be removed from the Super Learner (i.e. given weight 0)

My code is the following:

n_random_rows <- 75000
random_rows <- sample(1:nrow(data), n_random_rows)
Y_subset <- data[random_rows, outcome_var]
w_subset <- data[random_rows, exposure_var]
c_subset <- as.data.frame(subset(data[random_rows,], select = confounders_with_categorical_as_factors))

matched_pop_subset <- generate_pseudo_pop(Y_subset,
                                          w_subset,
                                          c_subset,
                                          ci_appr = "matching",
                                          pred_model = "sl",
                                          gps_model = "parametric",
                                          use_cov_transform = TRUE,
                                          transformers = list("pow2", "pow3", "sqrt", "log"),
                                          sl_lib = c("m_xgboost"),
                                          params = list(xgb_nrounds = c(10, 20, 30, 50)),
                                          nthread = 15, # number of cores
                                          covar_bl_method = "absolute",
                                          covar_bl_trs = 0.1,
                                          covar_bl_trs_type = "maximal",
                                          optimized_compile = TRUE,
                                          trim_quantiles = c(0.05, 0.95),
                                          max_attempt = 5,
                                          matching_fun = "matching_l1",
                                          delta_n = 0.1,
                                          scale = 1)

Originally posted by @m-qin in https://github.com/fasrc/CausalGPS/discussions/143

Naeemkh commented 2 years ago

See #143
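
For anyone hitting the same xgboost message ("Input data contains inf or nan"), a rough first step is to count non-finite values in the inputs before calling generate_pseudo_pop(). The sketch below uses only base R. The helper count_non_finite() and the toy data frame are hypothetical and not part of CausalGPS, and whether the package's "log" and "sqrt" transformers behave like base R's log() and sqrt() is an assumption, not something confirmed in this thread.

```r
# Hypothetical helper (not part of CausalGPS): count non-finite entries
# (NA, NaN, Inf, -Inf) per column; for non-numeric columns, count NA only.
count_non_finite <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col)) sum(!is.finite(col)) else sum(is.na(col))
  }, integer(1))
}

# Toy data standing in for c_subset; the column names are made up.
toy <- data.frame(
  temp   = c(1.5, 0, -2, 4),
  humid  = c(10, NaN, 30, Inf),
  region = factor(c("a", "b", NA, "a"))
)

# Non-finite counts in the raw inputs.
print(count_non_finite(toy))

# Base R log() and sqrt() return -Inf or NaN for zero or negative input,
# so a transformed copy can contain non-finite values even when the
# original column does not. Warnings are suppressed only to keep the
# output short.
transformed <- suppressWarnings(data.frame(
  log_temp  = log(toy$temp),
  sqrt_temp = sqrt(toy$temp)
))
print(count_non_finite(transformed))
```

Running the same helper on Y_subset, w_subset, and c_subset (wrapped in data.frame() where needed) may help show whether the non-finite values are already in the data or only appear after a later step.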