CausalInference / gfoRmula

The gfoRmula package implements the parametric g-formula in R. The parametric g-formula (Robins, 1986) uses longitudinal data with time-varying treatments and confounders to estimate the risk or mean of an outcome under hypothetical treatment strategies specified by the user.
149 stars 58 forks source link

Non-conformable arguments when using g-foRmula package #30

Closed BLMoran closed 6 months ago

BLMoran commented 6 months ago

Hi, I've crossposted this to stackoverflow as well (here).

I'm trying to use the gfoRmula package to determine the effect of a time-varying treatment (opioid) on a binary outcome (mortality). There are time-invariant confounders (age, sex) and a time-varying confounder (pain, which is a truncated normal distribution), and a clustering variable (site).

Here is a dataset:

library(tidyverse)
library(TruncatedNormal)

# Simulate Dataset
set.seed(1404)
# Covariance matrix for pain and opioids
mu <- c(91.6, 82.5, 68.2, 3.5, 2.2, 1.14)
pain_opioid_covmat <- c(2500, 1085, 550, 61, 55, 25,
                        1085, 3950, 1580, 42, 23, 10,
                        550, 1580, 4500, 30, 12, 20,
                        61, 42, 30, 3.5, 2.2, 0.1,
                        55, 23, 12, 2.2, 2.7, 0.7,
                        25, 10, 20, 0.1, 0.7, 0.8)
sigma <- matrix(data = pain_opioid_covmat, nrow = 6, ncol = 6, byrow = TRUE)

# Number of Patients
N <- 100
# Number of timepoints
k <- 3
# Variables
age <- rnorm(N, mean = 55, sd = 15)
sex <- as.factor(sample(c("M", "F"), size=N, replace=TRUE, prob = c(0.55, 0.45)))
site = factor(sample(c("1", "2", 3), size=N, replace=TRUE, prob = c(0.2, 0.3, 0.5)))
dat <- rtmvnorm(n=N, mu = mu, sigma = sigma, lb = c(0,0,0,0,0,0), ub = c(Inf, Inf, Inf, 10, 10, 10)) |> 
  as_tibble()

# Add ID column
dat <- dat |> dplyr::mutate(patient_id = row_number())

# Merge other variables
dat <- dat |> 
  cbind(site, age, sex)

# Rename pain and opioid variables
dat <- dat |> 
  dplyr::mutate(opioid_1 = V1, opioid_2 = V2, opioid_3 = V3,
                pain_1 = V4, pain_2 = V5, pain_3 = V6) |> 
  dplyr::select(-c(V1:V6)) |> 
  mutate_if(is.numeric, round, digits = 0)

# Create variable duration of exposure based on whether the patient gets CPIP
dat <- dat |> 
  dplyr::mutate(
    opioid_2 = if_else(opioid_1 <50, NA, opioid_2),
    opioid_3 = if_else(is.na(opioid_2), NA, opioid_3),
    opioid_3 = if_else(opioid_3 <40, NA, opioid_3),
    pain_2 = if_else(is.na(opioid_2), NA, pain_2),
    pain_3 = if_else(is.na(opioid_3), NA, pain_3))

# Create outcome (mortality) variable
dat <- dat |> 
  mutate(mortality = case_when(
    opioid_3 >60 ~ 1, 
    pain_3 >2 ~ 1,
    .default =  0)) |> 
  mutate(mortality = if_else(vent_day == 3, mortality, NA))

# Change data structure to adhere to gfoRmula
dat_long_gform <- dat |> 
  pivot_longer(
    cols = -c(patient_id, site, age, sex, mortality),
    names_to = c(".value", "vent_day"),
    names_pattern = "^([a-z]+)_(\\d+)",
    values_drop_na = FALSE) |> 
  mutate(vent_day = as.numeric(vent_day))

# Change parameters
dat_long_gform <- dat_long_gform |> 
  dplyr::mutate(
    id = patient_id,
    vent_day = case_when(
      vent_day == 1 ~ 0, # Baseline (Day 1)
      vent_day == 2 ~ 1, # Day 2
      vent_day == 3 ~ 2)) |> # Day 3
  dplyr::select(!(patient_id)) |> 
  relocate(id) |> 
  data.table::as.data.table()  

Here is my code using gfoRmula:

library(gformula)

freq_gcomp_mod <- gformula(
  obs_data = dat_long_gform,
  id = 'id',
  time_name = 'vent_day',
  outcome_name = "mortality",
  outcome_type = "binary_eof",
  basecovs = c("age", "sex", "site"),
  covnames = c("pain", "opioid"),
  covtypes = c("truncated normal", "truncated normal"),
  histories = c(lagged, lagavg),
  histvars = list(c("pain","opioid"), c("pain","opioid")),
  covparams = list(covmodels = c(
    pain ~ lag_cumavg1_pain + lag1_opioid + vent_day, #Covariate Model
    opioid ~ lag_cumavg1_opioid + lag1_pain +  vent_day)), #Treatment Model
  ymodel = mortality ~ opioid + pain + age + sex +  (1|site), #Outcome Model
  intvars = 'opioid',
  interventions = "dynamic",
  seed = 1404,
  parallel = TRUE, ncores = 4)

I get the following error message:

Warning: 'x' is NULL so the result will be NULLError in crossprod(bb * X, X) : non-conformable arguments

I'm not sure what is causing the error specifically. Any assistance would be appreciated.

Thanks,

Ben

stmcg commented 6 months ago

Hi Ben,

I see a few issues which could be causing the error message:

Best, Sean

BLMoran commented 6 months ago

Thanks Sean,

In reply to your points:

Thanks,

Ben

stmcg commented 6 months ago

Regarding the truncated normal: Glad to hear the issue went away. And yes, that's right; it only supports truncation in one direction (I believe ultimately based on conventions in the truncreg package). And yes, the 'left' direction is what you are looking for

Regarding the intervention: It looks like you're missing a nested layer in the list. Try list(list(c(threshold, 0, Inf)))

Regarding the hierarchical term: The package only supports GLMs for the outcome model. Heads up though that the CIs around the treatment effects are based on bootstrapping

stmcg commented 6 months ago

I'm closing this thread assuming that your questions have been resolved, but of course feel free to reach out if you have any other questions