IQSS / cem


Factor Treatments in Logits #18

Open lotitomaria opened 2 years ago

lotitomaria commented 2 years ago

I'm writing because my co-authors and I are using the cem package in R, attempting to fit a factor variable treatment with 5 categories to a dichotomous dependent variable. (Thank you, by the way, for this amazing resource!) The cem() function runs fine, but the att() function returns an error message that we wanted to ask you about. The message reads:

Error: variable 'LatentClass' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning messages:
1: In eval(family$initialize) : non-integer #successes in a binomial glm!
2: In model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : variable 'LatentClass' is not a factor

We've confirmed many times with str() that our treatment is indeed a factor. Then we found this post on GitHub claiming that this sort of model specification can't work (https://github.com/IQSS/cem/issues/2):

"When you create a cem object using a factor variable as a treatment, attempting to use att to run a logistic regression on that object fails. It looks like this is happening because of line 372 (using the GitHub formatting) of the att command, tmp.data[, obj$treatment] <- 0. Assigning 0 to the factor variable changes the variable to a numeric, and then the prd <- predict(out, tmp.data, type = "response") command on the following line fails because the treatment variable is the wrong type. To fix this, you might want to change the assignment on line 258 to assign the reference level of the factor variable if the treatment variable is a factor. Alternatively, you could just coerce everything to numeric, or throw a warning if you try to run att with a factor treatment."

Is it true that cem cannot handle factor variable treatments in a logistic model? If so, do you have a recommended course of action?
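The failure described in the quoted issue can be reproduced with base R alone. The sketch below is not the cem source; it only assumes, as the quote states, that att() overwrites the treatment column with 0 before calling predict(). The data frame, the column names (y, treat), and the reference-level workaround are illustrative assumptions.

```r
# Toy data: binary outcome and a 3-level factor treatment (hypothetical names).
set.seed(1)
d <- data.frame(
  y     = rbinom(200, 1, 0.5),
  treat = factor(sample(c("1", "2", "3"), 200, replace = TRUE))
)
fit <- glm(y ~ treat, data = d, family = binomial)

# Assumed counterfactual step: overwrite the whole treatment column with 0.
# Replacing the full column turns the factor into a numeric vector.
tmp_bad <- d
tmp_bad[, "treat"] <- 0
print(class(tmp_bad$treat))

# predict() then rejects the new data because the column type changed.
res_bad <- tryCatch(
  suppressWarnings(predict(fit, tmp_bad, type = "response")),
  error = function(e) conditionMessage(e)
)
print(res_bad)

# Possible alternative: assign the reference level while keeping the
# factor type and the original set of levels.
tmp_ok <- d
ref <- levels(d$treat)[1]
tmp_ok$treat <- factor(rep(ref, nrow(d)), levels = levels(d$treat))
p_ok <- predict(fit, tmp_ok, type = "response")
```

Keeping the original levels on the replacement column is what lets predict() build the same design matrix it was fitted with. Whether this is the right counterfactual for a 5-level treatment is a separate modelling question.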

Stata also seems to struggle with factor variable treatments with more than two categories. The cem command does not generate the weight variable (cem_weights). We've confirmed that when transforming the treatment variable to binary, we get the appropriate cem_weights and can run the analysis. Below is some pasted code in R and Stata:

In R:

str(matching.df)

# Coarsen fatalities
fat.grp <- list(c("0","1"), c("2", "3"), c("4"), c("5","6"))

# Coarsen Polyarchy
hist(matching.df$s_polyarchy)
polycut <- c(0 , .2, .45, .8, 1)

matching.df$LatentClass = as.numeric(as.character(matching.df$LatentClass))

str(matching.df)
summary(matching.df)
mat <- cem(treatment = "LatentClass", data = matching.df,
           grouping = list(fatalities_range = fat.grp),
           cutpoints = list(s_polyarchy = polycut),
           eval.imbalance = TRUE, drop = "Enable", baseline.group = "3")
mat
results <- att(mat, Enable ~ LatentClass, data = matching.df, model = "logistic")

In Stata:

* Stata can't run this command with the multilevel treatment:
imbalance indiscrim fatalities_range camp_size ab_internat s_polyarchy, treatment(LatentClass)

* Stata doesn't generate weights with the multilevel treatment:
recode fatalities_range (0 1 = 1) (2 3 = 2) (4 = 3) (5 6 = 4), generate(fatalities)
cem indiscrim fatalities (#0) camp_size ab_internat s_polyarchy (0 , .2, .45, .8, 1), treatment(LatentClass)

jfhelmer commented 1 year ago

I have the same issue trying to use att with a factor treatment variable, a factor response variable, and a logit model. The error flags the treatment variable as not a factor. Here is the error text (P26A is the treatment variable, which is a factor with two levels):

Error: variable 'P26A' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning messages:
1: In eval(family$initialize) : non-integer #successes in a binomial glm!
2: In model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : variable 'P26A' is not a factor

dithless commented 2 weeks ago

Also having this issue. I get the error "Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor factor(type) has new level 0", where the real factor is supposed to have levels 1 and 2.
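A "new level 0" message like this one would be consistent with the same mechanism when the formula wraps a numeric treatment in factor(). The base R sketch below is not taken from cem; it assumes the treatment column is overwritten with 0 before predict(), and the names toy, type, and type01 are hypothetical. It also shows one possible workaround, recoding a two-level treatment to numeric 0/1, in line with the earlier report that a binary treatment runs.

```r
# Toy data: binary outcome and a treatment coded 1/2 (hypothetical names).
set.seed(42)
toy <- data.frame(
  y    = rbinom(100, 1, 0.5),
  type = sample(1:2, 100, replace = TRUE)
)

# The model sees factor(type) with levels "1" and "2".
fit <- glm(y ~ factor(type), data = toy, family = binomial)

# Assumed counterfactual step: treatment set to 0, a value never seen in fitting.
cf <- toy
cf[, "type"] <- 0
msg <- tryCatch(
  predict(fit, cf, type = "response"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Possible workaround: recode to a numeric 0/1 treatment so that 0 is valid.
toy$type01 <- as.integer(toy$type == 2)
fit01 <- glm(y ~ type01, data = toy, family = binomial)
cf01 <- toy
cf01[, "type01"] <- 0
p0 <- predict(fit01, cf01, type = "response")
```

With the 0/1 coding, setting the column to 0 is a legitimate "untreated" value, so predict() has nothing to reject.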