IQSS / Zelig

A statistical framework that serves as a common interface to a large range of models
http://zeligproject.org

Predicted values in the Zelig relogit function? #344

Open aytugsasmaz opened 3 years ago

aytugsasmaz commented 3 years ago

I combine data from a household survey (non-candidates) with data from a candidate survey, and use the relogit model in the Zelig package to explore the determinants of becoming a candidate. However, I am having difficulty interpreting the predicted values I extract from the Zelig object.

The predicted values that I extract from the conventional logistic regression range from ~0 to ~0.999, as one would expect for probabilities, while those that come from the relogit Zelig object range from -15 to 3747.

dih_lecs_wideintersect.xlsx

Below is a very short R script (24 lines). I have also attached my dataset so you can reproduce the problem.

library(openxlsx)
library(stargazer)
library(Zelig)

dih_lecs_wideintersect <- read.xlsx("/Users/apple/Desktop/dih_lecs_wideintersect.xlsx")

# Conventional logistic regression
fit.3 <- glm(candidate ~ secularist2 * edu + fem + pro_rights + age_cohort +
               income_group + factor(mun_en),
             data = dih_lecs_wideintersect,
             family = binomial(link = "logit"))
stargazer(fit.3, type="text")
used.data <- as.data.frame(fit.3$model)
used.data$predicted <- predict(fit.3, type = "response")
max(used.data$predicted)

# Rare events logistic regression with Zelig
z.out1 <- zelig(candidate ~ secularist2 * edu + fem + pro_rights + age_cohort +
                  income_group + factor(mun_en),
                data = dih_lecs_wideintersect,
                model = "relogit",
                tau = 0.01,
                case.control = "prior",
                bias.correct = TRUE)
summary(z.out1)
pred.relogit <- t(as.data.frame(predict(z.out1)))
sum(pred.relogit > 3000)
max(pred.relogit)
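
For comparison, here is a minimal sketch of what prior-corrected predictions look like when computed by hand in base R. It runs on simulated data rather than the attached dataset, and the names `x`, `y`, `offset_term`, and `p_corrected` are made up for illustration. It assumes the standard prior correction from King and Zeng (2001), which shifts only the intercept on the logit scale, and it omits the bias correction. It is not meant to reproduce what Zelig does internally.

```r
# Hand-computed prior correction on simulated data (not the attached dataset).
set.seed(1)

n <- 5000
x <- rnorm(n)
# Simulated sample in which events are far more common than tau would imply
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))

tau  <- 0.01      # assumed population fraction of events, as in the zelig() call
ybar <- mean(y)   # fraction of events observed in the sample

# Ordinary logistic regression
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Prior correction: subtract log[((1 - tau) / tau) * (ybar / (1 - ybar))]
# from the intercept, which is the same as shifting every linear predictor
offset_term     <- log(((1 - tau) / tau) * (ybar / (1 - ybar)))
eta_uncorrected <- predict(fit, type = "link")
eta_corrected   <- eta_uncorrected - offset_term

# Back-transform from the logit scale to the probability scale
p_corrected <- plogis(eta_corrected)
range(p_corrected)
```

Under these assumptions the corrected predictions stay strictly between 0 and 1, which is the range I expected from the relogit object as well.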

If you have any suggestions on how to solve this problem, or can point out what I am doing wrong, I would very much appreciate it.