Open izahn opened 7 years ago
We wrap 'AER' ( https://stats.stackexchange.com/questions/149091/censored-regression-in-r). We should take a closer look, but it might boil down to predicting Y vs Y*.
On Thu, Nov 9, 2017 at 8:59 PM, Ista Zahn notifications@github.com wrote:
Consider these two models (from the tobit vignette)
m1 <- survreg(Surv(durable, durable>0, type='left') ~ age + quant, data=tobin, dist='gaussian')
z.out <- zelig(durable ~ age + quant, model = "tobit", data = tobin)
coefs and vcov are the same:
all.equal(vcov(z.out)[[1]], vcov(m1))
TRUE
all.equal(coef(z.out), coef(m1))
TRUE
but the expected values as very different
predict(m1, newdata = data.frame(lapply(model.frame(m1)[-1], mean)), se.fit = TRUE)
$fit
1
-2.067679
$se.fit
1
1.962933
ev <- sim(setx(z.out))$get_qi(qi = "ev") c(mean = mean(ev), sd = sd(ev))
mean sd
1.5536944 0.6302974
Do we expect these to be so different? If so, why?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/IQSS/Zelig/issues/314, or mute the thread https://github.com/notifications/unsubscribe-auth/AD-wZS4P2Hguu997yScPFP3RsTzaoAuGks5s064HgaJpZM4QY9mo .
Before digging into this further, is this issue adequately addressed by https://github.com/IQSS/Zelig/issues/315?
No, expected values for tobit models are not as documented. Either the documentation or the implimentation is wrong. I suspect the implimentation.
On Nov 11, 2017 5:05 AM, "Christopher Gandrud" notifications@github.com wrote:
Before digging into this further, is this issue adequately addressed by #315 https://github.com/IQSS/Zelig/issues/315?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/IQSS/Zelig/issues/314#issuecomment-343654016, or mute the thread https://github.com/notifications/unsubscribe-auth/AATm0zy2X2d595XJKx1i4H9FIImg5mP2ks5s1XGBgaJpZM4QY9mo .
Thanks for the info. I'll take a look.
This is a hasty test to see if the Y
, rather than Y*
is the issue. The example and Y
calculation procedure is taken directly from: https://stats.stackexchange.com/a/149529
# Create data
N = 10
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
set.seed(100)
x = rnorm(8*N)+1
beta = 5
epsilon = rnorm(8*N,sd = sqrt(1/5))
y.star = x*beta+fcoeff+epsilon ## latent response
y = y.star
y[y<0] <- 0 ## censored response
test_data <- data.frame(y = y, zero = 0, x = x, f = f)
# Estimate with AER -------------------------------------------------------
library(AER)
fit <- tobit(y~ 0 + x + f, data = test_data)
# E[Y*] ---------------
mean(predict(fit))
## [1] 0.9823243
# E[Y] -----------
mu <- fitted(fit)
sigma <- fit$scale
p0 <- pnorm(mu/sigma)
lambda <- function(x) dnorm(x)/pnorm(x)
ey0 <- mu + sigma * lambda(mu/sigma)
ey <- p0 * ey0
c(mean = mean(ey, na.rm = T), sd = sd(ey, na.rm = T))
## mean sd
## 2.544692 3.354458
# Estimate with Zelig -------------------------------------------------------
zfit <- zelig(y~ 0 + x + f, data = test_data, model = "tobit", cite = FALSE)
# E[Y*] ---------------
mean(unlist(predict(zfit)))
## [1] 0.9823243
ev <- sim(setx(zfit))$get_qi(qi = "ev")
c(mean = mean(ev), sd = sd(ev))
# E[Y] ???? -----------
ev <- sim(setx(zfit))$get_qi(qi = "ev")
c(mean = mean(ev), sd = sd(ev))
## mean sd
## 0.6255104 0.1359505
Note that in this example:
mean(y)
## [1] 2.51392
sd(y)
## [1] 3.353757
which is almost identical to the E[Y]
using Achim's procedure from Stack Exchange, but very different from Zelig. The ev
from Zelig is also fairly different from the E[Y*]
from predict
.
@cchoirat am I missing something about what Zelig is trying to achieve (or misinterpreted the results from this test)?
Consider these two models (from the tobit vignette)
coefs and vcov are the same:
but the expected values as very different
Do we expect these to be so different? If so, why?