benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).
https://benbhansen-stats.github.io/propertee/
Other
2 stars 0 forks source link

extracting Fisher info from nonstandard cov adj's #142

Closed benthestatistician closed 6 months ago

benthestatistician commented 1 year ago

.get_a11_inverse() needed a tweak in order to accommodate objects of robustbase class glmrob as covariance adjustment models, per @adamSales. It needed a

 if(nc==0) nc <- length(cmod$resid)-length(coef(cmod)) 

presumably at around SandwichLayerVariance.R:L432. For future-proofing purposes, perhaps it would be better to explicitly check the type of cmod before performing whatever calculation may be appropriate to recover the needed scaling factor.

benthestatistician commented 1 year ago

@jwasserman2 I'm guessing you may be able to check this off without breaking your stride. Speak up if I'm missing something that makes it more complicated, or if for whatever reason you'd prefer not to get into it.

jwasserman2 commented 1 year ago

I can take this on. Will comment here if any issues come up. @adamSales Could you post some example code that I could work with? Is the issue that if mod is a glmrob model, summary(mod) doesn't return a df element, or are there 0 degrees of freedom?

benthestatistician commented 6 months ago

Let's resume this little thread; I think it should be easy to check off now. To answer @jwasserman2's question (from 8 months ago), the issue is that in the cases Adam is talking about summary(mod) returns a df element but there's nothing in it:

 data(simdata)
 cmod <- robustbase::glmrob(y~x, gaussian, simdata)
names(summary(cmod))
##>  [1] "call"          "terms"         "family"        "iter"         
##>  [5] "control"       "method"        "residuals"     "fitted.values"
##>  [9] "w.r"           "w.x"           "deviance"      "df.residual"  
##> [13] "null.deviance" "df.null"       "df"            "aliased"      
##> [17] "coefficients"  "dispersion"    "cov.scaled"   
summary(cmod)$df
##> NULL
summary(cmod)$df.null
##> NULL

The .get_a11_inverse() code no longer involves an nc constant, but I do see that get_b11() is still using summary(cmod)$df to access the number of observations in the cov adj sample. Perhaps that function should use a check for a NULL value of df, with the length of the residual as a fallback nc value.

length(cmod$fitted.values)
##> [1] 50

But the first step would be to create a test that currently fails which we can then fix. Josh, if you agree then could you point me to a test that you'd suggest I mimic, substituting a glmrob() call for a call to glm() in creating the covariance model? (Alternatively, we could build a test around the failing example Adam gives on #125.)

jwasserman2 commented 6 months ago

.get_b11() isn't used anywhere in our codebase right now. It was deprecated after making our large-scale changes to estfun.DirectAdjusted(), but we kept it around for historical documentation. Our changes to the variance estimation code around numerical stability removed the summary(cmod)$df call from .get_a11_inverse(). Now, .get_a11_inverse() simply calls sandwich::bread(). Does that fix this issue, or does the output of sandwich::bread(cmod) need to be re-scaled?

benthestatistician commented 6 months ago

Thanks for clarifying. No need for rescaling -- at least, there's nothing within this issue that would suggest such a need. What do you think of 20c85ab, on new branch i142, just to mark that we looked at this?

jwasserman2 commented 6 months ago

Works for me

benthestatistician commented 6 months ago

all set here, closing