Testing for `vcovDA` - Githubissues

jwasserman2 commented 2 years ago

I've written a first examples script to do what I think is the simplest possible test of the new covariance-adjusted variance estimator vcovDA. It currently just has a non-clustered example, but I'm working on adding a clustered example. Take a look at the script, or let me know if you have thoughts on what examples we should test.

josherrickson commented 2 years ago

I'm not sure inst/examples is the best place for this. I'd think tests is a better location, potentially not-for-CRAN (add it to .Rbuildignore). You could just throw the entire script inside a testthat() block.

My objections to examples:

CRAN is slightly stricter on examples I believe (at least I hit more unexpected warnings in examples than tests)
Do we want these published? (Examples go into the man files.)
Not sure how I feel about expect_* being in man files either.

Benefits to test:

Easier to not-for-CRAN.
Run during test()/make test

Otherwise, looking good!

jwasserman2 commented 2 years ago

In terms of properly scaling the variance-covariance matrix given the differing sample sizes for the covariance and direct adjusted models, I came up with the following derivations (it combines information from Carroll et al. and Stefanski and Boos): Screen Shot 2022-07-05 at 9 29 28 AM

For example, in the scenario where the models use the same data, nq, the number of observations in the DA model matrix, and nc, the number of observations in the covariance model matrix, are the same. This makes the scaling terms the same, and they cancel out the 1 / nq term in front. In the scenario where the models have no overlapping data, the second and third matrices are 0, since A12 = 0, so only the first and last matrices are of interest. The meat of the DA model (B22) is multiplied by nq and an altered version of the meat of the covariance model is multiplied by nc.

I have the exact derivation from the screenshot written down on paper, but I couldn't fit it cleanly in any word processor, so if someone would like to check me I can send them a photo of my work.

benthestatistician commented 2 years ago

I get different values for the scaling factors. Summary of my reasoning (latex source):

rendered late

jwasserman2 commented 2 years ago

I'm able to reproduce this result using your meat matrix instead of what I had. I will update the vignette I'm working on to reflect the discussion here

benthestatistician commented 2 years ago

It occurs to me that an RDD simulation study @adamSales implemented for the proposal could serve us nicely for testing vcovDA. Calls for setting up cov_adj() handers for lmrob objects (produced by robustbase::lmrob()), unless we're fortunate enough to find that our handlers for lms already do the trick (lmrob's inherit from lm). Also calls for functionality in PR #58.

The proposal reported on MSE of covariance-assisted direct adjustment estimates with an lmrob() c.a. model; here the idea would be to add covariance estimation within each simulation replicate, using the ensemble of simulations to evaluate how well the covariance estimates do.

Adam and @jwasserman2, can you coordinate on this? Adam works with another student who has some RDD data; conceivably the point to bring him in would be somewhere in here. Unless it's easier for Adam to adapt his own simulations, after Josh does initial checking and adjustment needed for handling of lmrob() c.a. models.

jwasserman2 commented 2 years ago

The sandwich package doesn't have methods for handling lmrob objects, so per the sandwich infrastructure vignette, our code is currently written to thrown an error in the SandwichLayer/PreSandwichLayer creation. To accommodate the lmrob class, we would need to remove this logic and rewrite some of the A and B submatrix computation functions to eliminate the sandwich package dependency

benthestatistician commented 2 years ago

I take it you checked whether sandwich's methods for lm's can handle lmrobs, @jwasserman2 , finding the answer to be no?
If that's the case, couldn't we also address this issue by writing sandwich type methods for lmrob's? (Ideally non-exported methods, at least for now -- but if that's difficult we could just export.)
If you pass lmrob() argument method="SM", then its default is to report a sandwich-type error estimate. It doesn't use the sandwich package but rather a function internal to robustbase, .vcov.avar1() I believe. Perhaps the needed artifacts could be extracted from it in this case.
Figuring out what's going on down in there is a task that fits with something else I believe @adamSales was going to take on this summer, namely studying up on the M-estimation particulars behind our earlier RDD method.

jwasserman2 commented 2 years ago

bread outputs something when you use sandwich:::bread.lm, but meat and meatCL throw errors because they call estfun, which doesn't have an lmrob method (and lmrob doesn't inherit from lm). Writing our own methods for lmrob objects would make the most sense to me. I would need to convert the variance functions into generics, but that's the way the sandwich codebase is set up, so maybe it makes sense to follow his lead.

I'm not familiar with the robustbase methodology (I tracked down the paper it's based on but haven't dived into it deeply yet), but I imagine the results should be different than the linear case? If so, I'd have to spend some time with the derivations to write tests that make sure our methods are computing the correct outputs

benthestatistician commented 2 years ago

@jwasserman2 this is great, thanks. Rather than immersing yourself into what robustbase is doing, I suggest you let @adamSales (and/or me) look into it and advise you on what to do in setting up an estfun.lmrob.

Adam, here are some relevant notes that I developed during our work on the limitless paper that I've had floating around in various forms since then: sandwich_for_estimator_chains.pdf. You may recognize them. The most salient pieces are the estimating function definitions, and this bit about S-M estimators:

benthestatistician commented 1 year ago

Hi @jwasserman2 , Adam S reminded me just now that we've already written bread() and meat() methods for lmrobs: https://github.com/adamSales/lrd/blob/master/R/ddsandwich.R

Feel free to import into flexida. Could probably use more testing. The RDD simulations I urge earlier in this thread would test them, of course, along with other things.

benthestatistician commented 1 year ago

Thanks to @adamSales we've now got the RDD simulation study I was asking about in this comment from a couple weeks ago on this thread. In inst/rdd-simulation/, at least as of import at 8d586ab.

As he notes, variance estimation isn't working so well in it right at the moment. These variances depend both on flexida-specific infrastructure and on the bread and meat makers for lmrob objects that Adam also imported. I wrote those some years back, at the time confirming that they improved on the native lmrob variance estimator, which had a bug (and seems still to have it). (NB: If we decide to export lmrob methods for bread() and other sandwich functions, we might notify msalibian on this thread, where I noted the problem some years ago.) Unfortunately I couldn't locate those simulations, so QA on these functions remains to be done. Perhaps a first step would be to fit lmrob's on simulated data and confirm that these sandwich SEs track the resampling variances of coefficients.

jwasserman2 commented 1 year ago

This issue has been waiting on my coding of an HC1 estimator, which now has dependence on #119

jwasserman2 commented 1 year ago

Will be addressed by the PR that address #129

benbhansen-stats / propertee

Testing for `vcovDA` #55