Closed jwasserman2 closed 1 year ago
I'm not sure inst/examples is the best place for this. I'd think tests is a better location, potentially not-for-CRAN (add it to .Rbuildignore). You could just throw the entire script inside a testthat()
block.
My objections to examples:
expect_*
being in man files either.Benefits to test:
test()
/make test
Otherwise, looking good!
In terms of properly scaling the variance-covariance matrix given the differing sample sizes for the covariance and direct adjusted models, I came up with the following derivations (it combines information from Carroll et al. and Stefanski and Boos):
For example, in the scenario where the models use the same data, nq
, the number of observations in the DA model matrix, and nc
, the number of observations in the covariance model matrix, are the same. This makes the scaling terms the same, and they cancel out the 1 / nq
term in front. In the scenario where the models have no overlapping data, the second and third matrices are 0, since A12 = 0, so only the first and last matrices are of interest. The meat of the DA model (B22) is multiplied by nq
and an altered version of the meat of the covariance model is multiplied by nc
.
I have the exact derivation from the screenshot written down on paper, but I couldn't fit it cleanly in any word processor, so if someone would like to check me I can send them a photo of my work.
I get different values for the scaling factors. Summary of my reasoning (latex source):
I'm able to reproduce this result using your meat matrix instead of what I had. I will update the vignette I'm working on to reflect the discussion here
It occurs to me that an RDD simulation study @adamSales implemented for the proposal could serve us nicely for testing vcovDA
. Calls for setting up cov_adj()
handers for lmrob
objects (produced by robustbase::lmrob()
), unless we're fortunate enough to find that our handlers for lm
s already do the trick (lmrob
's inherit from lm
). Also calls for functionality in PR #58.
The proposal reported on MSE of covariance-assisted direct adjustment estimates with an lmrob()
c.a. model; here the idea would be to add covariance estimation within each simulation replicate, using the ensemble of simulations to evaluate how well the covariance estimates do.
Adam and @jwasserman2, can you coordinate on this? Adam works with another student who has some RDD data; conceivably the point to bring him in would be somewhere in here. Unless it's easier for Adam to adapt his own simulations, after Josh does initial checking and adjustment needed for handling of lmrob()
c.a. models.
The sandwich
package doesn't have methods for handling lmrob
objects, so per the sandwich infrastructure vignette, our code is currently written to thrown an error in the SandwichLayer
/PreSandwichLayer
creation. To accommodate the lmrob
class, we would need to remove this logic and rewrite some of the A
and B
submatrix computation functions to eliminate the sandwich
package dependency
lm
's can handle lmrob
s, @jwasserman2 , finding the answer to be no? lmrob
's? (Ideally non-exported methods, at least for now -- but if that's difficult we could just export.)lmrob()
argument method="SM"
, then its default is to report a sandwich-type error estimate. It doesn't use the sandwich package but rather a function internal to robustbase
, .vcov.avar1()
I believe. Perhaps the needed artifacts could be extracted from it in this case. bread
outputs something when you use sandwich:::bread.lm
, but meat
and meatCL
throw errors because they call estfun
, which doesn't have an lmrob
method (and lmrob
doesn't inherit from lm
). Writing our own methods for lmrob
objects would make the most sense to me. I would need to convert the variance functions into generics, but that's the way the sandwich
codebase is set up, so maybe it makes sense to follow his lead.
I'm not familiar with the robustbase
methodology (I tracked down the paper it's based on but haven't dived into it deeply yet), but I imagine the results should be different than the linear case? If so, I'd have to spend some time with the derivations to write tests that make sure our methods are computing the correct outputs
@jwasserman2 this is great, thanks. Rather than immersing yourself into what robustbase is doing, I suggest you let @adamSales (and/or me) look into it and advise you on what to do in setting up an estfun.lmrob
.
Adam, here are some relevant notes that I developed during our work on the
limitless paper that I've had floating around in various forms since then: sandwich_for_estimator_chains.pdf. You may recognize them. The most salient pieces are the estimating function definitions,
and this bit about S-M estimators:
Hi @jwasserman2 , Adam S reminded me just now that we've already written bread()
and meat()
methods for lmrob
s: https://github.com/adamSales/lrd/blob/master/R/ddsandwich.R
Feel free to import into flexida. Could probably use more testing. The RDD simulations I urge earlier in this thread would test them, of course, along with other things.
Thanks to @adamSales we've now got the RDD simulation study I was asking about in this comment from a couple weeks ago on this thread. In inst/rdd-simulation/
, at least as of import at 8d586ab.
As he notes, variance estimation isn't working so well in it right at the moment. These variances depend both on flexida-specific infrastructure and on the bread and meat makers for lmrob objects that Adam also imported. I wrote those some years back, at the time confirming that they improved on the native lmrob variance estimator, which had a bug (and seems still to have it). (NB: If we decide to export lmrob
methods for bread()
and other sandwich functions, we might notify msalibian on this thread, where I noted the problem some years ago.) Unfortunately I couldn't locate those simulations, so QA on these functions remains to be done. Perhaps a first step would be to fit lmrob's on simulated data and confirm that these sandwich SEs track the resampling variances of coefficients.
This issue has been waiting on my coding of an HC1 estimator, which now has dependence on #119
Will be addressed by the PR that address #129
I've written a first
examples
script to do what I think is the simplest possible test of the new covariance-adjusted variance estimatorvcovDA
. It currently just has a non-clustered example, but I'm working on adding a clustered example. Take a look at the script, or let me know if you have thoughts on what examples we should test.