combining cov estimates from lmitt()/lm() calls differing only in the formula

benthestatistician commented 1 year ago

We'll need combined covariance matrix estimates for coefficients fitted with, say, lm(y~a.(), ...) and lm(y~a.()*sbgrp, ...). Or with lmitt(y~sbgrp1, ...) and lmitt(y~sbgrp2, ...). (For multiplicity correction via multcomp, among other purposes.)

Action steps t.b.d., with corresponding updates to this issue header; for now let's discuss.

benthestatistician commented 1 year ago

Context: Diagonal blocks of the combined covariance estimate are easily obtained, e.g. via vcovDA(lmitt(y~sbgrp1, ...)) and vcovDA(lmitt(y~sbgrp2,...)), but the off-diagonal pieces call for more effort. From the perspective of sandwich estimation, the relevant A matrices are block diagonal, but the B matrix is not, calling for cross-products of estimating function contributions on the off diagonals.
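
To make the block structure concrete, here is a sketch in notation assumed purely for illustration (none of these symbols are fixed anywhere in the package): $\psi_{1i}$ and $\psi_{2i}$ are unit $i$'s estimating-function contributions for the two fits, with parameters $\theta_1$ and $\theta_2$, and each $\psi_{ki}$ is taken to depend only on its own $\theta_k$.

$$
A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix},
\qquad
A_k = \sum_i \frac{\partial \psi_{ki}}{\partial \theta_k^{\top}},
\qquad
B = \sum_i \begin{pmatrix} \psi_{1i} \\ \psi_{2i} \end{pmatrix}
           \begin{pmatrix} \psi_{1i} \\ \psi_{2i} \end{pmatrix}^{\top}
  = \begin{pmatrix} B_{11} & B_{12} \\ B_{12}^{\top} & B_{22} \end{pmatrix},
\qquad
B_{kl} = \sum_i \psi_{ki}\,\psi_{li}^{\top}.
$$

$$
V = A^{-1} B A^{-\top}
\quad\Longrightarrow\quad
V_{kl} = A_k^{-1}\, B_{kl}\, A_l^{-\top}.
$$

Under these assumptions the diagonal blocks $V_{11}$ and $V_{22}$ are the two models' own sandwich estimates, and the only new ingredient is $B_{12}$, the cross-product of the two sets of contributions.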

A couple of notes:

- I've posed the issue as one of combining outputs of different lmitt() calls. That will be good to be able to do, but once we've figured it out we may well want to use the underlying routines within certain single lmitt() calls: lmitt(y ~ 1 + sbgrp,...), for example, could invoke and then stitch together lm(y~a.(),...) and lm(y~a.():sbgrp,...).
- The hypotheticals I've put into the issue header involve calls to lmitt() and lm() using the same left-hand sides, but I suspect it wouldn't be much more difficult to combine results from calls with different ys, so long as the design and the data were otherwise the same. It'd be nice to set up the underlying infrastructure in such a way as to help us handle these situations as well.
- There will of course be cases where the lm results being joined use different subsets of the data, due to lm()'s use of na.omit() and the left-hand sides of the regression equations implying different sets of complete cases.
- Let's not worry too much about making this super user-friendly, at least at the outset. I'll be pretty happy if we can use it for stitching together pre-set combinations, e.g. lm(y~a.(),...) + lm(y~a.():sbgrp,...) in response to lmitt(y~sbgrp,...), and if we flexida developers can use it for bespoke solutions in our own projects.
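
As a rough illustration of the stitching described in these notes, the sketch below combines two plain lm() fits (not lmitt() fits) using sandwich::estfun() and sandwich::bread(). The data, the variable names, and the helper full_estfun() are all hypothetical. It assumes rows are independent units and uses an HC0-style B matrix; a clustered design would presumably sum contributions within cluster before taking cross-products. Rows dropped by na.omit() in one fit are given zero contributions so that the two sets of estimating functions line up.

```r
library(sandwich)

# Toy data (hypothetical): two outcomes, the second with missing values
set.seed(1)
n <- 40
dat <- data.frame(a = rep(0:1, 20), sbgrp = rep(c(0, 0, 1, 1), 10))
dat$y1 <- 1 + dat$a + rnorm(n)
dat$y2 <- dat$a * dat$sbgrp + rnorm(n)
dat$y2[c(3, 17)] <- NA          # second fit drops these rows via na.omit()

m1 <- lm(y1 ~ a, data = dat)
m2 <- lm(y2 ~ a * sbgrp, data = dat)

# Hypothetical helper: estimating-function contributions on all n rows,
# with zeros for rows the fit omitted
full_estfun <- function(mod, n_total) {
  ef <- estfun(mod)
  keep <- setdiff(seq_len(n_total), na.action(mod))
  out <- matrix(0, n_total, ncol(ef))
  out[keep, ] <- ef
  out
}

# B: cross-products of the stacked contributions, off-diagonal blocks included
psi <- cbind(full_estfun(m1, n), full_estfun(m2, n))
B <- crossprod(psi)

# A^{-1}: block diagonal; for lm, bread(m) / nobs(m) equals (X'X)^{-1}
Ainv1 <- bread(m1) / nobs(m1)
Ainv2 <- bread(m2) / nobs(m2)
p1 <- ncol(Ainv1)
p2 <- ncol(Ainv2)
Ainv <- matrix(0, p1 + p2, p1 + p2)
Ainv[1:p1, 1:p1] <- Ainv1
Ainv[p1 + 1:p2, p1 + 1:p2] <- Ainv2

# Combined covariance estimate for c(coef(m1), coef(m2))
V <- Ainv %*% B %*% Ainv

# Diagonal blocks can be compared against the single-model sandwich estimates
all.equal(V[1:p1, 1:p1], sandwich(m1), check.attributes = FALSE)
```

Zero-filling keeps the row alignment explicit, which is what allows the off-diagonal block to be computed as a plain cross-product even when the fits use different complete cases.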

benthestatistician commented 1 year ago

@jwasserman2 I had the impression you were getting a handle on what this subproject called for. How would you feel about having it go into your queue?

Another possibility would be for you and @xinhew0708 to divide responsibility. Xinhe, perhaps you could write a memo in markdown or simple LaTeX detailing the A and B matrices of the vector parameter that concatenates the two models' fitted parameters, expressed in terms of the two models' own A and B matrices and estimating equations? Then you and Josh might collaborate on expanding this into a specification document, something broadly along the lines of the sandwich infrastructure vignette. Alternatively, you might extend that vignette.

I should clarify that I'm not asking you to bump other things out of the queue in order to work on this. I had at one point thought of this as something we might need this term, for the Texas project; but that looks less likely every day. Rather, I'm hoping to map out a plan on roughly a 6 month horizon. In the immediate term, I'd like to divide responsibilities and assign the task appropriately.

jwasserman2 commented 1 year ago

I've returned to this and scoped out the theoretical calculations, but before I move on to the coding part, I still don't see where in the multcomp package you can pass multiple models to a function and perform simultaneous inference on parameters from both (or several) of them. @benthestatistician if you can point that out to me, it will help me plan the coding specs.

benthestatistician commented 4 months ago

- See ?multcomp::mmm.
- It would be good to test that the glht/mmm workflow described there gives unified vcovs whose model-specific diagonal blocks agree with what vcov_tee() returns.
- Related to this, we should test, and hopefully verify, that glht() passes ... args down to vcov_tee().
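
A minimal sketch of the kind of check proposed here, following the usage pattern in the ?multcomp::mmm examples but with plain lm() fits and sandwich::sandwich standing in for lmitt() and vcov_tee(). The data and object names are hypothetical, and whether the blocks agree is exactly what the comparison is meant to find out, so no result is asserted.

```r
library(multcomp)
library(sandwich)

# Toy data (hypothetical): one 3-level factor, two outcomes
set.seed(79)
dat <- data.frame(grp = factor(rep(c("A", "B", "C"), each = 10)))
dat$y1 <- rnorm(30) + (dat$grp == "B")
dat$y2 <- rnorm(30) + (dat$grp == "C")

m1 <- lm(y1 ~ grp, data = dat)
m2 <- lm(y2 ~ grp, data = dat)

# Joint inference across both fits: Dunnett contrasts in each model,
# with a sandwich covariance requested through glht()'s vcov argument
g <- glht(mmm(m1 = m1, m2 = m2), mlf(mcp(grp = "Dunnett")), vcov = sandwich)
V_joint <- vcov(g)    # covariance of the 4 contrasts (2 per model)

# Single-model counterpart of the first diagonal block
V_m1 <- vcov(glht(m1, linfct = mcp(grp = "Dunnett"), vcov = sandwich))

# Do the model-specific blocks agree?
all.equal(V_joint[1:2, 1:2], V_m1, check.attributes = FALSE)
```

The same comparison, repeated with lmitt() fits and vcov_tee() in place of sandwich, would also exercise the question of whether glht() forwards its ... arguments.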

benbhansen-stats/propertee #79