Closed josherrickson closed 1 year ago
Interesting. Followup q's:
a. Is lmitt.R#L160-L166 where you're seeing this problem? Relatedly, b. I wonder why we're only seeing the problem now -- is it really something intrinsic to the absorption, or just that accommodating absorption is prodding us to test scenarios we didn't check previously, which could also arise without absorption?
If the latter, then maybe we need expect similar problems on the lm(<...>, offset=cov_adj(<...>))
path, not only on the lmitt.formula(<...>, offset=cov_adj(<...>))
path; in any case we may have to rethink cov_adj()
. If we do have these broader problems with cov_adj()
, I see no reason to preserve its current behavior with respect to clusters not anticipated in the Design; rather it's whatever makes things work. (Perhaps the next task here is to map out the various contingencies and mock up tests of them, en route to settling on a behavior that we hope will fix this issue.)
If on the other hand the issue is intrinsic to the absorption scenario, then perhaps it could be fixed by separating lmitt.formula()
's reconstruction of offsets from it reconstruction of other model frame elements. Looking into the lmitt.formula()
sources, if we stripped the offset argument off of the mf.call
then we ought to be able to create mf
without length mismatch issues. Then we might separately recreate the cov_adj
offset, with its mismatching length, creating perhaps offset_mf
. Finally we root around in the na.action
's of mf
and offset_mf
in order to sort out what to use from the recreated offset. (In suggesting this approach I don't mean to suggest loyalty to having cov_adj()
return objects of the length it's currently returning, only to suggest a preference for the more localized change.)
What's the issue it's causing? If the weights are NA for those observations, they should be dropped from the model fit anyways, so that shouldn't cause problems right? I also think it makes sense as currently constructed that cov_adj()
makes valid predictions for those observations because it doesn't leverage any design information, only covariate information. I feel like it would make sense to keep the cov_adj()
behavior as is and instead construct a vector indicating observations to NA out or keep as is based on whether they are part of clusters that match the provided Design
object
Due to refactoring in the absorb branch, this is no longer an issue.
If the
Design
object has fewer clusters than in the data being passed to the model,cov_adj()
is returning values for the "missing" clusters.So now
simdata
contains one additional cluster which is not founddes
.The weight functions properly flag the observations with "new" clusters as NA. (
assigned()
also operates this way.)This is an issue in the absorb branch. Because we're doing so much manipulation of the objects and formulas, by the time we get to using the offset, the other pieces of the data going into the final
lm()
call have only e.g. 44 rows. So it errors due to a length mismatch.cov_adj()
work this way?