Closed: josherrickson closed this issue 1 year ago
The non-trivial bug report Josh linked has a comment that offers a workaround: quote the call when storing it in the call slot. That would mean changing the code here to look like:
if (inherits(lm_model, "glm")) {
  lm_model$formula <- as.formula(lm_model, env = eval_env)
}
da_call <- str2lang(paste0("quote(", paste(deparse(lm_model$call), collapse = ""), ")"))
return(new("DirectAdjusted",
           lm_model,
           Design = design,
           absorbed_intercepts = absorbed_intercepts,
           absorbed_moderators = absorbed_moderators,
           call = da_call))
lm_model is loaded with calls that can be evaluated in parent.frame(), however, which in the case of the dichotomy call in a Design object leads to errors. For example, if we run the following code after making the change I propose above, we get the following error:
> data(simdata)
> cmod <- lm(y ~ x, simdata)
> des <- rct_design(z ~ cluster(cid1, cid2), data = simdata)
> mod <- lmitt(y ~ 1, data = simdata, design = des, weights = ate(des), offset = cov_adj(cmod))
Error in str2lang(paste0("quote(", paste(deparse(lm_model$call), collapse = ""), :
<text>:1:7339: unexpected '<'
1: class = "data.frame"), column_index = c("t", "u", "u"), type = "RCT", unit_of_assignment_type = "cluster", call = rct_design(formula = z ~ cluster(cid1, cid2), data = simdata), di
This is because deparse translates des to this:
Design = new(\"Design\", "
[102] " structure = structure(list(z = c(0L, 0L, 0L, 1L, 0L, 0L, "
[103] " 1L, 0L, 1L, 1L), cid1 = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, "
[104] " 5L, 5L), cid2 = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L)), row.names = c(NA, "
[105] " -10L), class = \"data.frame\"), column_index = c(\"t\", \"u\", "
[106] " \"u\"), type = \"RCT\", unit_of_assignment_type = \"cluster\", "
[107] " call = rct_design(formula = z ~ cluster(cid1, cid2), data = simdata), "
[108] " dichotomy = structure(list(), class = \"formula\", .Environment = <environment>)), "
[109] "
If we wanted to use this deparse code, we might consider storing the weights and offset objects in the eval_env created here. The code here would then look like:
assign("data", data, envir = eval_env)
assign("design", design, envir = eval_env)
assign("weights", lm_model$call$weights, envir = eval_env)
assign("offset", lm_model$call$offset, envir = eval_env)
lm_model$call$data <- quote(data)
lm_model$call$weights <- quote(weights)
lm_model$call$offset <- quote(offset)
environment(lm_model$terms) <- eval_env
if (inherits(lm_model, "glm")) {
  lm_model$formula <- as.formula(lm_model, env = eval_env)
}
da_call <- str2lang(paste0("quote(", paste(deparse(lm_model$call), collapse = ""), ")"))
return(new("DirectAdjusted",
           lm_model,
           Design = design,
           absorbed_intercepts = absorbed_intercepts,
           absorbed_moderators = absorbed_moderators,
           call = da_call))
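As a stripped-down analogue of the pattern above (illustrative names only, using plain lm() rather than package code): stash the needed objects in a dedicated environment, point the stored call at them, and the call can then be re-evaluated later regardless of what the caller's frame contains.
eval_env <- new.env()
fit <- lm(mpg ~ wt, data = mtcars)
assign("data", mtcars, envir = eval_env)   # stash the data under a fixed name
fit$call$data <- quote(data)               # rewrite the stored call to use that name
refit <- eval(fit$call, envir = eval_env)  # re-evaluates cleanly, independent of the caller's frame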
In R 4.3 we can run this code without error (which I verified in a Docker image):
> data(simdata)
> simdata$id <- seq_len(nrow(simdata))
> cmod <- lm(y ~ x, simdata)
> des <- rct_design(z ~ cluster(cid1, cid2), simdata)
> mod <- lmitt(y ~ 1, data = simdata, design = des, weights = ate(des), offset = cov_adj(cmod))
> expand.model.frame(mod, "id")
flexida_y flexida::assigned() id
1 -0.97202973 -0.5 1
2 0.27969544 -0.5 2
3 -0.19241487 -0.5 3
4 0.38409370 -0.5 4
5 -0.10447669 -0.5 5
6 -0.95786444 -0.5 6
7 1.25275817 -0.5 7
8 0.72103572 -0.5 8
9 1.00177153 -0.5 9
10 -1.46079240 -0.5 10
11 0.94523053 -0.5 11
12 -1.74651896 -0.5 12
13 -0.58412620 -0.5 13
14 -1.42302351 -0.5 14
15 -2.25867384 0.5 15
16 1.77136846 0.5 16
17 -0.70414747 0.5 17
18 -0.33543528 0.5 18
19 -0.43770366 0.5 19
20 0.33594091 0.5 20
21 1.54963679 -0.5 21
22 1.63040090 -0.5 22
23 -1.23436045 -0.5 23
24 -1.40921131 -0.5 24
25 -1.56342486 -0.5 25
26 -1.30385896 -0.5 26
27 1.90860302 -0.5 27
28 -0.04310819 -0.5 28
29 -0.89336926 -0.5 29
30 -0.65191417 -0.5 30
31 1.02370535 0.5 31
32 0.20984377 0.5 32
33 -0.36502604 0.5 33
34 -0.80038416 0.5 34
35 -0.91295239 -0.5 35
36 1.99728624 -0.5 36
37 0.88916602 -0.5 37
38 1.95793306 -0.5 38
39 -0.47212763 -0.5 39
40 -0.40158848 -0.5 40
41 -1.07813466 0.5 41
42 -0.30127319 0.5 42
43 0.42110541 0.5 43
44 1.30818576 0.5 44
45 0.51341454 0.5 45
46 0.40522603 0.5 46
47 1.18019960 0.5 47
48 1.09638279 0.5 48
49 0.05584398 0.5 49
50 -0.83407072 0.5 50
and mod@call looks like:
stats::lm(formula = flexida_y ~ 0 + `flexida::assigned()`, data = data,
weights = weights, offset = offset)
This did cause some errors in the test suite for both R 4.2.3 and R 4.3, but I haven't had a chance to look into them. I'm going to push my code to a branch for people to look at. If this is a route of interest, I can look into the errors it causes in the test suite and see whether they're major impasses.
Would da_call <- call("quote", lm_model$call) be sufficient to avoid the deparsing complexity? It does appear to pass basic checks to allow an @call slot.
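As a quick sanity check of that suggestion (toy objects only, not package code): an environment embedded in a call deparses to the literal text <environment>, which str2lang() can't re-parse (the same "unexpected '<'" error as above), whereas call("quote", ...) keeps the call as a language object and never deparses it:
> cl <- call("identity", new.env())   # a call with an environment baked in
> str2lang(paste(deparse(cl), collapse = ""))
Error in str2lang(paste(deparse(cl), collapse = "")) : 
  <text>:1:10: unexpected '<'
> da_call <- call("quote", cl)        # no deparse round trip needed
> identical(eval(da_call), cl)
[1] TRUE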
I looked at one of the failing tests; it was failing on https://github.com/benbhansen-stats/flexida/blob/fc3b11c6e939258c7498f19363d08b80a33159e3/R/SandwichLayerVariance.R#L434-L435, specifically on the na.action = na.pass argument; without it, the matrix is returned.
The number of errors that are appearing from this fix gives me pause. I wonder if a better solution would be along the lines we had discussed: keeping our own version of expand.model.frame that we maintain closer to the 4.2.3 branch. Part of my motivation here is that I think the 4.3 functionality of stats::expand.model.frame might be considered a bug; it will straight up fail on any object that does not have an @call slot (see the source) without any fallback to 4.2.3 functionality.
The other option is to explore adding our own getCall.DirectAdjusted. As I mentioned, in the 5 minutes I played around with it I was unsuccessful, but I think that might be more fruitful. Either of these would avoid having to monkey around with calls.
Hmm, I don't know what I was doing before to muck it up so badly, but adding
#' @export
getCall.DirectAdjusted <- function(x, ...) {
  return(x$call)
}
passes all tests sans one:
✖ | 1 91 | test.SandwichLayer [0.2s]
─────────────────────────────────────────
Failure (test.SandwichLayer.R:585:3): .get_ca_and_prediction_gradient warns about less than full rank model fit
`stats_preds <- stats::predict(cmod, newdata)` did not throw the expected warning.
─────────────────────────────────────────
I'm not sure whether that's sufficient (should we return a call with lmitt as the function rather than lm?) but this is at least a simpler approach.
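For reference, a toy demonstration (made-up class, not the package's) that getCall() is a plain S3 generic in stats, so exporting a getCall.<class> method is all the dispatch machinery needed for update() and, per this thread, R 4.3's expand.model.frame() to find the call:
> getCall.Toy <- function(x, ...) x$stored_call
> toy <- structure(list(stored_call = quote(lm(y ~ x, data = dat))), class = "Toy")
> getCall(toy)
lm(y ~ x, data = dat)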
I agree with your feeling about it being kind of a buggy change. Your getCall.DirectAdjusted approach is simpler than mine, and on its face seems less likely to cause other issues (it feels natural for getCall to look for a call attribute rather than fail if there's no slot named call anyways). The call itself I think should have lmitt rather than lm, so that if someone were to evaluate it, it would return a DirectAdjusted object, but otherwise this change seems reasonable to me.
Going to bed last night, I was mentally composing a comment suggesting that we combine these two; add getCall and add a slot (either @call with some quote nonsense, or just call it something different). However, I've reconsidered for the following reasons:

1. We could capture the lmitt.formula call and pass it down to .convert_to_lmitt to attach to @call. However, it's less clear what as.lmitt would keep as its @call. The as.lmitt(lm(...)) call? We'd need to somehow pass the arguments in lm up to as.lmitt. Or do we just use the lm call? In which case it doesn't really represent what it's supposed to, namely calling eval(as.lmitt(lm(...))@call) wouldn't return a DirectAdjusted object.
2. Setting as.lmitt aside for a second, even just considering lmitt.formula, the actual call attached to the lm inside a DirectAdjusted object is quite different from the call to lmitt.formula due to all the monkeying around I do in there. We'd need to do all sorts of special casing to even get expand.model.frame to work with the lmitt.formula call the user actually wrote.
3. Given that DirectAdjusted is an S4 object with slots (@) containing an S3 object with elements ($), we have a somewhat "private" call object right now in terms of the lm$call, which can be used for things like expand.model.frame without directly exposing users. We could potentially add a separate @call to DirectAdjusted objects and keep getCall pointed towards the $call, but if we go that route, I suspect it won't be the last time the existence of both a @call and a $call causes a headache. If we come across a strong argument for the existence of @call we can revisit, but right now I don't see one. If we do go down that route, we should probably consider using the ... in getCall to allow a choice of $call vs @call.

So ultimately, I think we go with getCall and modify it to change lm to lmitt, so that it works if a user really needs to eval it.
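A rough sketch of what that modification could look like (illustrative only; a later comment below explains why this naive swap isn't sufficient on its own):
#' @export
getCall.DirectAdjusted <- function(x, ...) {
  cl <- x$call             # the internal lm's call
  cl[[1L]] <- quote(lmitt) # swap the callee from stats::lm to lmitt; argument
                           # names (e.g. formula =) are left unchanged
  cl
}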
(On a related note, update(lmitt(...)) doesn't work currently on "main" [giving the same error about being unable to find "call"], but does work after adding the getCall.DirectAdjusted above, albeit generating an lm as expected. I'm not sure update.DirectAdjusted should work [given that there are only two varieties of it, y ~ 1 and y ~ sbgrp]; should we generate a dummy function that just errors? Just switching the lm to lmitt in the returned call won't solve the issue, because lm's first argument is "formula" whereas lmitt's first argument is "obj", since it can be either a formula or an lm object. @benthestatistician)
Actually, I think we probably shouldn't change lm to lmitt in getCall.DirectAdjusted.

1. The "obj" vs "formula" first-argument mismatch noted above.
2. The formula stored in a DirectAdjusted's lm wouldn't actually be accepted into lmitt.formula! (It contains both a 0 (for no intercept) and a call to assigned().) We could of course work around this, but for what gain?

Though perhaps this is an argument for a separate @call slot.....
I'm assigning myself to this issue. I disabled my check of consistency between stats::expand.model.frame and flexida:::.expand.model.frame.DA pending an offline discussion of whether we need these two to be in sync going forward.
Per offline discussion:

1. Add a @callDA or similar slot containing the actual lmitt.formula or as.lmitt call.
2. Replace .expand.model.frame.DA with expand.model.frame.DirectAdjusted to expand on the generic stats::expand.model.frame and export it.
3. Use expand.model.frame as the base.

There is now a @lmitt_call slot on DA objects that stores the user's actual call. getCall() continues to return the $call from the internal lm.
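In usage terms (with mod a fitted DirectAdjusted as in the transcript earlier), the two calls are then reached separately:
mod@lmitt_call  # the lmitt()/as.lmitt() call the user actually wrote
getCall(mod)    # the internal lm's $call, which expand.model.frame() and friends use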
Turns out that expand.model.frame is not a generic. We can still define expand.model.frame.DirectAdjusted, but stats::expand.model.frame won't dispatch to it. Do we still want to export it (and thus probably rename the existing .expand.model.frame.DA)?
Despite what I said at our call, I ended up implementing both the <=4.2.3 and >=4.3 versions of expand.model.frame. Since both work for us (we weren't hitting the error that spurred the 4.3 change), I figure there is no harm in just keeping things version-specific.
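Roughly, the shape of that version-specific split looks like the sketch below; the helper names are placeholders, not the package's actual internals:
.expand.model.frame.DA <- function(model, extras, ...) {
  if (getRversion() >= "4.3.0") {
    .emf_post_4_3(model, extras, ...)  # placeholder mirroring the >= 4.3 stats behavior
  } else {
    .emf_pre_4_3(model, extras, ...)   # placeholder mirroring the <= 4.2.3 behavior
  }
}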
@jwasserman2 and I will have more to add about this issue in the future; just opening now to save some URLs.
Short version:

- stats::expand.model.frame changed in R 4.3.0: https://bugs.r-project.org/show_bug.cgi?id=18414. It now looks for a call slot, which we don't currently have, causing errors. Adding one is non-trivial due again to this bug. (I also briefly looked into writing our own getCall.DirectAdjusted but was unsuccessful; with more time this might be an appropriate solution.)
- Diff of stats::expand.model.frame changes: