Open benthestatistician opened 1 month ago
If there are no concerns about this, I'll ask that @josherrickson address this (and #184). First, though, let's hear if there's any discussion.
I'm in favor of this update. Since we have this logic here:
https://github.com/benbhansen-stats/propertee/blob/7d8fb2f9dbabdbe397ac53c17a98374dd6f0231a/R/lmitt.R#L297-L299
I think we may be able to just change the logic at the lines Ben mentioned to make `blocks` a vector of 1's (with length given by the number of rows of `data`) when there are no blocks in the design.
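As a rough sketch, the fallback described above might look like this (the function and variable names are illustrative, not the actual `lmitt.R` code):

```r
# Hypothetical fallback (names illustrative, not propertee internals):
# if the design supplies no blocks, use a single block spanning `data`
make_blocks <- function(blks, data) {
  if (is.null(blks) || length(blks) == 0) {
    blks <- rep(1L, nrow(data))  # constant block of 1's
  }
  blks
}

make_blocks(NULL, data.frame(y = 1:4))  # 1 1 1 1
```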
No objections, but is there any reason not to separate this functionality? Rather than have it be a side effect of calling `blocks()` in a somewhat non-intuitive way?
Yes, I think so. For users who understand "absorb" to mean absorbing one or more intercept terms into other independent variables, one should be able to absorb whether or not one has defined any blocks. These users would be confused by a stop with an error message about their design not having any blocks.
@josherrickson does it work for you to take this on, perhaps over the next week or two? (...if I was premature in removing the Discussion label, do continue to discuss.)
Yes, I can do this.
Another thought: instead of updating `lmitt`, should we update `Design` objects to always have blocks? If the user doesn't input one, we automatically add a column of 1's. We could even suppress reporting it when it's constant.
Sounds like the right approach to me!
I like the idea of focusing more on the `Design` object. Instead of adding blocks by default, what are your thoughts on just making the `blocks()` accessor return a column of 1's when there are no blocks, i.e., changing the code here:
https://github.com/benbhansen-stats/propertee/blob/17c85dad68a6387a42dc8c157bba0c515a2a8111/R/DesignAccessors.R#L391-L397
I think we could (and should) be using this in `lmitt.formula()` anyway.
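A standalone illustration of that accessor fallback (the real code lives in `R/DesignAccessors.R` and works off the `Design` object's slots, so treat this as a sketch only):

```r
# Sketch only: return the design's block column(s), or a constant
# column of 1's when none were specified
blocks_or_ones <- function(block_cols, n_units) {
  if (is.null(block_cols) || ncol(block_cols) == 0) {
    return(data.frame(blocks = rep(1L, n_units)))  # implicit single block
  }
  block_cols
}

blocks_or_ones(NULL, 3)  # data.frame with one constant column of 1's
```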
We need to be careful about how this affects weight creation given the logic we use (EDIT: actually, unless there are some unforeseen bugs, the weight calculations should still return the correct weights):
https://github.com/benbhansen-stats/propertee/blob/17c85dad68a6387a42dc8c157bba0c515a2a8111/R/weights.R#L155-L165
as well as any of @xinhew0708's variance estimation code.
I like this idea. It's a minimal touch and, assuming all tests pass, unlikely to cause issues. It still allows us to check whether there's an otherwise-defined block variable in the model.
I'll work on getting this implemented tomorrow or early next week.
How do we want this to interact with `dtable()`?

a. Ignore the column of 1's and continue to warn or error. This is the current output when asking for "blocks" in a block-less design:
```r
> des <- rct_design(z ~ cluster(uoa1, uoa2), data = simdata)
> dtable(des, "block")
< table of extent 0 >
Warning message:
In .get_xydat(x, design) : blocks not found in design
> dtable(des, "treatment", "block")
treatment
 0  1 
 6  4 
Warning message:
In .get_xydat(y, design) : blocks not found in design
```
b. Treat the blocks as 1s, as they come into the function.
If we choose b., I think we need to update everywhere (especially the documentation) to note that `blocks` aren't optional, in the sense that we insert a constant block if the design is missing one.
On the other hand, a. is easier and doesn't change the overall flow of things. It also appears, going by the test suite alone, to be the only place where special-casing is required with this change. (I haven't tested whether the functionality works as expected everywhere, but this is the only place where the change causes tests to fail.)
A third option: do we want to try and "hide" this change to blocks from the user? Perhaps add an argument to `blocks()` that returns the column of 1's if blocks aren't specified, and otherwise returns what it currently does? Then we can use that argument wherever needed internally, but users aren't affected unless they explicitly look for it.
I think I like this option the best.
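The third option could be sketched along these lines (the argument name `implicitBlocks` is hypothetical; the user-facing default stays as-is):

```r
# Hypothetical sketch of option three: internal callers opt in to the
# constant fallback via a new argument; default behavior is unchanged
blocks_sketch <- function(block_cols, n_units, implicitBlocks = FALSE) {
  if (is.null(block_cols) || ncol(block_cols) == 0) {
    if (implicitBlocks) {
      return(data.frame(blocks = rep(1L, n_units)))  # internal fallback
    }
    warning("blocks not found in design")  # mirrors current behavior
    return(NULL)
  }
  block_cols
}
```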
I went forward with the third option. There's a calculation error to be ironed out:
```r
> des <- rct_design(z ~ uoa(uoa1, uoa2), data = simdata)
> lmitt(y ~ x, data = simdata, design = des, absorb = TRUE)$coeff
(Intercept)          z.           x        z._x 
 0.03920000  0.09294351 -0.22526910  0.24941721 
> lmitt(y ~ x, data = simdata, design = des, absorb = FALSE)$coeff
(Intercept)          z.           x        z._x 
 0.02674270  0.09294351 -0.22526910  0.24941721 
```
but making progress. Out of time to continue right now; I'll return to this on Wednesday or Friday with the current plan, unless I hear otherwise.
I'm happy with the third option. The remaining calculation error could be folded into #188, which proposes adding a separate step of re-calculating intercepts along the `absorb=TRUE` path generally. (The example helped me catch an error, which I think I've now corrected, in my last comment over there about how to do that re-calculation. Thanks!)
I'm actually not sure whether the result I showed in the last comment is an error or not - it's been long enough since I've thought about the residualization of the intercept that I keep going back and forth in my mind. In the case with subgroups, we expect the intercept to be identically 0. Do we expect that in this case?
Alternatively, do you have a worked example of what the result should look like for comparison?
With no moderator[^1], we expect that if we absorb on the right while also fitting an intercept as well as a treatment variable, then that intercept should be identically 0. However, with a continuous moderator this is not the case, unless that moderator happens to be mean-centered already.
That said, to be clear: once #188 is implemented, we'll no longer be returning that numerical zero in the `(Intercept)` coefficient position. Instead we'll be doing another calculation, an averaging of outcomes under the control condition, and returning it there instead. (With a corresponding replacement of that estimating function column.) This alternate intercept calculation will better align with what's returned in the `(Intercept)` position after fitting with IPW weights rather than absorption, and facilitate interpretation of the results.
[^1]: The situation with subgroups has some subtleties I can't stop to review, so commenting on main effects instead.
Ok, so then I still have a numeric issue somewhere:
```r
> des <- rct_design(z ~ uoa(uoa1, uoa2), data = simdata)
> lmitt(y ~ 1, data = simdata, design = des, absorb = FALSE)$coeff
(Intercept)          z. 
    -0.0090      0.1205 
> lmitt(y ~ 1, data = simdata, design = des, absorb = TRUE)$coeff
(Intercept)          z. 
     0.0392      0.1205 
```
Weird. Maybe I spoke too fast when I said the intercept should be identically 0. Perhaps that's only true with both-sides absorption.
(Again, if this issue is only affecting the intercept then it won't matter after #188.)
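A toy check of that conjecture with simulated data (not propertee): demeaning only the treatment indicator leaves the intercept at the outcome mean, while demeaning both sides drives it to zero.

```r
# One-sided vs. both-sides absorption of the constant block,
# on simulated data
set.seed(1)
z <- rep(0:1, each = 10)
y <- 0.5 + 0.3 * z + rnorm(20)
zd <- z - mean(z)                               # absorb z only
coef(lm(y ~ zd))[["(Intercept)"]]               # equals mean(y), not 0
coef(lm(I(y - mean(y)) ~ zd))[["(Intercept)"]]  # numerically zero
```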
Just noting a connection to #69. Point 2 of this comment on that thread is addressed by this issue.
My thought is that it's because we're still regressing on an intercept column of 1's and not centering the outcomes. We drop the intercept column that should be all zeros here:
https://github.com/benbhansen-stats/propertee/blob/5976f8d8dadc0d4afeafb6f668a259d97fadc41b/R/lmitt.R#L304
Then we regress with an intercept based on the formula here:
https://github.com/benbhansen-stats/propertee/blob/5976f8d8dadc0d4afeafb6f668a259d97fadc41b/R/lmitt.R#L321-L325
We shouldn't really be looking at the estimated `(Intercept)` anyway, right? If we consider the regression the way Angrist and Pischke do on p. 223, there technically isn't any intercept term to regress on, since it's been differenced out.
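That differencing-out view can be checked on simulated blocked data via Frisch-Waugh-Lovell: within-block demeaning reproduces the treatment coefficient from a block fixed-effects fit, and the demeaned regression has no intercept of its own to interpret.

```r
# FWL illustration (simulated data, not propertee): block-demeaned
# regression matches the fixed-effects treatment coefficient
set.seed(2)
blk <- rep(1:4, each = 5)
z <- rbinom(20, 1, 0.5)
y <- 0.2 * z + blk / 2 + rnorm(20)
dm <- function(v) v - ave(v, blk)            # demean within block
b_absorb <- coef(lm(dm(y) ~ dm(z) - 1))[[1]]
b_fe     <- coef(lm(y ~ z + factor(blk)))[["z"]]
all.equal(b_absorb, b_fe)  # TRUE
```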
I've merged the branch into main. Regardless of any outstanding calculation concern, it passes tests/checks. @benthestatistician up to you to keep open or close.
Can you add something to the NEWS file?
If absorption is requested when there are no blocks, `lmitt.formula()` is currently set to stop (lmitt.R lines 245-53). I see an advantage in instead taking the entire comparison sample to be a single block, then proceeding with the rest of the absorption logic, i.e., partialling out the intercept from the independent variables.