Treat subgroup variables as categorical

benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).

https://benbhansen-stats.github.io/propertee/

Treat subgroup variables as categorical #81

Closed josherrickson closed 1 year ago

josherrickson commented 2 years ago

If a user calls something like

lmitt(y ~ sbgrp, design = des)

we want to estimate a treatment effect for each level of sbgrp via interaction. The interaction is currently carried out without issue, but we neither check that sbgrp is categorical nor force it to be.

1) Do we want to convert sbgrp to categorical ourselves?
2) If yes to 1., do we want to leave it in the model as as.factor(sbgrp), or do some renaming so that it's actually sbgrp? Just sbgrp would look nicer, but would be harder to do. Additionally, leaving it as as.factor(sbgrp) is a nice reminder to users to do that conversion a priori.
3) If yes to 1., do we want to put any restrictions on the variable before we're willing to convert it? E.g. a maximum number of unique groups? Something like "Looks like sbgrp is continuous, so no automatic conversion to factor."
4) If no to 1., do we want an error or warning when a continuous variable is passed?
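As a rough illustration of what the check in question 3 might look like: the helper name `looks_categorical` and the `max_levels` cutoff below are hypothetical and not part of propertee. They only sketch one possible rule.

```r
# Hypothetical helper (not part of propertee): guess whether a moderator
# should be treated as categorical.
looks_categorical <- function(x, max_levels = 10) {
  # Factors, characters and logicals are taken to be categorical as-is.
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    return(TRUE)
  }
  # Numeric variables count only if they are whole-valued with few
  # distinct values; max_levels is an arbitrary, assumed cutoff.
  vals <- unique(x[!is.na(x)])
  is.numeric(x) && length(vals) <= max_levels && all(vals == round(vals))
}

looks_categorical(c(1, 2, 3, 1, 2))    # few whole-number codes
looks_categorical(c(0.13, 2.7, 5.91))  # non-integer values
```

Any such cutoff is a judgment call, which is part of why erroring or warning (question 4) may be the simpler option.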

benthestatistician commented 2 years ago

Subgrouping variables are the main thing we're focused on, but I don't know that we should exclude other types of moderating variables, so to speak. (If we eventually decide that we should, I would sooner enforce it by erroring on numeric right-hand-side variables than by converting them.)

Can we readily describe what gets done with a numeric right-hand-side variable? I'd guess that as things now stand, for numeric x, lmitt(y ~ x, design = des) gets you the result of:

1. absorb x and the intercept into the assignment variable, giving $\tilde{z}$
2. regress y on $\tilde{z}$, with no intercept
3. report the single $\tilde{z}$ coefficient, with the name of the treatment variable

Sound right? If so, then one use of this sort of thing would be in estimating interactions of the treatment effect with a Peters-Belson prediction.
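The procedure guessed at above can be sketched with plain `lm()` on simulated data. This is only an illustration under assumed names (`x`, `z`, `y`, `z_tilde`), not propertee's implementation. It relies on the Frisch-Waugh-Lovell result that the no-intercept regression on the residualized assignment reproduces the coefficient on `z` from the full regression.

```r
# Simulated data; every name here is illustrative.
set.seed(1)
n <- 200
x <- rnorm(n)                    # numeric moderator
z <- rbinom(n, 1, 0.5)           # treatment assignment
y <- 1 + 0.5 * x + 2 * z + rnorm(n)

# Step 1: absorb x and the intercept into z by residualizing.
z_tilde <- resid(lm(z ~ x))

# Step 2: regress y on z_tilde with no intercept.
fit_resid <- lm(y ~ 0 + z_tilde)

# Step 3: the single coefficient agrees with the z coefficient from the
# regression of y on z and x.
fit_full <- lm(y ~ z + x)
all.equal(unname(coef(fit_resid)["z_tilde"]), unname(coef(fit_full)["z"]))
```

Only the point estimates agree. The standard errors that `lm()` reports for the two fits differ, because the no-intercept fit leaves the variation explained by `x` in its residuals.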

josherrickson commented 2 years ago

Since we don't have that residualization implemented yet (#59), lmitt(y ~ x, ...) currently just fits lm(y ~ assigned() + assigned():x, ...) which then reports a single assigned():x coefficient.

Once that residualization is implemented, we can choose how we want it to work of course, but if we didn't make special cases, what you described would basically work, albeit with perhaps a different/unclear name in step 3.
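For reference, the shape of that fit can be reproduced with plain `lm()`. In this sketch `z` is a stand-in for `assigned()` and the data are simulated, so it shows only which coefficients come out for a numeric `x`, not what propertee does internally.

```r
set.seed(2)
n <- 120
z <- rep(c(0, 1), times = n / 2)         # stand-in for assigned()
x <- rep(c(1, 2, 3), each = n / 3)       # numeric "subgroup" codes
y <- 1 + z * x + rnorm(n)

# Main effect of z plus a single z:x slope, however many values x takes.
fit_num <- lm(y ~ z + z:x)
names(coef(fit_num))
```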

jwasserman2 commented 2 years ago

Users have to deal with a similar thing when they input a numeric variable to lm that they actually mean to be categorical. When they have a binary subgroup, using a numeric variable works fine and the summary and coefficients of their model are what they expect. When their numeric subgroup variable is non-binary though and they see their model only has one coefficient, they'll realize they needed to call as.factor. I think our package aligning with that typical process makes sense, rather than potentially creating other issues by forcing someone to make a binary numeric variable a factor.
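A small simulated example of the `lm` behaviour described here (variable names are made up for illustration):

```r
set.seed(3)
g2 <- rep(c(0, 1), times = 30)       # binary subgroup stored as numeric
g3 <- rep(c(1, 2, 3), times = 20)    # three-level subgroup stored as numeric
y  <- rnorm(60)

# Binary case: numeric and factor codings give the same estimates.
fit_g2_num <- lm(y ~ g2)
fit_g2_fac <- lm(y ~ as.factor(g2))
all.equal(unname(coef(fit_g2_num)), unname(coef(fit_g2_fac)))

# Three-level case: the numeric coding yields a single slope, while the
# factor coding yields one contrast per non-reference level.
fit_g3_num <- lm(y ~ g3)
fit_g3_fac <- lm(y ~ as.factor(g3))
length(coef(fit_g3_num))
length(coef(fit_g3_fac))
```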

josherrickson commented 1 year ago

Per discussion: If a user passes a continuous sbgrp, return coefficients on both assigned() and assigned():sbgrp.

josherrickson commented 1 year ago

I implemented continuous "subgroup" variables returning a main effect + interaction in a branch; if no one wakes up in a cold sweat because this is the wrong approach, I'll merge it into main in a few days.

josherrickson commented 1 year ago

Bumping this for myself - I lost track of this branch. This is coming up in reference to #128 and an offline discussion Ben and I had.

josherrickson commented 1 year ago

Continuous moderators are now supported. @jwasserman2 @xinhew0708 Note that the model changes. With a categorical moderator x, we fit:

y ~ 1 + x + x:treatment

With a continuous moderator x, we fit:

y ~ 1 + x + treatment + x:treatment

I'm unsure if this will have any impact on your calculations/code. If it'd be useful to track in absorbed_moderator whether it's categorical or continuous, let me know.
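To make the difference between the two formulas concrete, here is a sketch using plain `lm()` on simulated data. It is a stand-in only: it ignores the design and any absorption that propertee performs, and the variable names are assumptions.

```r
set.seed(4)
n <- 120
treatment <- rep(c(0, 1), times = n / 2)
x_cat <- factor(rep(c("a", "b", "c"), each = n / 3))
x_con <- rnorm(n)
y <- rnorm(n) + treatment

# Categorical moderator: no treatment main effect, so the interaction
# yields one treatment coefficient per level of x_cat.
fit_cat <- lm(y ~ 1 + x_cat + x_cat:treatment)
names(coef(fit_cat))

# With a binary treatment, each of those coefficients is the
# within-level difference in mean outcomes.
in_a <- x_cat == "a"
diff_a <- mean(y[in_a & treatment == 1]) - mean(y[in_a & treatment == 0])
all.equal(unname(coef(fit_cat)["x_cata:treatment"]), diff_a)

# Continuous moderator: a treatment main effect plus one interaction slope.
fit_con <- lm(y ~ 1 + x_con + treatment + x_con:treatment)
names(coef(fit_con))
```

In the continuous case the `treatment` coefficient is the estimated effect at `x_con = 0`, which is why the main effect is needed alongside the interaction.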

jwasserman2 commented 1 year ago

Given the trouble we went through over the last few months trying to understand residualization, I feel like there should be tests for the coefficients and their standard errors before this gets pushed to main. I think we should also make a package version prior to this commit, and this commit merits a version increment.

josherrickson commented 1 year ago

Making a release and incrementing version seems prudent.

I'm not seeing the risk in putting this in main: it has no impact on models without continuous moderators, and it passes all existing tests. While I agree it may not be ready for use in actual analysis yet, I don't see how it could negatively affect current analyses.