josherrickson commented 3 years ago

Structural

[ ] Should Design support c/rbind?
[x] During the creation of a Design, error if the same variable is used in multiple places
[x] Currently forcing() supports multiple variables. Should it?
[x] Design replacers do not support changing number of variables (e.g. if you created the Design with cluster(a, b), you cannot replace with a single-variable cluster ID).
[x] Should we have helper functions for conversion of types (e.g. RCT to RD)? Mostly would be just modifying @type, but RD requires adding/removing forcing.
[ ] DirectAdjusted output is just lm output. Want that to emphasize treatment effect.
[x] DirectAdjusted validity checker should be more comprehensive to ensure users don't pass malformed lm into as.Directadjusted.
[x] cov_adj should take a type argument to handle response/link for predict.glm. Are there others for other models?
[x] cov_adj just has a tryCatch for predict; do any of the desired supported models require more bespoke handling?
[x] ittestimate needs more input sanitization.
[x] lm(..., weights = ate(des)) is supported but extremely fragile.
[x] ate and ett don't support clusterIds (only ittestimate does). Handle clusterIds in helper function.
[x] ittestimate should support a weights argument for users wanting to pass something like mywt*ate(des). Ignore "target" argument in weights is non-null. ate and ett need to support clusterIds first.
[x] After adding support for weights in ittestimate, if user passes a WeightedDesign object, handle appropriately (move weight to weights, extract Design).
[ ] Overall, documentation is sparse at best.

Stats

[x] If there are any differences during creation of different types of Design objects, it's not implemented. (Aside from only RD supporting forcing.)
[ ] ett and ate generate dummy weights.
[ ] While there are skeletons of vcov.Directadjusted and confint.Directadjusted, these obviously need to be expanded and any additional postho functions implemented.
[x] cov_adj just runs predict.

benthestatistician commented 3 years ago

Regarding non-support of clusterIds= in ate() and ett() calls: Are you reporting that you tried to make this work and couldn't get it to go, or just that you didn't get as far as taking on that project? (If the former, are there commits or other residues of the unsuccessful experiments?)

(Thanks much for this helpful listing, Josh!)

josherrickson commented 3 years ago

Neither; clusterIds is fully supported inside ittestimate; it just needs to be either a) pushed down the call stack , or b) made into a helper function. It's a very low hanging fruit that I'll probably try and knock out if I get some free time this afternoon.

benthestatistician commented 3 years ago

Good that ittestimate() supports it, Josh. I support wrapping up that line, as you propose.

At the same time, I'm not ready to abandon the original concept, with a clusterIDs= argument to each of the weighting functions that permits their being invoked within calls like

glm(<...>, data=mydata, weights=ate(mydesign, clusterIDs=<key(s) common to
mydesign & mydata >))

I.e., I do also wish to serve use cases calling for design-determined weights for use in non-lm() based modeling, and with the data at a different "level" than that of the design table. (E.g., it's a school-randomized study, so that design info is carried in a table of schools, and yet the analyst is modeling off of a student-year table.) In some of these cases ittestimate() will be appropriate to finish the analysis off, but in others the user may wish to pass it into lmer() or other setup customized for the handling of repeated measures. Here the ett(foo, clusterIDs=c("bar", "baz")) form that I had proposed serves the important role of allowing us to store the design-related info along with the fitted lmer model, hidden away in the (weights) column of its model frame. So for this purpose I'd like this form to stay on the agenda, for the time being.

josherrickson commented 3 years ago

Sorry if it was unclear - I fully intend to support that, I just implemented clusterIds inside ittestimate first and haven't gotten around to porting it into ate and ett.

josherrickson commented 3 years ago

Ok, it's done - ate(des, clusterIds = list("oldname" = "newname")) is supported.

benthestatistician commented 3 years ago

Terrific! (Sorry I misunderstood.)

benthestatistician commented 3 years ago

(I'm checking off several cov_adj() concerns that aren't so much resolved as moved into issues #2, #3 or #4, all with the cov_adj() project .)

benthestatistician commented 3 years ago

@josherrickson,

you've got some really nice work here.
I notice your use of UpperCamelCase for some design-declarers. Somewhere I picked up the expectation that that's to be reserved for S4 class names, whereas functions should be lowerCamelCase or snake_case. Would you be OK with snake case for these functions? Failing that, lower camel case?
Thoughts on the handling of acronyms at the front of a function name, e.g. RCTDesign? I may have used things like that in the proposal -- this may well be the origin of your upper camel case function names -- but for the actual package I think I'd prefer rct_design.
In reference to your question about whether forcing() should support multiple variables: Yes, it should. (As it does, thanks!)
In reference to your question about conversions between design types: I can see a use case for conversion from RCT to observational study, if not a central one. So I've posted an issue about this, #6, even as I checked your box in the listing above.

josherrickson commented 3 years ago

Names are completely replaceable; as noted I mostly just took the cues from the document. I do think we should strive to introduce some consistency/rules for names since we're starting from scratch.
What about just design, and have a type = argument? It offers more flexibility if we're going to be adding more types of designs going forward. Otherwise see my response above

benthestatistician commented 3 years ago

Thanks J. while we're listing aspects of the naming scheme for possible revision, I should add that "design" seemed a good thing to write in the proposal, but isn't necessarily best for the software.

The DeclareDesign team sort of beat us to the punch.
I think we expect our "design" object to encode treatment assignments. In principle and at least sometimes in practice, a research design exists prior to the treatment assignment itself. So here we'd lose again, to the DeclareDesign group.
A use case for us is that treatment assignments are made in different times or different places, each encoded in their own "design", only later to be combined into a single stratified "design". Conceptually it's most natural to think of each those constituent things as one piece of a common design, not a separate design.

"Assignment" might fit better, but it's long, easy to misspell and too similar to existing function names, such base::assign()'s.

josherrickson commented 3 years ago

Something related to "Data"? DataDesign? DataStructure? DataOrganization? DataComposition?

benthestatistician commented 3 years ago

... Since posting about this nomenclature issue, I've been thinking "rct_groups()", "rdd_groups()", "matched_groups()" and "obs_groups()".

Rationale:

At first blush "groups" is ambiguous (between conditions, strata and clusters), but when I sit down to write about an RCT or observational study, one winds up writing a lot of "treatment group", "intervention group", "control group" and/or "comparison groups".
When the user provides a formula, e.g. z ~ strata(foo) + forcing(bar), as first argument to any of these functions, the very first thing she's communicating to us is the composition of those groups. That's the one specification of groups that we expect which is mandatory, conceptually at least, strata and clustering each being present for some designs but not others.
Furthermore strata and clustering each have an informatively named helper function to go with them.

josherrickson commented 2 years ago

I believe everything on this list is either addressed or outdated, or not relevant anymore.

benbhansen-stats / propertee

Outstanding concerns #1

Structural

Stats