benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).
https://benbhansen-stats.github.io/propertee/
Other
2 stars 0 forks source link

Outstanding concerns #1

Closed josherrickson closed 2 years ago

josherrickson commented 3 years ago

Structural

Stats

benthestatistician commented 3 years ago

Regarding non-support of clusterIds= in ate() and ett() calls: Are you reporting that you tried to make this work and couldn't get it to go, or just that you didn't get as far as taking on that project? (If the former, are there commits or other residues of the unsuccessful experiments?)

(Thanks much for this helpful listing, Josh!)

josherrickson commented 3 years ago

Neither; clusterIds is fully supported inside ittestimate; it just needs to be either a) pushed down the call stack , or b) made into a helper function. It's a very low hanging fruit that I'll probably try and knock out if I get some free time this afternoon.

benthestatistician commented 3 years ago

Good that ittestimate() supports it, Josh. I support wrapping up that line, as you propose.

At the same time, I'm not ready to abandon the original concept, with a clusterIDs= argument to each of the weighting functions that permits their being invoked within calls like

glm(<...>, data=mydata, weights=ate(mydesign, clusterIDs=<key(s) common to
mydesign & mydata >))

I.e., I do also wish to serve use cases calling for design-determined weights for use in non-lm() based modeling, and with the data at a different "level" than that of the design table. (E.g., it's a school-randomized study, so that design info is carried in a table of schools, and yet the analyst is modeling off of a student-year table.) In some of these cases ittestimate() will be appropriate to finish the analysis off, but in others the user may wish to pass it into lmer() or other setup customized for the handling of repeated measures. Here the ett(foo, clusterIDs=c("bar", "baz")) form that I had proposed serves the important role of allowing us to store the design-related info along with the fitted lmer model, hidden away in the (weights) column of its model frame. So for this purpose I'd like this form to stay on the agenda, for the time being.

josherrickson commented 3 years ago

Sorry if it was unclear - I fully intend to support that, I just implemented clusterIds inside ittestimate first and haven't gotten around to porting it into ate and ett.

josherrickson commented 3 years ago

Ok, it's done - ate(des, clusterIds = list("oldname" = "newname")) is supported.

benthestatistician commented 3 years ago

Terrific! (Sorry I misunderstood.)

benthestatistician commented 3 years ago

(I'm checking off several cov_adj() concerns that aren't so much resolved as moved into issues #2, #3 or #4, all with the cov_adj() project .)

benthestatistician commented 3 years ago

@josherrickson,

  1. you've got some really nice work here.
  2. I notice your use of UpperCamelCase for some design-declarers. Somewhere I picked up the expectation that that's to be reserved for S4 class names, whereas functions should be lowerCamelCase or snake_case. Would you be OK with snake case for these functions? Failing that, lower camel case?
  3. Thoughts on the handling of acronyms at the front of a function name, e.g. RCTDesign? I may have used things like that in the proposal -- this may well be the origin of your upper camel case function names -- but for the actual package I think I'd prefer rct_design.
  4. In reference to your question about whether forcing() should support multiple variables: Yes, it should. (As it does, thanks!)
  5. In reference to your question about conversions between design types: I can see a use case for conversion from RCT to observational study, if not a central one. So I've posted an issue about this, #6, even as I checked your box in the listing above.
josherrickson commented 3 years ago
  1. Names are completely replaceable; as noted I mostly just took the cues from the document. I do think we should strive to introduce some consistency/rules for names since we're starting from scratch.

  2. What about just design, and have a type = argument? It offers more flexibility if we're going to be adding more types of designs going forward. Otherwise see my response above

benthestatistician commented 3 years ago

Thanks J. while we're listing aspects of the naming scheme for possible revision, I should add that "design" seemed a good thing to write in the proposal, but isn't necessarily best for the software.

  1. The DeclareDesign team sort of beat us to the punch.
  2. I think we expect our "design" object to encode treatment assignments. In principle and at least sometimes in practice, a research design exists prior to the treatment assignment itself. So here we'd lose again, to the DeclareDesign group.
  3. A use case for us is that treatment assignments are made in different times or different places, each encoded in their own "design", only later to be combined into a single stratified "design". Conceptually it's most natural to think of each those constituent things as one piece of a common design, not a separate design.

"Assignment" might fit better, but it's long, easy to misspell and too similar to existing function names, such base::assign()'s.

josherrickson commented 3 years ago

Something related to "Data"? DataDesign? DataStructure? DataOrganization? DataComposition?

benthestatistician commented 3 years ago

... Since posting about this nomenclature issue, I've been thinking "rct_groups()", "rdd_groups()", "matched_groups()" and "obs_groups()".

Rationale:

josherrickson commented 2 years ago

I believe everything on this list is either addressed or outdated, or not relevant anymore.