benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).
https://benbhansen-stats.github.io/propertee/
Other
2 stars 0 forks source link

vignette issues #70

Closed josherrickson closed 1 year ago

josherrickson commented 1 year ago

While rebuilding the reference site, I ran into issues with two vignettes.

1) covadj_with_clustering: I made several tweaks, mostly around type argument being HC0 vs CR1. There was also two tests I couldn't make pass so I commented them out. See d681f704e for the details. 2) polynomialSimulation just wouldn't run so I changed the extension to avoid building it.

benthestatistician commented 1 year ago

Re newly failing tests in the covadj_with_clustering vignette, perhaps it's just a matter of HC0 and CR1 differing a tad in their "small sample degrees of freedom adjustments"? @jwasserman2

Regarding the polynomialSimulation vignette, I'm not sure whether it's expected to run out of the box at this point. I don't see a problem with leaving it out of the reference site (although others may), but it's more problematic if we don't know what it needs in order to run. If Josh W doesn't know or can't figure this out, then let's consult with Adam Sales (deliberately not tagging him as of yet) to either (a) get help with modification of the script so that it runs without intervention when we rebuild the reference site, or (b) make sure it's well documented within the file itself how to get it to run.

josherrickson commented 1 year ago

Apparently .Rmd files starting with _ do not get built by pkgdown. So let's adopt that for vignettes we don't currently expect to build cleanly. (This is a separate concern from using .Rbuildignore with the not_for_cran folder for vignettes which, well, aren't for cran.) @jwasserman2 I'll leave it up to you to rename the polynomial one when you look into it.

jwasserman2 commented 1 year ago

Thanks for flagging these. I don't immediately have an idea what would cause those tests to error now, but I'll take a look.

Like @benthestatistician said, polynomialSimulation.Rmd doesn't work out of the box. It's written now expecting the data that generated the results to be cached, but we don't store the data in the repo (and we shouldn't). The simulations take a while to run as well, so we don't want it to be tested every time we commit to main. Since we don't cache the results and we don't want to run it often, I see polynomialSimulation more as documentation than testing. I'll look into the errors that are popping up right now and think** whether they're anything so serious that I think we should consider testing our lmrob methods more frequently.

** EDIT

benthestatistician commented 1 year ago

Thanks for these comments, @jwasserman2 . I agree about not checking in sim results, and also about not running polySim as part of the pkgdown build process.

That said, I also think it would be helpful for us to have a place in our overall QA process for longer-running scripts, of which this one could be a first example. Perhaps we could have a subdirectory somewhere for such scripts, and/or a Makefile to kick them off? I'd appreciate general suggestions and advice about setting such a thing up. (@josherrickson)

Getting back to polynomialSimulation specifically, Josh E is it your sense that we could write a short sequence of Make directives that together build the polynomialSimulation vignette from scratch? Or would that also require some more?

josherrickson commented 1 year ago

We can create a .Rbuildignore'd sub-folder, e.g. something like /scripts/, and stick them in there. Happy to do the modifications to the existing Makefile, or more reasonably, create a separate Makefile in that directory.

josherrickson commented 1 year ago

Re: polynomialSimulation, I don't have a sense of the issue. In general Makefiles can do anything that you can do at the terminal, so if there's a series of commands that will make it compile, I can get Make to handle it. I just don't know what those commands would be.

jwasserman2 commented 1 year ago

I think the polynomialSimulation failure is probably just due to this results caching issue. The other tests you had to comment out are unrelated and could be due to me misunderstanding what the expected behavior would be in those cases.

polynomialSimulation doesn't have tests to run in it right now; it just creates outputs that Ben and Adam visually validated against their paper. If we want to turn this into a test that can be run when desired, we can add in tests using those benchmarks from the paper.

benthestatistician commented 1 year ago

Re turning the polynomialSimulation script into an integration test sort of thing, I think the thing to do would be to add checks that empirical coverage is within the ballpark of nominal coverage. (As opposed to checking that it reproduces something that appeared in a published paper. The combination of methods now in the polySim script differs from what was in the paper.) Could you add this to your list @jwasserman2?

jwasserman2 commented 1 year ago

Yes. Do you have ideas for how close the ballpark should be?

benthestatistician commented 1 year ago

I'd start by abstracting a general lower confidence limit for coverages reported in the current version of the manuscript, something like min empirical coverage minus twice the square root of $(.95)(.05)/r_0$, where $r_0$ is the number of simulation replicates. Then I'd take off another two times the square root of $(.95)(.05)/r_1$, where $r_1$ is the number of replicates in the test simulation, and insist that empirical coverages be greater than this.

jwasserman2 commented 1 year ago

My updates from #68 were causing the non-polynomialSimulation failed tests. Previously, specifying type in a vcovDA call would specify the residual calculation to use in the meat matrices. Now, type specifies the variance function to use, so the meat matrix calculations were falling back on the sandwich defaults (which are "HC1" rather than the "HC0" I was specifying in the covadj_with_clustering.Rmd).

In light of this, for .vcovMB_CR1, the meat matrix calculations should be hardcoded to use type = "HC0" (though I realize this means the function should be named .vcovMB_CR0, following convention given, among other places, here.) I've still left the ... in .vcovMB_CR1 open for users to decide upon the meat matrix cadjust argument (bias correction by multiplying by G / (G - 1)), but if we would also like to hardcode that for users I can do that too.

benthestatistician commented 1 year ago

Am I right to infer that the sandwich default cadjust value led to less favorable coverage than no cadjust value at all, in this simulation? (Or is it rather that the intended combinations of argument caused some other form of breakage resulting in failed tests.)

jwasserman2 commented 1 year ago

@josherrickson Will it break things if I remove *.R from the .gitignore file in vignettes? I need to add the helper function files for polynomialSimulation.

@benthestatistician Setting cadjust = FALSE in these tests was for ease of validation. It just would have entailed multiplying my matrix multiplication result derived by hand by G / (G - 1) to compare it to sandwich's defaults. I do not have a sense of what coverage rates look like when you do or do not use the correction.

jwasserman2 commented 1 year ago

@josherrickson I just force pushed the necessary R scripts for the simulations and left the .gitignore alone

josherrickson commented 1 year ago

Can we just move the simulations to another folder so they're not a concern right now? E.g. /simulations?

jwasserman2 commented 1 year ago

I created a new folder within vignettes for polynomialSimulation and didn't update the makefile so that you can only build it manually. It achieves the same goal as your suggestion while still keeping it under vignettes

josherrickson commented 1 year ago

Great, as long as PR passes checks, pull it in and close this. Can start up a new issue if you and Ben need to discuss substantive content.

josherrickson commented 1 year ago

Ack, I forgot that part of what spawned this issue wasn't R's check, but rather building the reference site.

All .Rmd files inside /vignettes/ are built during the site creation, except those prefaced by an underscore.

I went ahead and renamed the simulation vignette, and updated the contribution guidelines.

benthestatistician commented 1 year ago

Thanks for your clarification above, Josh W. (And for the work on this PR, JW & JE both.)