benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).
https://benbhansen-stats.github.io/propertee/

profiling teeMod #175

Open nullsatz opened 1 month ago

nullsatz commented 1 month ago

Today I learned how to use RStudio, including how to profile a testthat file inside it, namely tests/testthat/test.teeMod.R.

The goal is to find areas in the relevant code that could be made more time efficient and work on those.

Here is a first step toward identifying good candidates.

[Image: 5-28-24 at 5 11 PM]

When I start making changes to the code, I will put the changes in a new branch, push the branch to GitHub, and post the name of the new branch here.

josherrickson commented 1 month ago

Love this. #132 might be a good place to look too, both for other speedup opportunities and for a larger data set that will run slower.

benthestatistician commented 1 month ago

Thanks, @nullsatz. I agree with @josherrickson's suggestion on where to look for likely speedup opportunities. That said, I'm not sure anything needs fixing yet. If you do find what looks like a bottleneck, in .order_samples() or elsewhere, I'd appreciate your attention to it; if not, we might wait for actual bottlenecks to turn up.

benthestatistician commented 3 weeks ago

Hi @nullsatz, did you identify candidate functions for fine-tuning? Did you decide to take on any of those projects? It'd be useful to hear both what you chose to work on (if anything) and what you evaluated as a potential candidate for TLC but decided not to work on.

nullsatz commented 2 weeks ago

I guess this issue has already been investigated by @josherrickson in a previous issue. I will get the original script that was tested as a benchmark and work on his original 2nd and 3rd suggestions.

nullsatz commented 1 week ago

I profiled the following code starting at the call to rd_design.

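library("propertee")

ad <- read.csv("synth_dat_issue131.csv")

# Regression discontinuity design, restricted to -1 < R < 11
des <- rd_design(Z ~ forcing(R) + unitid(id) + block(problem_id),
                 data = ad[ad$R > -1 & ad$R < 11, ])

# Covariance adjustment model fit on the same subset
m1_bw2 <- glm(Y ~ R + Z, data = ad[ad$R > -1 & ad$R < 11, ], family = binomial)

# Treatment effect model, using the glm fit as an offset
res_BW2_1 <- lmitt(Y ~ 1, design = des, offset = cov_adj(m1_bw2), weights = "ate",
                   data = ad[ad$R > -1 & ad$R < 11, ])

summary(res_BW2_1)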

The resulting flame graph

[Screenshot: 2024-06-25 at 8 28 34 AM]

shows that summary.teeMod is taking the most time.

If I profile just the line summary(res_BW2_1), I get a new flame graph.

[Screenshot: 2024-06-25 at 8 38 18 AM]

These graphs suggest to me that .order_samples() is a good candidate for a speedup, as reported in the previous issue.
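
For anyone who wants to reproduce this kind of measurement outside RStudio, here is a minimal sketch using base R's sampling profiler (Rprof() and summaryRprof() from the utils package). It is only an illustration: slow_step() is a hypothetical stand-in workload, not a propertee function, and in practice that call would be replaced by summary(res_BW2_1).

```r
# Hypothetical stand-in for an expensive call such as summary(res_BW2_1).
# It is NOT part of propertee; it only burns enough time to be sampled.
slow_step <- function(n = 5e5, reps = 40) {
  x <- runif(n)
  for (i in seq_len(reps)) order(x)  # repeated ordering of a long vector
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms
slow_step()                        # replace with the call being investigated
Rprof(NULL)                        # stop profiling

# Summarise the samples; by.total ranks functions by time spent in the
# function itself plus everything it calls.
prof <- summaryRprof(prof_file)
print(head(prof$by.total, 10))
```

The by.total table carries the same information as a flame graph in tabular form, which makes it easy to paste into an issue and to compare before and after a change.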