performance hit in `.order_samples`

benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).

https://benbhansen-stats.github.io/propertee/

Other

2 stars 0 forks source link

performance hit in `.order_samples` #132

Closed josherrickson closed 1 year ago

josherrickson commented 1 year ago

I was noticing that the example given by @kirkvanacore in #131 is quite a bit slower than I anticipated, so I ran the code through a profiler. Almost the entire time (~95%) is being spent in these lines of .order_samples: https://github.com/benbhansen-stats/flexida/blob/a77c68444da435f0a9216c98a459130fb623e693/R/DirectAdjusted.R#L295C50-L295C52

No action at the moment, just flagging as a potential place to examine speed-ups in the future.

A few ideas for speeding up:

~~Avoid duplication of is.na and ==.~~
Drop down to Rcpp for subsetting (as I think that's the biggest issue)
Vectorize instead of loop? (vapply(unique(Q_C_ix[!is.na(Q_C_ix)]), ...?)

josherrickson commented 1 year ago

I quickly handled my first suggestion above and saw an increase of 40-50%. My other ideas are more involved so no immediate plans to try them,

benthestatistician commented 1 year ago

Thanks, @josherrickson @jwasserman2 , if you'd be comfortable working on this, do assign the issue to yourself (and turn attention to it when convenient to do so).

jwasserman2 commented 1 year ago

Thanks for speeding it up @josherrickson. There's a pretty solid suite of tests for that function in test.DirectAdjusted.R, so if tests pass when you make changes that should be a good indication things are still fine. I was pretty narrowly focused on making sure alignment works in all the cases that could arise at the time, there may be some places to speed it up. Maybe I could explain to Josh at some point what exactly is going on and he could help think of possibly better ways of coding this

nullsatz commented 5 months ago

@josherrickson, do you still have the script that you profiled? If so, can you point me to it? Then, I can profile it too and possibly implement suggestions 2 and 3 in the orignal issue.

josherrickson commented 5 months ago

I don’t - I used the code in the opening comment of #131 with the synthetic data posted a bit further in the thread.