cmu-delphi / epiprocess

Tools for basic signal processing in epidemiology
https://cmu-delphi.github.io/epiprocess/

Consider performance improvements for `epix_slide`, `as_of` #76

Open brookslogan opened 2 years ago

brookslogan commented 2 years ago

Currently, `epix_slide` and `as_of` have reasonable performance for pseudoprospective analysis of some state-level forecasters, such as those predicting 23 quantiles using `rq`. However, for more efficient forecasters (using `rq.fit`, `lm`, or `lm.fit`) or simpler operations such as 7-day averaging, the overhead becomes noticeable or can even dwarf the actual operation's computation time. Some ideas for improvements:

Any performance improvements here should be planned together with considerations of alternatives to storing data in RAM. E.g., if the RAM storage is abandoned, then some of these potential improvements may not apply, or would need to be implemented in a different way. [They should also be coordinated with any move to S3+dtplyr; completing these changes in parallel would likely involve substantial merge conflicts.]

[Consider also replacing `tibble::as_tibble()` with `setDF` in `as_of`, although this may need checks that the original DT or its columns aren't aliased in exceptional circumstances.]

brookslogan commented 2 years ago

Tentatively marking this as P2, unless epipredict development reveals that it is a major issue.

nmdefries commented 5 months ago

Partially addressed in https://github.com/cmu-delphi/epiprocess/pull/386. Some additional performance ideas here and here.

dshemetov commented 2 months ago

Not sure where to post this, but this seems like a close enough issue. Coming from a discussion with @brookslogan @dsweber2. We're looking for a "sparse archive double slide" function. What we mean is constructing features for every as_of in an archive, but with a smarter approach than repeatedly calling `epix_as_of`, applying `epi_slide` to each snapshot, and then converting the results back to an archive and compactifying. The faster approach would compactify as you go.
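
To make the idea concrete, here is a minimal Python sketch. It is not epiprocess code: the update-list representation `(version, time_value, value)`, the trailing-mean feature, and every function name below are assumptions made purely for illustration. `naive_double_slide` stands in for the snapshot-then-slide-then-compactify route, while `sparse_double_slide` recomputes only the features whose window overlaps a changed time value and emits a row only when the feature actually changes.

```python
from collections import defaultdict


def trailing_mean(snapshot, t, window):
    """Mean of the values present in `snapshot` for days t-window+1 .. t."""
    vals = [snapshot[d] for d in range(t - window + 1, t + 1) if d in snapshot]
    return sum(vals) / len(vals)


def group_by_version(updates):
    """Group (version, time_value, value) rows into {time_value: value} per version."""
    by_version = defaultdict(dict)
    for version, t, value in updates:
        by_version[version][t] = value
    return sorted(by_version.items())


def naive_double_slide(updates, window):
    """Recompute every feature at every version, then keep only changed rows."""
    snapshot, latest, out = {}, {}, []
    for version, changes in group_by_version(updates):
        snapshot.update(changes)  # snapshot as of this version
        for t in sorted(snapshot):  # slide over the whole snapshot
            feat = trailing_mean(snapshot, t, window)
            if latest.get(t) != feat:  # compactify after the fact
                latest[t] = feat
                out.append((version, t, feat))
    return out


def sparse_double_slide(updates, window):
    """Recompute only features whose trailing window contains a changed day."""
    snapshot, latest, out = {}, {}, []
    for version, changes in group_by_version(updates):
        snapshot.update(changes)
        # A change on day c can only affect features on days c .. c+window-1.
        affected = {d for c in changes for d in range(c, c + window) if d in snapshot}
        for t in sorted(affected):
            feat = trailing_mean(snapshot, t, window)
            if latest.get(t) != feat:  # compactify as we go
                latest[t] = feat
                out.append((version, t, feat))
    return out


# Hypothetical toy archive: version 3 revises the value reported for day 2.
updates = [(1, 1, 10.0), (1, 2, 20.0), (2, 3, 30.0), (3, 2, 26.0)]
result = sparse_double_slide(updates, window=2)
```

On this toy input both functions return the same five feature rows, but the sparse version never touches day 1 after version 1. Whether a real implementation could bound the affected set this simply would depend on the slide computation being used.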