JuliaStats / Survival.jl

Survival analysis in Julia
MIT License
73 stars 22 forks source link

Simplifications and performance improvements for estimators #43

Closed ararslan closed 2 years ago

ararslan commented 2 years ago

We can very cheaply determine ahead of time the correct length for the output arrays stored by the estimator types, which allows us to set indices directly rather than repeatedly growing the arrays. This provides a decent improvement in performance and memory use.

Current master:

julia> @benchmark fit(KaplanMeier, $t, $s)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  7.563 μs … 242.163 μs  ┊ GC (min … max): 0.00% … 93.11%
 Time  (median):     8.176 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.810 μs ±   7.397 μs  ┊ GC (mean ± σ):  2.74% ±  3.20%

  ▃▃▅▆█▅▂▁        ▁    ▁▁                                     ▁
  ████████████▇▇█▇██▇█▇██▇▅▅▅▅▄▄▄▅▅▄▅▄▄▃▄▅▃▄▄▄▅▄▆▅▅▆▅▅▅▄▅▅▅▇█ █
  7.56 μs      Histogram: log(frequency) by time      15.8 μs <

 Memory estimate: 14.05 KiB, allocs estimate: 30.

This PR:

julia> @benchmark fit(KaplanMeier, $t, $s)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  2.875 μs … 66.797 μs  ┊ GC (min … max): 0.00% … 86.05%
 Time  (median):     3.074 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.267 μs ±  2.090 μs  ┊ GC (mean ± σ):  2.19% ±  3.32%

    ▃▆███▇▅▄▃▂▂▂▁                         ▁▁▂▂▂▁             ▂
  ▅▆██████████████▇▇▆▆▅▅▃▅▅▅▄▄▃▅▁▃▆▇█████████████▇▆▅▁▃▃▅▆▆▆▆ █
  2.87 μs      Histogram: log(frequency) by time     4.79 μs <

 Memory estimate: 8.75 KiB, allocs estimate: 10.

Here, t and s are defined as in the tests for KaplanMeier.

I've also taken care of a 5-year-old to-do item and unified the fit methods to favor using EventTimes rather than separate arrays. This simplifies things a bit.

Lastly, I fixed similarly long-standing docstring typos (I meant "denote," not "dictate") for the fit methods and added docstrings for the methods that take a single vector of EventTimes.

codecov-commenter commented 2 years ago

Codecov Report

Merging #43 (dc63baf) into master (43886ce) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
+ Coverage   99.54%   99.56%   +0.01%     
==========================================
  Files           5        5              
  Lines         220      228       +8     
==========================================
+ Hits          219      227       +8     
  Misses          1        1              
Impacted Files Coverage Δ
src/kaplanmeier.jl 100.00% <ø> (ø)
src/nelsonaalen.jl 100.00% <ø> (ø)
src/estimator.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 43886ce...dc63baf. Read the comment docs.