JuliaStats / GLM.jl

Generalized linear models in Julia

safer int for binomial loglik #504

Closed · palday closed this 1 year ago

palday commented 1 year ago

closes #503
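
For context, here is a minimal sketch (not the actual GLM.jl code) of the kind of pitfall the PR title points at, assuming the problem in #503 is converting the floating-point success count `y * wt` to an integer inside the binomial log-likelihood:

```julia
# Hypothetical illustration: y is the proportion of successes and wt the
# number of trials, so the success count is y * wt -- mathematically an
# integer, but not always exactly representable in Float64.
y, wt = 0.29, 100

y * wt                # 28.999999999999996 under IEEE 754 double precision

# A strict conversion throws on any non-integral float:
# Int(y * wt)         # InexactError

# Rounding to the nearest integer is the safer conversion:
round(Int, y * wt)    # 29
```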

codecov-commenter commented 1 year ago

Codecov Report

Base: 87.32% // Head: 87.39% // Increases project coverage by +0.06% :tada:

Coverage data is based on head (4fc8539) compared to base (1459737). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   87.32%   87.39%   +0.06%
==========================================
  Files           7        7
  Lines         947      952       +5
==========================================
+ Hits          827      832       +5
  Misses        120      120
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| [src/glmtools.jl](https://codecov.io/gh/JuliaStats/GLM.jl/pull/504/diff?src=pr&el=tree#diff-c3JjL2dsbXRvb2xzLmps) | `93.82% <100.00%> (+0.19%)` | :arrow_up: |

:umbrella: View full report at Codecov.

palday commented 1 year ago

@nalimilan

Here's the benchmark:


```julia
using BenchmarkTools
using GLM
using Random

n = 1_000_000
# proportions of successes out of three trials, i.e. y ∈ {1/3, 2/3, 1}
y = rand(MersenneTwister(42), [1, 2, 3], n) ./ 3
# three Bernoulli trials per observation, so y .* wts are the success counts
wts = fill(3, n)

m = glm(@formula(y ~ 1), (; y), Binomial(); wts)

@benchmark loglikelihood(m) samples=100 seconds=10
```
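
For intuition about what is being timed: the weighted binomial log-likelihood is (up to implementation details) a sum of per-observation binomial log-densities, and the integer conversion touched by this PR sits inside that per-observation term. A hand-rolled sketch under that assumption, not GLM.jl's exact internals:

```julia
using Distributions

# Hypothetical re-implementation: each observation contributes the log-density
# of its success count under Binomial(number of trials, fitted probability).
function naive_binomial_loglik(y, probs, wts)
    return sum(logpdf(Binomial(Int(w), p), round(Int, yi * w))
               for (yi, p, w) in zip(y, probs, wts))
end

# e.g. naive_binomial_loglik(y, fitted(m), wts)
```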

For current master:

```
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  78.540 ms … 96.700 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     81.484 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   81.709 ms ±  2.325 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▃  ▂  ▁██
  ▃▁▃▁▃▄▄███▆█▄▄███▇▇▄▄▁▃▆▄▃▃▁▁▁▃▁▄▁▁▁▃▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃ ▃
  78.5 ms         Histogram: frequency by time        89.7 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.
```

For this branch:

```
julia> @benchmark loglikelihood(m) samples=100 seconds=10
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  83.523 ms … 91.340 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     85.213 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   85.559 ms ±  1.121 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▁▄▇█▅ ▁       ▇▂
  ▃▁▁▃▅▁▁████████▅██▆▃▆▆███▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  83.5 ms         Histogram: frequency by time        91.1 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.
```

So this branch costs roughly 4-5 ms extra on a million observations, which is generally within the run-to-run variability of current master.

This was tested on a different machine than #503; here's the new version info:

```
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 16 virtual cores
```