JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/
Other
226 stars 18 forks source link

No precomp #181

Closed chriselrod closed 1 year ago

chriselrod commented 1 year ago

Precompilation kills runtime performance because of Julia bugs. Master:

julia> @benchmark matmul!($Cf64, $Af64, $Bf64)
BenchmarkTools.Trial: 3727 samples with 1 evaluation.
 Range (min … max):  258.678 μs … 279.208 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     265.012 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   265.171 μs ±   1.905 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁▂▄▄▆█▆▆▇▇▇█▇▇▃▄▃▁▁                         
  ▂▁▁▁▁▂▂▁▂▃▂▃▃▃▄▄▅▅▇████████████████████▆▆▅▄▅▄▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂ ▅
  259 μs           Histogram: frequency by time          272 μs <

 Memory estimate: 272 bytes, allocs estimate: 17.

PR:

julia> @benchmark matmul!($Cf64, $Af64, $Bf64)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.756 μs …  43.892 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.462 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.547 μs ± 586.526 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▃▅▆██▆▄▃▁                                           
  ▂▂▂▂▂▃▃▅▆██████████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  23.8 μs         Histogram: frequency by time         26.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@ranocha have you seen this?

This was the benchmark script I used:

julia> A = rand(-1_000:1_000, 200, 200);

julia> B = rand(-1_000:1_000, 200, 200);

julia> C = similar(A);

julia> @benchmark mul!($C, $A, $B)
BenchmarkTools.Trial: 411 samples with 1 evaluation.
 Range (min … max):  2.283 ms …  12.646 ms  ┊ GC (min … max): 0.00% … 77.02%
 Time  (median):     2.372 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.430 ms ± 515.259 μs  ┊ GC (mean ± σ):  0.98% ±  3.80%

         ██        ▁                                           
  ▅▂▁▁▁▂▃██▄▄▅▅▅▄▄▇█▅▃▄▆▄▃▂▃▁▂▁▂▁▁▂▁▂▃▁▁▁▃▁▂▂▂▃▃▃▃▂▁▂▂▁▄▅▃▁▃▂ ▃
  2.28 ms         Histogram: frequency by time        2.67 ms <

 Memory estimate: 30.77 KiB, allocs estimate: 4.

julia> @benchmark matmul!($C, $A, $B)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  90.987 μs … 118.231 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     91.633 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.750 μs ±   2.644 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▅    ▂▄▃▁ ▃▆▅▁     ▁▁                                     ▂
  ████▇▃▄▆████▇████▁▃▁▆███▇▆▅▅▁▁▃▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▅▆▆▅▄▆▅▆▆▇▆▄ █
  91 μs         Histogram: log(frequency) by time       106 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> 2.369e-3 / 91.302e-6
25.946857681102276

julia> Af64 = Float64.(A); Bf64 = Float64.(B); Cf64 = similar(Af64);

julia> @benchmark mul!($Cf64, $Af64, $Bf64)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.808 μs … 34.542 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.224 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.367 μs ±  1.002 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▄█▆▂                                                      
  ▂▃▆████▆▄▃▂▂▂▁▁▂▂▂▂▂▁▂▁▁▁▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂▂ ▃
  14.8 μs         Histogram: frequency by time        19.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark matmul!($Cf64, $Af64, $Bf64)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.756 μs …  43.892 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.462 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.547 μs ± 586.526 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▃▅▆██▆▄▃▁                                           
  ▂▂▂▂▂▃▃▅▆██████████▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  23.8 μs         Histogram: frequency by time         26.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> Cf64 == C
true
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: -0.47 :warning:

Comparison is base (c63b5b9) 90.40% compared to head (9e0551e) 89.93%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #181 +/- ## ========================================== - Coverage 90.40% 89.93% -0.47% ========================================== Files 14 14 Lines 1146 1133 -13 ========================================== - Hits 1036 1019 -17 - Misses 110 114 +4 ``` | [Impacted Files](https://app.codecov.io/gh/JuliaLinearAlgebra/Octavian.jl/pull/181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaLinearAlgebra) | Coverage Δ | | |---|---|---| | [src/Octavian.jl](https://app.codecov.io/gh/JuliaLinearAlgebra/Octavian.jl/pull/181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaLinearAlgebra#diff-c3JjL09jdGF2aWFuLmps) | `50.00% <ø> (-50.00%)` | :arrow_down: | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/JuliaLinearAlgebra/Octavian.jl/pull/181/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaLinearAlgebra)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

ranocha commented 1 year ago

@ranocha have you seen this?

No - but I have to admit that I haven't been running a lot of matmul!-based code recently, focused on other parts of our codebase instead.

Thanks a lot for fixing this! Do you have an idea which Julia bugs are involved here?

chriselrod commented 1 year ago

Thanks a lot for fixing this! Do you have an idea which Julia bugs are involved here?

No, but bugs such as this have been known for a long time. TriangularSolve.jl had to disable precompilation as well for this reason.