dzhang314 / MultiFloats.jl

Fast, SIMD-accelerated extended-precision arithmetic for Julia
MIT License

fix runtime dispatch for MultiFloat{Float64, 11} and up #36

Closed: nsajko closed this 1 year ago

nsajko commented 1 year ago

The ntuple function has a method that takes a Val argument in place of an Int argument. Using that method avoids runtime dispatch in the basic arithmetic functions for MultiFloat{Float64, m} with m >= 11.
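To illustrate the difference, here is a minimal sketch using a hypothetical wrapper type, Limbs, which is not the actual MultiFloat definition. As far as I know, Base's ntuple(f, n::Integer) only spells out lengths up to 10 by hand and builds longer tuples through a generic fallback, so the tuple length is lost to inference once n exceeds 10. Passing Val(N) keeps the length in the type.

```julia
# Hypothetical stand-in for a multi-limb number; not MultiFloats.jl's own type.
struct Limbs{N}
    x::NTuple{N,Float64}
end

# Length given as a plain Int: for N > 10 the result of ntuple is expected to be
# inferred only as Tuple{Vararg{Float64}}, so the constructor call is dispatched
# at run time.
neg_int(a::Limbs{N}) where {N} = Limbs{N}(ntuple(i -> -a.x[i], N))

# Length given as Val(N): the tuple length is part of the argument's type, so
# the result can be inferred as NTuple{N,Float64} for any N.
neg_val(a::Limbs{N}) where {N} = Limbs{N}(ntuple(i -> -a.x[i], Val(N)))

# An 11-limb value, matching the smallest size affected in this report.
a = Limbs{11}(ntuple(i -> 2.0^(-53 * (i - 1)), Val(11)))

# Both versions compute the same value; only inferrability differs.
same = neg_int(a) == neg_val(a)
```

Both functions return identical values, so the change should not alter results. The difference should only show up under inspection tools such as @code_warntype or JET's @report_opt.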

State on master (0f15f4c2b869bfc4e7) before patch:

$ /home/nsajko/tmp/julia-61afe7b7f0/bin/julia -O3 --min-optlevel=3
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-DEV.158 (2022-12-20)
 _/ |\__'_|_|_|\__'_|  |  Commit 61afe7b7f08 (0 days old master)
|__/                   |

julia> using MultiFloats, JET, Test

julia> const m = 11
11

julia> MultiFloats.use_standard_multifloat_arithmetic(m)

julia> const F = MultiFloat{Float64, n} where {n}
Float64x (alias for MultiFloat{Float64})

julia> (a, b) = F{m}.((0.5, 3))
(0.5, 3.0)

julia> (a+b, -b, a-b, a*b, a/b);

julia> @report_opt a/b
═════ 1 possible error found ═════
┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:381 MultiFloats.multifloat_div(x, y)
│┌ @ none:0 r = a MultiFloats.:- b MultiFloats.:* q0
││┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:392 MultiFloats.:-(y)
│││┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:391 (%2)
││││ runtime dispatch detected: ::Float64x{11}(%2::Tuple{Vararg{Float64}})::Float64x{11}
│││└──────────────────────────────────────────────────────────

julia> @report_opt a-b
═════ 1 possible error found ═════
┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:392 MultiFloats.:-(y)
│┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:391 (%2)
││ runtime dispatch detected: ::Float64x{11}(%2::Tuple{Vararg{Float64}})::Float64x{11}
│└──────────────────────────────────────────────────────────

julia> @report_opt -b
═════ 1 possible error found ═════
┌ @ /home/nsajko/tmp/MultiFloats.jl/src/MultiFloats.jl:391 (%2)
│ runtime dispatch detected: ::Float64x{11}(%2::Tuple{Vararg{Float64}})::Float64x{11}
└──────────────────────────────────────────────────────────

State after patch:

julia> @report_opt a/b
No errors detected

julia> @report_opt a-b
No errors detected

julia> @report_opt -b
No errors detected

dzhang314 commented 1 year ago

Hey @nsajko, thanks for contributing this fix! I was unaware of the Val form of ntuple (perhaps it was not available back in 2019 when I started working on MultiFloats.jl), and I agree that it is not only faster but also conceptually clearer, since it communicates that N is known statically.

As an unrelated note, you probably already realize this, but I do want to point out that I do not recommend using Float64x{11} and beyond. Past Float64x8, the limited exponent range of Float64 starts to become a serious issue, as lower limbs become truncated to zero. Also, I haven't benchmarked this recently, but at least on 2019 hardware, the cubic runtime of the Float64x{N} arithmetic algorithms makes them slower than simply using BigFloat right around N == 8.
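The exponent-range limit can be estimated with a back-of-the-envelope sketch. It rests on a simplifying assumption that is not taken from the MultiFloats.jl source: each limb sits roughly 53 bits below the previous one, so limb k of a value with leading exponent e has magnitude near 2^(e - 53(k-1)). The helper name full_limbs is made up for this illustration.

```julia
# Rough model (an assumption, not MultiFloats.jl code): consecutive limbs are
# spaced one Float64 mantissa width apart.
const MANTISSA_BITS = precision(Float64)            # 53 bits per limb
const MIN_NORMAL_EXP = exponent(floatmin(Float64))  # -1022, smallest normal exponent

# Number of limbs whose exponent stays in the normal Float64 range when the
# leading limb has binary exponent e. Limbs beyond this count fall into the
# subnormal range or underflow to zero.
full_limbs(e::Integer) = max(0, fld(e - MIN_NORMAL_EXP, MANTISSA_BITS) + 1)

# Smaller magnitudes leave room for fewer limbs.
for x in (1.0, 1e-100, 1e-200)
    println(x, " => ", full_limbs(exponent(x)), " full-precision limbs")
end
```

Under this model a value near 1.0 has room for 20 limbs, and the budget shrinks as the magnitude decreases, which is consistent with lower limbs being truncated to zero for wide types. Actual limb spacing depends on the value, so treat the counts as estimates.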