Closed miguelraz closed 3 years ago
After posting some questions in the Julia Slack #linearalgebra chat, Mason Protter and Chris Elrod kindly suggested some more idiomatic improvements, and the use of LoopVectorization.jl
for AVX + multithreaded speedups.
@chriselrod's code:
]add LoopVectorization@0.12
julia> using LoopVectorization
julia> @btime @avxt $Afloat32 .+= 1.0;
5.955 μs (0 allocations: 0 bytes)
julia> @btime @avxt $Afloat32 .+= 1f0;
6.063 μs (0 allocations: 0 bytes)
julia> @btime @avx thread=4 $Afloat32 .+= 1f0;
97.759 μs (0 allocations: 0 bytes)
julia> @btime @avx thread=false $Afloat32 .+= 1f0;
121.804 μs (0 allocations: 0 bytes)
With the following spec:
julia> versioninfo()
Julia Version 1.7.0-DEV.707
Commit d1d0646390* (2021-03-14 18:11 UTC)
Platform Info:
OS: Linux (x86_64-generic-linux)
CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
Environment:
JULIA_NUM_THREADS = 36
Thanks for the contributions. I am overdue to improve Julia and am very happy to have more experienced people help with it.
This PR doesn't compile for me. Can you fix it?
JULIA % julia stencil.jl 10 4000 star 4
Parallel Research Kernels version
Julia stencil execution on 2D grid
Grid size = 4000
Radius of stencil = 4
Type of stencil = star
Data type = double precision
Compact representation of stencil loop body
Number of iterations = 10
ERROR: LoadError: MethodError: no method matching +(::Float64, ::Matrix{Float64})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
+(::Union{Float16, Float32, Float64}, ::BigFloat) at mpfr.jl:392
+(::Array, ::Array...) at arraymath.jl:43
...
Stacktrace:
[1] do_add!
@ ~/Work/PRK/JULIA/stencil.jl:67 [inlined]
[2] main()
@ Main ~/Work/PRK/JULIA/stencil.jl:212
[3] top-level scope
@ ~/Work/PRK/JULIA/stencil.jl:246
Ah, tried to be too cheeky with a function call. That should do it.
Do you certify that your contribution is made in good faith and does not attempt to introduce any negative behavior into this project?
I just saw this thread on twitter, so I might as well chip in a bit.
Changes:
do_add
->do_add!
: When a function mutates its arguments (A
in this case), it is Julia convention to add a!
at the end of the function name to alert the user.1:n
->Base.OneTo(n)
This method uses a type to tell the compiler that the loops start at1
, which is very useful for loop unrolling and SIMD shenanigans. Usually not worth it if it's not a super tight loop.@inbounds
elides bounds checking.1.0
toone(eltype(A))
: This calls a function which determines what is theidentity
element for the collection of A. This is especially useful to guarantee type stability of Julia code. This is especially relevant when doing floating point operations since1.0
is interpreted as aFloat64
, and therefore all the floats get widened toFloat64
, even if we passed in aFloat32
array. The timings below should make the difference very clear.TL;DR - I highly recommend reading the performance tips of the manual, almost nothing I did here is not written there. We gained about 15x and made our code type stable and generic - you can probably get this to run on GPUs without too much additional effort, but that's a PR for a different day.
My laptop specs: