JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/

bus error on Xeon Gold processor #124

Closed jlchan closed 2 years ago

jlchan commented 2 years ago

@sloede noticed this error when running Trixi.jl on a Xeon Gold processor.

MWE:

using StructArrays, StaticArrays, Octavian

x = StructArray{SVector{5,Float64}}(ntuple(_ -> randn(2744, 512), 5))
out = similar(x)
A = randn(size(x,1), size(x,1))
StructArrays.foreachfield((out,x) -> matmul!(out, A, x), out, x)

The output is

signal (7): Bus error
in expression starting at /zhome/academic/HLRS/hlrs/hpcschlo/paper-ec-performance/2021_EC_performance/code/Gauss/mwe/mwe.jl:10
macro expansion at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/VectorizationBase/xHOp9/src/llvm_intrin/vbroadcast.jl:74 [inlined]
_vbroadcast at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/VectorizationBase/xHOp9/src/llvm_intrin/vbroadcast.jl:80 [inlined]
vbroadcast at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/VectorizationBase/xHOp9/src/llvm_intrin/vbroadcast.jl:96 [inlined]
macro expansion at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/LoopVectorization/x4G96/src/reconstruct_loopset.jl:713 [inlined]
_turbo_! at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/LoopVectorization/x4G96/src/reconstruct_loopset.jl:713 [inlined]
ploopmul! at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/Octavian/Rlmrt/src/macrokernels.jl:30 [inlined]
packaloopmul! at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/packages/Octavian/Rlmrt/src/macrokernels.jl:138
unknown function (ip: 0x40060696dbd77cf5)
Allocations: 63871354 (Pool: 63824640; Big: 46714); GC: 30
Bus error

The MWE was run with julia --threads=1 mwe.jl on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz with [6fd5a793] + Octavian v0.3.8, [90137ffa] + StaticArrays v1.2.13, and [09ab397b] + StructArrays v0.6.3.

Unfortunately, we have only been able to reproduce this error on the cluster; it does not occur on our personal laptops or Linux desktops.
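To narrow down which field array triggers the crash, the MWE above can be unrolled into an explicit loop that prints before each call. This is only a sketch: it assumes `StructArrays.components` is available in the installed StructArrays version, and the loop variable names are made up for illustration.

```julia
using StructArrays, StaticArrays, Octavian

# Same setup as the MWE: five 2744×512 field matrices and a square A.
x = StructArray{SVector{5,Float64}}(ntuple(_ -> randn(2744, 512), 5))
out = similar(x)
A = randn(size(x, 1), size(x, 1))

# Assumption: StructArrays.components returns the underlying field arrays.
# Print and flush before each matmul! so the last line printed identifies
# the call that was running if the process dies with a bus error.
for (i, (o, xi)) in enumerate(zip(StructArrays.components(out), StructArrays.components(x)))
    println("field ", i, ": ", typeof(xi), " ", size(xi))
    flush(stdout)
    matmul!(o, A, xi)
end
```

On an affected machine this is expected to crash in the same way as the MWE; the point is only to see how far it gets.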

chriselrod commented 2 years ago

I can reproduce:

julia> StructArrays.foreachfield((out,x) -> matmul!(out, A, x), out, x)

signal (7): Bus error
in expression starting at REPL[5]:1
macro expansion at /home/chriselrod/.julia/dev/VectorizationBase/src/llvm_intrin/vbroadcast.jl:74 [inlined]
_vbroadcast at /home/chriselrod/.julia/dev/VectorizationBase/src/llvm_intrin/vbroadcast.jl:80 [inlined]
vbroadcast at /home/chriselrod/.julia/dev/VectorizationBase/src/llvm_intrin/vbroadcast.jl:96 [inlined]
macro expansion at /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:713 [inlined]
_turbo_! at /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:713 [inlined]
ploopmul! at /home/chriselrod/.julia/dev/Octavian/src/macrokernels.jl:30 [inlined]
packaloopmul! at /home/chriselrod/.julia/dev/Octavian/src/macrokernels.jl:138
unknown function (ip: 0xbfdfa82222d17b41)
Allocations: 75050282 (Pool: 74990981; Big: 59301); GC: 46
fish: Job 1, '/home/chriselrod/Documents/lang…' terminated by signal SIGBUS (Misaligned address error)

chriselrod commented 2 years ago

I can reproduce on skylake-avx512, but not on tigerlake. Both have AVX512. LoopVectorization should be treating them identically here, Octavian nearly so (cache sizes differ).
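For anyone comparing machines, names like `skylake-avx512` and `tigerlake` appear to be the CPU names Julia reports via `Sys.CPU_NAME`. A minimal sketch for recording what a given machine reports, using only Base and the InteractiveUtils standard library:

```julia
using InteractiveUtils  # provides versioninfo() when running a script

# Julia version, OS, CPU model string, and LLVM target information.
versioninfo()

# The CPU name as detected by Julia/LLVM for the host.
println("CPU_NAME = ", Sys.CPU_NAME)

# Number of Julia threads in this session (the MWE was run with --threads=1).
println("threads  = ", Threads.nthreads())
```

Posting this output alongside a report makes it easier to tell whether two machines fall into the same architecture class.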

sloede commented 2 years ago

FWIW, I was able to reproduce my initial error on Xeon Gold 6138 (Skylake), 6230 (Cascade Lake), and 6248 (also Cascade Lake).