New benchmark: SnpLinAlg is now 2x faster than before, though still about 2x slower than SnpBitMatrix. It is not yet fast enough to replace or deprecate SnpBitMatrix, but it has become a more memory-efficient option.
SnpArray linear algebra with LoopVectorization and CUDA
versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
JULIA_CUDA_USE_BINARYBUILDER = false
ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"
using CUDA
EUR_100_cu = CuSnpArray{Float64}(EUR_100; model=ADDITIVE_MODEL, center=false, scale=false);
┌ Warning: `haskey(::TargetIterator, name::String)` is deprecated, use `Target(; name = name) !== nothing` instead.
│ caller = llvm_compat(::VersionNumber) at compatibility.jl:176
└ @ CUDA /home/kose/.julia/packages/CUDA/5t6R9/deps/compatibility.jl:176
Let's try with the EUR data repeated 100 times: 37,900 samples by 54,051 SNPs.
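One way to build such a matrix, sketched here under the assumption that the starting point is the EUR_subset data bundled with SnpArrays (379 samples by 54,051 SNPs), is to tile 100 vertical copies of it; the variable name EUR_100 matches the one used below:

```julia
using SnpArrays

# Load the EUR_subset data shipped with SnpArrays: 379 samples × 54,051 SNPs.
EUR = SnpArray(SnpArrays.datadir("EUR_subset.bed"))
m, n = size(EUR)

# Preallocate a 37,900 × 54,051 SnpArray and fill it with 100 stacked copies.
EUR_100 = SnpArray(undef, 100m, n)
for j in 1:n, r in 0:99, i in 1:m
    EUR_100[r * m + i, j] = EUR[i, j]
end
```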
$y = Ax$
Direct linear algebra on a SnpArray:
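A minimal sketch of what this looks like with SnpLinAlg, assuming EUR_100 holds the repeated data; `mul!` dispatches to the LoopVectorization-based kernel:

```julia
using SnpArrays, LinearAlgebra, BenchmarkTools

# Wrap the raw SnpArray for direct matrix-vector products (no centering/scaling,
# matching the CuSnpArray setup above).
EUR_100_sla = SnpLinAlg{Float64}(EUR_100; model=ADDITIVE_MODEL, center=false, scale=false)

v = randn(size(EUR_100, 2))   # random input vector x
y = zeros(size(EUR_100, 1))   # preallocated output y

@benchmark mul!($y, $EUR_100_sla, $v)   # y = A * x
```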
Below is the benchmark for SnpBitMatrix:
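For comparison, a sketch of the equivalent SnpBitMatrix setup (same model, again without centering or scaling):

```julia
using SnpArrays, LinearAlgebra, BenchmarkTools

# SnpBitMatrix stores the genotypes as two bit matrices; faster, but uses more memory.
EUR_100_bm = SnpBitMatrix{Float64}(EUR_100; model=ADDITIVE_MODEL, center=false, scale=false)

v = randn(size(EUR_100, 2))
y = zeros(size(EUR_100, 1))

@benchmark mul!($y, $EUR_100_bm, $v)
```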
Let's try CUDA. The device is an Nvidia Titan V.
Moving data to GPU:
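A sketch of the GPU path, assuming a CUDA-capable device: both the genotype data and the vectors must live on the device before calling `mul!`:

```julia
using CUDA, SnpArrays, LinearAlgebra

# Move genotypes to the GPU (as shown above) along with the input/output vectors.
EUR_100_cu = CuSnpArray{Float64}(EUR_100; model=ADDITIVE_MODEL, center=false, scale=false)
v_d = CuArray(randn(size(EUR_100, 2)))
y_d = CUDA.zeros(Float64, size(EUR_100, 1))

CUDA.@sync mul!(y_d, EUR_100_cu, v_d)   # y = A * x on the device
```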
The speedup is obvious. Let's check correctness:
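One way to check, assuming `y` holds the CPU result and `y_d` the device result from the steps above: copy the GPU result back to the host and compare up to floating-point roundoff:

```julia
y_gpu = collect(y_d)          # device → host copy
@assert isapprox(y, y_gpu)    # should agree up to roundoff
```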
$A^T x$
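The transposed product can be sketched the same way; SnpLinAlg accepts a lazy `transpose` wrapper in `mul!` (names assume the setup above):

```julia
using SnpArrays, LinearAlgebra, BenchmarkTools

EUR_100_sla = SnpLinAlg{Float64}(EUR_100; model=ADDITIVE_MODEL, center=false, scale=false)
w = randn(size(EUR_100, 1))     # vector on the sample side
out = zeros(size(EUR_100, 2))   # output of length #SNPs

@benchmark mul!($out, $(transpose(EUR_100_sla)), $w)   # out = Aᵀ * x
```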