OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest

LoopVectorization fails on some operations? #121

alanderos91 opened this issue 1 year ago

alanderos91 commented 1 year ago

Reproducing the issue:

using SnpArrays
mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
A = SnpLinAlg{Float64}(mouse)
A*A'

This results in the following message:

┌ Warning: #= /home/alanderos/.julia/packages/SnpArrays/WBzpL/src/linalg_direct.jl:760 =#:
│ `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead.
│ Use `warn_check_args=false`, e.g. `@turbo warn_check_args=false ...`, to disable this warning.
└ @ SnpArrays ~/.julia/packages/LoopVectorization/FMfT8/src/condense_loopset.jl:1049

This happens when I start Julia with 10 threads on a 10-core machine. Here's some additional information:

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-10900KF CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

Note: The issue originally came up while trying to compute A*B, where A is a SnpLinAlg and B is a SparseMatrixCSC, so operations beyond SnpLinAlg x SnpLinAlg run into the problem as well.

biona001 commented 1 year ago

Sorry, probably my fault. At least the answer seems correct.

For a matrix X, I only tested A*X where A is a SnpLinAlg and X is a dense matrix. I'm not sure why it fails when X is another SnpLinAlg or a SparseMatrixCSC; I would need some time to figure this out.
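
For reference, the path I tested looks like this (X here is just an arbitrary dense matrix for illustration):

using SnpArrays
mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
A = SnpLinAlg{Float64}(mouse)
X = randn(size(A, 2), 10)  # dense right-hand side
A * X                      # SnpLinAlg x dense matrix: no fallback warning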

kose-y commented 1 year ago

I don't think it's anybody's fault, Ben. It's just that we have not designed a kernel for this specific operation. LoopVectorization is not really a good fit for SparseMatrixCSC. For SnpLinAlg x SnpLinAlg, the quick and dirty workaround is indeed to transform one of them into a numeric matrix. If the problem is large enough, I think it might still be faster than numeric matrix x numeric matrix, thanks to the slightly better cache efficiency of the compressed representation.
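
For the A * A' example above, that workaround would look something like the sketch below (assuming convert's model/center/scale/impute keyword arguments are left to match the SnpLinAlg settings):

using SnpArrays
mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
A = SnpLinAlg{Float64}(mouse)
# Materialize the right-hand factor as a plain dense Matrix so that the
# tested SnpLinAlg x dense kernel applies; permutedims returns an actual
# Matrix rather than a lazy transpose wrapper.
At = permutedims(convert(Matrix{Float64}, mouse))
A * At  # SnpLinAlg x dense Matrix, no fallback warning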

alanderos91 commented 1 year ago

Thank you both for the quick reply. I only used SnpLinAlg x SnpLinAlg as an example since it throws the same warning and is incredibly slow, for me at least.

In the short term, I think it would be reasonable to restrict this and this to DenseMatrix{T} instead of AbstractMatrix{T}. Multiplication with a generic AbstractMatrix{T} should either throw an error, since it is technically unsupported, or emit a warning, since the current kernel is not suited to that case.
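
Something like the hypothetical dispatch below is what I have in mind (snp_mul! and its bodies are illustrative sketches, not the package's actual internals):

using LinearAlgebra, SnpArrays

# Fast path: B is a dense strided matrix, suitable for the vectorized kernel.
function snp_mul!(out::Matrix{T}, A::SnpLinAlg{T}, B::DenseMatrix{T}) where T
    return mul!(out, A, B)
end

# Generic fallback: warn once, then densify B for correctness.
function snp_mul!(out::Matrix{T}, A::SnpLinAlg{T}, B::AbstractMatrix{T}) where T
    @warn "SnpLinAlg multiplication kernel expects a dense matrix; densifying B" maxlog=1
    return mul!(out, A, Matrix{T}(B))
end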

In the long term it would be nice to round out support for matrix-matrix multiplication. I can probably implement something for SnpLinAlg times SparseMatrixCSC for my problem and contribute it as a starting point. I can't promise it will be optimal, though.
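
As a rough first cut, I'm picturing something like the sketch below (snp_times_sparse is a hypothetical name; it's a correctness-first baseline that densifies one column of B at a time and assumes mul! accepts a column view for the output):

using LinearAlgebra, SparseArrays, SnpArrays

function snp_times_sparse(A::SnpLinAlg{T}, B::SparseMatrixCSC) where T
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("size(A, 2) != size(B, 1)"))
    C = Matrix{T}(undef, size(A, 1), size(B, 2))
    b = Vector{T}(undef, size(B, 1))  # dense scratch column
    rows, vals = rowvals(B), nonzeros(B)
    for j in axes(B, 2)
        fill!(b, zero(T))
        for k in nzrange(B, j)        # only the stored entries of column j
            b[rows[k]] = vals[k]
        end
        mul!(view(C, :, j), A, b)     # reuse the existing SnpLinAlg x dense-vector kernel
    end
    return C
end

This trades away per-column sparsity in exchange for reusing the vectorized kernel; a kernel that walks the nonzeros directly would be the natural next step.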