ITensor / NDTensors.jl

A Julia package for n-dimensional sparse tensors.
Apache License 2.0

Simplify `_contract_scalar!` logic by using `LoopVectorization.jl` #83

Open mtfishman opened 3 years ago

mtfishman commented 3 years ago

Simplify the `_contract_scalar!` logic by using LoopVectorization.jl instead of BLAS calls. I believe LoopVectorization.jl is in general as fast as MKL for any BLAS level 1 and 2 operation (and even level 3 for smaller matrices), and it can be used very simply through broadcast syntax like `@avx Rᵈ .= α .* Tᵈ .+ Rᵈ` instead of the ugly syntax `BLAS.axpy!(α, Tᵈ, Rᵈ)`.
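For concreteness, here is a minimal sketch of the two equivalent spellings of the axpy operation, assuming LoopVectorization.jl is installed (the `@avx` macro was renamed `@turbo` in later LoopVectorization.jl releases):

```julia
using LinearAlgebra
using LoopVectorization

α  = 2.0
Tᵈ = rand(1000)
Rᵈ = rand(1000)

# Current approach: BLAS level-1 call, mutating Rᵈ in place.
BLAS.axpy!(α, Tᵈ, Rᵈ)

# Proposed approach: the same Rᵈ ← α*Tᵈ + Rᵈ update written as a
# broadcast, vectorized by LoopVectorization.jl.
@avx Rᵈ .= α .* Tᵈ .+ Rᵈ
```

The broadcast form also generalizes naturally to other fused elementwise updates, which is what makes it attractive for simplifying the scalar-contraction branches.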

This should simplify that logic a lot and allow us to merge the logic of `_contract_scalar!` and `permutedims!!`, which could also use LoopVectorization.jl for the trivial-permutation branch. In principle, `_contract_scalar!` should be able to just call `permutedims!!`.
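One way the merged logic could look is sketched below. The function name `scaled_permutedims!` and its signature are hypothetical, purely for illustration; the point is that the trivial-permutation branch reduces to a vectorized broadcast, so a scalar contraction is just a call into this function:

```julia
using LoopVectorization

# Hypothetical sketch: a permutedims!-style kernel whose trivial-permutation
# branch is a single vectorized broadcast. A scalar contraction R ← α*T
# could then simply call this with the identity permutation.
function scaled_permutedims!(R::AbstractArray, T::AbstractArray, perm, α)
    if perm == ntuple(identity, ndims(T))
        # Trivial permutation: reduces to a scaled elementwise copy.
        @avx R .= α .* T
    else
        # Nontrivial permutation: fall back to Base, then scale.
        permutedims!(R, T, perm)
        R .*= α
    end
    return R
end
```

With this structure, the separate `_contract_scalar!` code paths for the trivial and nontrivial permutation cases would collapse into one function.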

EDIT: Note that a main disadvantage of switching from BLAS calls to `@avx` calls is that they use different kinds of multithreading: to get both matrix multiplications (which of course right now use `BLAS.gemm!`) and vectorized operations to use multithreading, a user would need both BLAS and Julia multithreading enabled. I have found that even when they are not nested, they can still interfere with each other, so that needs to be investigated. Looking forward to a world where this is all in Julia!
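As a rough illustration of what "both kinds of multithreading enabled" means in practice (the thread count of 4 here is arbitrary):

```julia
using LinearAlgebra

# Julia-level threads (used by LoopVectorization's threaded loops) are
# fixed at startup, e.g.:
#   $ julia --threads=4

# BLAS threads (used by BLAS.gemm!) are a separate, runtime-settable pool:
BLAS.set_num_threads(4)

@show Threads.nthreads()      # Julia threads
@show BLAS.get_num_threads()  # BLAS threads
```

Since the two pools are scheduled independently, running both at full width can oversubscribe the cores, which may be the source of the interference described above.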