The generic triangular solves (naivesub!) in 65937f43938e8b4d644f13931b7b11a9bf63fffe/base/linalg/triangular.jl#L765-L831 use row-major access. Given column-major storage is the norm in numerical linear algebra, should these methods instead use column-major access? To illustrate the performance impact, this code:
function colmajnaivesub!(A::LowerTriangular, b::AbstractVector, x::AbstractVector = b)
n = size(A, 2)
if !(n == length(b) == length(x))
throw(DimensionMismatch("second dimension of left hand side A, $n, length of output x, $(length(x)), and length of right hand side b, $(length(b)), must be equal"))
end
for j in 1:n
A[j,j] == zero(A[j,j]) && throw(SingularException(j))
xj = x[j] = A[j,j] \ b[j]
for i in j+1:n
b[i] -= A[i,j] * xj
end
end
x
end
m = 5000;
bo = rand(m); b = bo;
L = LowerTriangular(eye(m) + randn(m, m)/(2m));
using Benchmarks
Benchmarks.@benchmarkable(benchbase, copy!(b, bo), Base.LinAlg.naivesub!(L, b), nothing)
Benchmarks.@benchmarkable(benchcolmaj, copy!(b, bo), colmajnaivesub!(L, b), nothing)
println("Base meth"); Benchmarks.execute(benchbase)
println("Col-major"); Benchmarks.execute(benchcolmaj)
yields
Base meth
================ Benchmark Results ========================
Time per evaluation: 142.07 ms [139.38 ms, 144.77 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 68
Number of evaluations: 68
Time spent benchmarking: 9.89 s
Col-major
================ Benchmark Results ========================
Time per evaluation: 31.81 ms [30.55 ms, 33.06 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 100
Number of evaluations: 100
Time spent benchmarking: 3.26 s
I imagine I'm missing something important? If not and impressions are positive, I would be happy to prepare a PR covering the relevant methods. Perhaps eventually both column- and row-major storage could be handled efficiently via dispatch on an access-order trait or the like? Best!
The generic triangular solves (
naivesub!
) in 65937f43938e8b4d644f13931b7b11a9bf63fffe/base/linalg/triangular.jl#L765-L831 use row-major access. Given column-major storage is the norm in numerical linear algebra, should these methods instead use column-major access? To illustrate the performance impact, this code:yields
I imagine I'm missing something important? If not and impressions are positive, I would be happy to prepare a PR covering the relevant methods. Perhaps eventually both column- and row-major storage could be handled efficiently via dispatch on an access-order trait or the like? Best!