Closed geoffroyleconte closed 1 year ago
All modified lines are covered by tests :white_check_mark:
Files | Coverage Δ | |
---|---|---|
src/LimitedLDLFactorizations.jl | 89.81% <100.00%> (+0.89%) |
:arrow_up: |
:loudspeaker: Thoughts on this report? Let us know!.
That's awesome! Thank you!
I tried lldl(A; memory=size(A,1))
to give an exact factorization, and then \
B does not give the same result as A \ B
. Did I mis-understand anything here?
Is the norm of the difference between the solution with lldl
and \
high? If so can you send me a basic working example so that I can reproduce the issue?
Ok the original PR gives accurate result after re-testing. However, I did a few tweaks to make it multi-threading, which surprisingly gives totally different result. See below:
function lldl_lsolve!(n, X::AbstractMatrix, Lp, Li, Lx)
Threads.@threads for j = 1:n
@inbounds for p = Lp[j]:(Lp[j + 1] - 1)
@simd for k ∈ axes(X, 2)
X[Li[p], k] -= Lx[p] * X[j, k]
end
end
end
return X
end
function lldl_dsolve!(n, X::AbstractMatrix, D)
Threads.@threads for j = 1:n
@simd for k ∈ axes(X, 2)
X[j, k] /= D[j]
end
end
return X
end
function lldl_ltsolve!(n, X::AbstractMatrix, Lp, Li, Lx)
Threads.@threads for j = n:-1:1
@inbounds for p = Lp[j]:(Lp[j + 1] - 1)
@simd for k ∈ axes(X, 2)
X[j, k] -= conj(Lx[p]) * X[Li[p], k]
end
end
end
return X
end
function lldl_solve!(n, B::AbstractMatrix, Lp, Li, Lx, D, P)
@views Y = B[P, :]
lldl_lsolve!(n, Y, Lp, Li, Lx)
lldl_dsolve!(n, Y, D)
lldl_ltsolve!(n, Y, Lp, Li, Lx)
return B
end
Any idea why it gives wrong result?
I think I found the reason. lldl_lsolve!
and lldl_ltsolve!
are not embarrassingly-parallelable. Any idea how to make it more thread-friendly?
Thanks @geoffroyleconte. There could also be methods for right-hand sides that are contiguous in memory, so that you can use x[j, :]
instead of a for
loop over the second axis.
@mzy2240 maybe you are modifying some elements in X
simultaneously, in which case you would need to use Threads.Atomic
?
I'm not an expert in this subject, @amontoison do you know what could be wrong here?
You should permute the for loops if you want to use Threads.@thread
. It's normal that you have a wrong result, you can't perform backward and forward sweeps in any order.
But you do a complete backward and forward sweep of the columns of B
on different threads.
@geoffroyleconte Can we merge this PR?
@geoffroyleconte Can you rebase your branch?
Thanks @geoffroyleconte!
Hello @mzy2240, does this fix your issue?
@dpo I copy-pasted some code that was in LDLFactorizations.jl for multiple rhs.