It's far worse for sparse arrays:
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;  # 22.392 s (54 allocations: 20.41 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b);  # 1.220 ms (47 allocations: 1.37 MiB)
LinearSolve is approximately 20,000x slower here!
I should add I'm on an Apple silicon Mac. Not sure how much difference that will make.
That's only an OK way to solve least squares when we know a priori that A has a low condition number. It's not enough to test linear algebra algorithms using random matrices. Consider:
julia> using LinearSolve, LinearAlgebra
julia> vandermonde(xs, basis, m) = [basis(x, i) for x in xs, i in 1:m]
vandermonde (generic function with 2 methods)
julia> V = vandermonde(range(0, 1, length=128), (x, i)->i==1 ? one(x) : x^(i-1), 20);
julia> b = rand(128);
julia> x_normal = (V'V) \ (V'b);
julia> x = solve(LinearProblem(V, b)).u;
julia> norm(V * x - b)
2.935841792541393
julia> norm(V * x_normal - b)
3.141337584566186
julia> norm(x - x_normal)
3.881181867110321e12
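As a hedged follow-up sketch (not from the original thread, continuing the same V and b): an explicit QR solve avoids forming V'V, which squares the condition number, so its residual should land near the direct solve result rather than the normal-equations one.

using LinearAlgebra
# Least squares via QR on the ill-conditioned Vandermonde matrix;
# no V'V is formed, so the conditioning is not squared.
x_qr = qr(V) \ b
# Compare against the residuals printed above for x and x_normal.
norm(V * x_qr - b)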
On my M1 Mac:
julia> BLAS.set_num_threads(1);
julia> @btime qr($A)\$b;
1.365 ms (9 allocations: 847.33 KiB)
julia> @btime solve(LinearProblem($A, $b)).u;
2.607 ms (29 allocations: 1.66 MiB)
julia> BLAS.set_num_threads(4);
julia> @btime qr($A)\$b;
1.142 ms (9 allocations: 847.33 KiB)
julia> @btime solve(LinearProblem($A, $b)).u;
2.238 ms (29 allocations: 1.66 MiB)
So qr(A)\b seems like a pretty big improvement over what solve(LinearProblem(A, b)).u is doing?
It should be using a QR in this instance, but it has to prove the non-square assumption at runtime. With the assumption supplied up front, it's likely fine.
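For illustration, a minimal sketch (not from the thread) of supplying that assumption up front, using the LinearSolve.OperatorAssumptions(false) constructor that appears later in this discussion:

using LinearSolve, SparseArrays
A = sprandn(5000, 100, 0.1)   # tall, sparse least-squares system
b = randn(5000)
prob = LinearProblem(A, b)
# Declare up front that the operator is not square, so the default
# algorithm selection does not have to prove it at runtime.
assumptions = LinearSolve.OperatorAssumptions(false)
sol = solve(prob; assumptions)
sol.u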
2x overhead sounds like a hefty performance bug.
There's a way to reduce it, but it would mangle the code a bit and I haven't gotten around to it. But in general I'd hope cases that want performance set the assumptions.
In general I'd hope cases that want performance set the assumptions.
@ChrisRackauckas How can I tell LinearSolve that A is a priori well conditioned?
You don't need to know that it's well conditioned to use QR. You just need to know that the matrix is tall. That said, IMO LinearSolve should really check the shape and use QR if appropriate.
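As a sketch of that idea (a hypothetical helper named solve_by_shape, not LinearSolve's actual internals): dispatch on the shape and fall back to QR for tall systems.

using LinearAlgebra
# Hypothetical helper illustrating shape-based dispatch: square systems get a
# direct solve, tall systems get a least-squares solve via QR.
function solve_by_shape(A, b)
    m, n = size(A)
    return m == n ? A \ b : qr(A) \ b
end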
You don't need to know that it's well conditioned to use QR.
He wants to say that it is well conditioned so that he can use Symmetric(A' * A) \ (A' * b).
I suggested qr because it doesn't require A to be well conditioned.
Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves.
Conditioning is expensive to calculate (generally more expensive than solving; it normally requires an svd, which could itself be used to solve the problem).
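A minimal sketch of that point (illustrative, not from the thread): estimating the condition number already needs the singular values, and once the SVD is in hand it can solve the least-squares problem directly.

using LinearAlgebra
A = randn(1000, 100)
b = randn(1000)
# The condition number needs the extreme singular values...
F = svd(A)
kappa = maximum(F.S) / minimum(F.S)   # equivalent to cond(A)
# ...and the same factorization solves the least-squares problem.
x = F \ b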
IMO LinearSolve should really check the shape and use QR if appropriate.
That's what it's doing, and that's exactly the problem.
He wants to say that it is well conditioned so that he can use Symmetric(A' * A) \ (A' * b).
We should add an OperatorAssumption for conditioning.
That's what it's doing, and that's exactly the problem.
How can a shape check take over 1 ms?
I think it is literally calling qr! twice: first in init_cacheval, and then again under a sub-solve call.
Oh yeah, we need to improve the cache init since there's no ArrayInterface.qr_instance(A) implemented.
I suggested qr because it doesn't require A to be well conditioned.
I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using normal equations.
Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves.
Currently that's exactly what I'm doing. It's just that LinearSolve claims to:
and I feel my experience shows it doesn't really deliver on that. The method should be at least as fast as QR for tall, sparse matrices. A normal-equations solver could be added, as could a way to tell the system that A is well conditioned. I appreciate it's a work in progress, so please take this as constructive feedback rather than criticism.
I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using normal equations.
This isn't true. QR will be fairly similar in speed (about 2x slower for dense, up to 10x slower for sparse) and a lot more accurate.
For the sparse example you gave, my laptop gives
julia> @btime qr(A) \ b;
11.515 ms (167 allocations: 23.39 MiB)
julia> @btime Symmetric(A' * A) \ (A' * b);
2.184 ms (79 allocations: 1.70 MiB)
@oscardssmith Fair point. I hadn't understood that QR was much faster for the sparse case. Full timings for me:
julia> @btime solve(LinearProblem($A, $b)).u;
22.228 s (54 allocations: 20.41 MiB)
julia> @btime qr($A) \ $b;
8.853 ms (159 allocations: 23.40 MiB)
julia> @btime Symmetric(($A)' * $A) \ (($A)' * $b);
1.288 ms (75 allocations: 1.70 MiB)
julia> @btime ldl(($A)' * $A) \ (($A)' * $b);
1.206 ms (47 allocations: 1.37 MiB)
So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve.
However, I do care about the further 8x speedup I get using my approach, and would prefer to use it from LinearSolve, and get all the benefits that LinearSolve claims to give.
So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve.
Yes, I've said multiple times here that in this case it's just doing a QR. So there is something odd here. In fact, the profile looks like it's some kind of weird inference bug or something.
That must be the oddest profile I've seen. It spends <1% of the time running code. The other 99% of the time is 🤷. Haven't seen this before, can someone double-check the profile?
Haven't seen this before, can someone double check the profile?
I've seen things like this before. The fix is to start Julia with a single thread.
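For anyone double-checking, a minimal sketch of reproducing the profile (illustrative, not from the thread), with Julia started single-threaded, e.g. julia -t 1:

using Profile, LinearSolve, SparseArrays
A = sprandn(5000, 100, 0.1)
b = randn(5000)
prob = LinearProblem(A, b)
solve(prob)                  # warm up so compilation isn't profiled
Profile.clear()
@profile solve(prob)
Profile.print(mincount = 10) # hide frames with fewer than 10 samples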
I profiled it earlier and saw qr! calls from two different places within solve.
Interesting. I found the issue. In https://github.com/SciML/LinearSolve.jl/pull/188 there was https://github.com/SciML/LinearSolve.jl/pull/188/commits/fd05fb1adf22b1dde8f0b892ac736fe0d4ba0134, which versioned qr! to only be used for (at the time) Julia master on sparse arrays. And if you check:
using LinearSolve, SparseArrays, BenchmarkTools
A = sprandn(5000, 100, 0.1)
b = randn(5000)
prob = LinearProblem(A, b)
assumptions = LinearSolve.OperatorAssumptions(false)
@time solve(prob; assumptions)
using LinearAlgebra
@time qr!(A)
The new qr! on SparseArrays is absurdly slow. So the fix is just to use qr instead.
@Wimmerer is this qr! issue in SparseArrays known? I couldn't find an issue on it.
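A hedged comparison sketch of the two code paths mentioned above (illustrative; requires a Julia/SparseArrays version that defines qr! for sparse matrices, and timings vary by machine):

using SparseArrays, LinearAlgebra, BenchmarkTools
A = sprandn(5000, 100, 0.1)
# Out-of-place qr dispatches to SuiteSparse's SPQR and is fast here.
@btime qr($A);
# The in-place qr! on a sparse matrix is the path solve was hitting;
# on affected versions it is dramatically slower.
@btime qr!(copy($A));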
I profiled it earlier and saw qr! calls from two different places within solve.
That was already fixed.
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 16.984 ms (326 allocations: 24.16 MiB)
@btime qr($A) \ $b # 17.135 ms (159 allocations: 23.35 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
https://github.com/SciML/LinearSolve.jl/pull/290 adds the new algorithms for well-conditioned matrices and adds them to the default algorithm. Results:
using LinearSolve, LinearAlgebra, BenchmarkTools
A = randn(1000, 100)
b = randn(1000)
@assert isapprox(solve(LinearProblem(A, b)).u, Symmetric(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 3.140 ms (26 allocations: 857.53 KiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u; # 411.000 μs (27 allocations: 163.36 KiB)
@btime solve(LinearProblem($A, $b), $(NormalBunchKaufmanFactorization())).u; # 400.500 μs (27 allocations: 210.09 KiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u; # 401.000 μs (18 allocations: 162.89 KiB)
@btime Symmetric(($A)' * $A) \ (($A)' * $b); # 400.000 μs (9 allocations: 209.09 KiB)
@btime factorize(Symmetric((A)' * A)) \ (($A)' * $b); # 404.300 μs (12 allocations: 209.19 KiB)
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 16.984 ms (326 allocations: 24.16 MiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u; # 1.581 ms (130 allocations: 1.70 MiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u; # 1.535 ms (124 allocations: 1.70 MiB)
@btime qr($A) \ $b # 17.135 ms (159 allocations: 23.35 MiB)
@btime factorize(Symmetric((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime cholesky(Symmetric((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldlt(((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
I'm not entirely sure where the overhead on the sparse case is coming from, but at least it's rather close.
Solving the normal equations is much faster for tall linear systems:
Is there a reason why LinearSolve's smart algorithm selection doesn't choose to do this?
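For reference, a minimal sketch of the normal-equations approach being asked about (an illustration along the lines of the dense example earlier in the thread, not the original snippet from the issue):

using LinearSolve, LinearAlgebra, BenchmarkTools
A = randn(1000, 100)   # tall system: many more rows than columns
b = randn(1000)
# Default LinearSolve path (QR-based for non-square A).
@btime solve(LinearProblem($A, $b)).u;
# Normal equations: form the 100x100 Gram matrix and solve it instead.
# Much faster for tall A, but it squares the condition number of A.
@btime Symmetric($A' * $A) \ ($A' * $b);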