Closed: ViralBShah closed this issue 9 years ago.
I did some tests for the dense `\` and it performs well compared to MATLAB, except for least squares. In #3298 I have changed the default least squares solver from an SVD-based one to a QR-based one, and this brings the timings in line with MATLAB.
Is http://www.cise.ufl.edu/research/sparse/SuiteSparse/current/SuiteSparse/MATLAB_Tools/Factorize/Doc/factorize_demo.html covered by a license and, if so, what license? I haven't read it yet because of the possibility that it is GPL'd or LGPL'd.
It would be nice if the intelligence which decides which factorization to use were coded into a `factorize` method instead of directly in `\`. As far as I understand, that is how Tim Davis's MATLAB `factorize` package works. There are many cases where you would want to create a single factorization and later use it to solve for multiple right-hand sides.
Comparing this to Octave (http://www.gnu.org/software/octave/doc/interpreter/Sparse-Linear-Algebra.html#Sparse-Linear-Algebra), I see there is only one place where they reference the properties of the RHS in deciding how to factorize and that is just whether the RHS is sparse or not.
Finally, it could be a nice feature to allow `factorize` to accept options that describe properties of the matrix that the programmer might know. If such properties are given, then automatic determination/checking of properties may be skipped. See MATLAB's `linsolve`, which works like this.
Yes, I think we will want to follow `factorize`'s lead and put the major intelligence into a reusable factorization, rather than a one-time operator.
@bsxfan I think most of your suggestions are already implemented in Julia, but in a different way than in MATLAB. We already have factorizations which are used in the way you are suggesting; see e.g. the GLM package. If I know that the matrix `A` is symmetric, I can do `Hermitian(A)\b`, and if I want to reuse a positive definite left-hand side `A`, I can save it with `F = cholfact(A)` and then do `F\b1` and `F\b2`.
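The same factor-once, solve-many pattern can be sketched in Python with SciPy, where `cho_factor`/`cho_solve` play roughly the role of `cholfact` and `\` (an illustrative analogue, not the Julia API):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Build a positive definite matrix A.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 5))
A = D @ D.T + 5 * np.eye(5)

# Factor once, then reuse the factorization for several right-hand sides.
c = cho_factor(A)          # analogous to F = cholfact(A)
b1 = rng.standard_normal(5)
b2 = rng.standard_normal(5)
x1 = cho_solve(c, b1)      # analogous to F \ b1
x2 = cho_solve(c, b2)      # analogous to F \ b2

assert np.allclose(A @ x1, b1)
assert np.allclose(A @ x2, b2)
```

The point is that the O(n^3) factorization cost is paid once, and each additional right-hand side is only an O(n^2) triangular solve.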
We could write a `factorization` function that chooses the best factorization based on checks similar to those in `\`, but the return type would depend on the input values, which is generally avoided in Julia. Alternatively, the `factorization` function could dispatch on the input type, like `Hermitian` or `Triangular`, but maybe it is easier just to write exactly which factorization you want to use.
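A toy sketch of such a value-dependent `factorize` in Python/SciPy (the name `factorize` and the `(solve, kind)` return shape are illustrative assumptions, not anyone's proposed API):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

def factorize(A):
    """Toy polyalgorithm: the returned factorization depends on the values of A."""
    if np.allclose(A, A.conj().T):          # Hermitian?
        try:
            c = cho_factor(A)               # raises if not positive definite
            return (lambda b: cho_solve(c, b)), "cholesky"
        except np.linalg.LinAlgError:
            pass                            # fall through to LU
    lu = lu_factor(A)
    return (lambda b: lu_solve(lu, b)), "lu"

rng = np.random.default_rng(3)
D = rng.standard_normal((4, 4))
spd = D @ D.T + 4 * np.eye(4)               # symmetric positive definite
gen = rng.standard_normal((4, 4))           # general square matrix
b = rng.standard_normal(4)

solve_spd, kind_spd = factorize(spd)
solve_gen, kind_gen = factorize(gen)
assert kind_spd == "cholesky" and kind_gen == "lu"
assert np.allclose(spd @ solve_spd(b), b)
assert np.allclose(gen @ solve_gen(b), b)
```

This is exactly the type-instability being discussed: the two calls return factorizations of different kinds depending on the matrix values.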
I think this is a case where we should violate Julia's constant return type convention. Requiring users to fully understand the factorizations to get efficient computation seems a total fail on our parts.
I agree, yes. I would vote for a generic `factorize` that automatically returns the best factorization. Consider, for example, the case where one sometimes wants to send complex data through a function that usually handles real data, e.g. for checking derivatives with complex-step differentiation. If you forced a Cholesky factorization in your implementation of the function, the complex step usually just quietly gives the wrong answer: the complex matrix is not Hermitian, but LAPACK does not notice this because of the very small complex values. A generic `factorize` that checks `ishermitian()` would automatically choose Cholesky for real data and LU for complex data (since the complex matrix would not be Hermitian), and everything would just work.
I think this may be one of those places where it's ok for the return type to depend on the input values: the return type is part of the essential information that the function returns. We'll need to consider whether we are ok with using the very generic name `factorize` for this, which could, in particular, be mistaken for a function to factorize integers.
As @andreasnoackjensen said, we already have methods for factorizations. Whether one likes it or not, a lot of people are accustomed to doing `\` and having it do the right thing, and not doing that is just going to lead to tons of issues being filed about Julia being slow. I am not opposed to the idea of `factorize` for reusable black-box factorizations, with `\` just calling that.
I'd be happy with a name that's more targeted -- something like `linfactorize`.
@bsxfan I am still not convinced by your example with complex differentiation. Why wouldn't the matrix be symmetric when evaluated as a complex matrix? If it were, you would use Bunch-Kaufman for both. However, I can see the convenience in having a `factorize` function. Please have a look at the pull request.
@andreasnoackjensen I have indeed been looking at your `factorize`, which looks very nice at first glance.
Sorry, my previous explanation may have been confusing. Let me give more details. A typical use case could look like this. In practice, you would have some real-valued rectangular matrix, say `D`, and you would be interested in the logdet of `C = D*D.'`, as well as expressions of the form `C\RHS`. This is typical when computing a multivariate Gaussian log-likelihood, say as part of a Gaussian process optimization. You would code your log-likelihood implementation, as well as a hand-coded gradient for optimization. To check the correctness of your derivative, you could use complex-step differentiation.
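For concreteness, the kind of log-likelihood computation being described can be sketched in Python/SciPy like this (a generic sketch with illustrative names, not code from this thread):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gauss_loglik(D, r):
    """log N(r; 0, C) with C = D @ D.T, via a single Cholesky factorization."""
    C = D @ D.T
    c, lower = cho_factor(C)
    # logdet(C) = 2 * sum(log(diag(L))) for C = L @ L.T
    logdet = 2.0 * np.sum(np.log(np.abs(np.diag(c))))
    quad = r @ cho_solve((c, lower), r)     # r' C^{-1} r
    n = len(r)
    return -0.5 * (logdet + quad + n * np.log(2 * np.pi))

rng = np.random.default_rng(7)
D = rng.standard_normal((3, 5))
r = rng.standard_normal(3)
# cross-check against a direct dense computation
C = D @ D.T
ref = -0.5 * (np.linalg.slogdet(C)[1] + r @ np.linalg.solve(C, r)
              + 3 * np.log(2 * np.pi))
assert np.isclose(gauss_loglik(D, r), ref)
```

Note how both the logdet and the solve are served by the same factorization, which is why the choice of factorization is baked so deeply into this kind of code.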
For real `D`, `C` would usually be positive definite, so you might be tempted to hard-code Cholesky into your log-likelihood implementation. But if you're doing complex-step differentiation, `D` would be complex with very small imaginary values, so that `C` would end up being symmetric, but not Hermitian. The complex step, which just executes the same code, would then also do Cholesky. Since LAPACK accesses just one triangle, it does not see that `C` is not Hermitian; the Cholesky returns with `info == 0`, and your code will quietly continue to supply the wrong answer.
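The silent failure is easy to reproduce in NumPy, whose `cholesky` (like LAPACK's `potrf`) reads only one triangle; here the imaginary perturbation is exaggerated to 1e-3 (rather than a true complex-step-sized one) purely so the discrepancy is visible at default tolerances:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((4, 6)) + 1e-3j * rng.standard_normal((4, 6))
C = D @ D.T + 3 * np.eye(4)    # complex *symmetric*, not Hermitian

# numpy's cholesky only reads the lower triangle (and the imaginary parts
# of the diagonal are effectively ignored), so it "succeeds" even though
# C has no factorization of the form L @ L.conj().T.
L = np.linalg.cholesky(C)
assert not np.allclose(L @ L.conj().T, C)   # quietly wrong

# An LU-based solve, by contrast, uses the whole matrix and stays correct.
b = rng.standard_normal(4)
x = np.linalg.solve(C, b)
assert np.allclose(C @ x, b)
```

With complex-step-sized perturbations (say 1e-20) the error is proportionally tiny in the factorization but fatal for the derivative, since the derivative lives entirely in those imaginary parts.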
If instead you had originally called an intelligent `factorize` rather than Cholesky, then different factorizations would be used for real and complex `C`, and everything should work out. Still not implemented in your pull request is some additional wiring of `logdet` for some of the other factorizations. I have just been playing around with supplying LU with a `logdet` and it seems to be working out nicely (the complex step is giving correct results). Here is what I did to wire up LU for `logdet` and also to supply `\` and `/`, with optional argument transposes:
```julia
factorize(X::Matrix) = lufact(X) # a convenient default for tyre kicking

function At_ldiv_B{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('T', A.factors, A.ipiv, copy(B))
end

function At_ldiv_Bt{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('T', A.factors, A.ipiv, transpose(B))
end

function A_ldiv_Bt{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('N', A.factors, A.ipiv, transpose(B))
end

(/){T}(B::Matrix{T}, A::LU{T}) = At_ldiv_Bt(A, B).'

function logdet2{T<:Real}(A::LU{T}) # returns log(abs(det)) and sign(det)
    m, n = size(A); if m != n error("matrix must be square") end
    dg = diag(A.factors)
    # each row interchange recorded in ipiv flips the sign of the determinant
    s = (bool(sum(A.ipiv .!= 1:n) % 2) ? -one(T) : one(T)) * prod(sign(dg))
    return sum(log(abs(dg))), s
end

function logdet{T<:Real}(A::LU{T})
    d, s = logdet2(A)
    if s < 0 error("determinant is negative") end
    return d
end

function logdet{T<:Complex}(A::LU{T})
    m, n = size(A); if m != n error("matrix must be square") end
    s = sum(log(diag(A.factors))) + (bool(sum(A.ipiv .!= 1:n) % 2) ? complex(0, pi) : 0)
    # wrap the imaginary part into (-pi, pi]
    r, a = reim(s); a = a % 2pi; if a > pi a -= 2pi elseif a <= -pi a += 2pi end
    return complex(r, a)
end

logdet{T<:BlasFloat}(A::Matrix{T}) = logdet(factorize(A))
```
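The same bookkeeping (sum of log|U_ii| plus a sign from the permutation parity) can be sketched in Python with SciPy and checked against `numpy.linalg.slogdet`; the name `logdet2` mirrors the Julia snippet above and is otherwise an assumption:

```python
import numpy as np
from scipy.linalg import lu_factor

def logdet2(A):
    """Return (log|det A|, sign(det A)) from an LU factorization."""
    lu, piv = lu_factor(A)
    d = np.diag(lu)                # diagonal of U (L has unit diagonal)
    # Each pivot row swap recorded in piv flips the determinant's sign.
    swaps = np.sum(piv != np.arange(len(piv)))
    sign = (-1.0) ** (swaps % 2) * np.prod(np.sign(d))
    return np.sum(np.log(np.abs(d))), sign

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
logabs, sign = logdet2(A)
s_ref, l_ref = np.linalg.slogdet(A)
assert np.isclose(logabs, l_ref) and np.isclose(sign, s_ref)
```

Working in log space this way avoids the overflow/underflow of computing `det(A)` directly for large matrices.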
oh, OK my long reply was not lost, it is getting late this side of the Atlantic ...
Thanks @andreasnoackjensen, your `factorize` looks like it will make my use case for complex-step differentiation work out very nicely. I had just typed out a long explanation of my use case, but lost it by navigating away before clicking commit --- aargh!
I should add that I would also need `logdet` to be wired through `factorize`. I have experimented in this direction by supplying LU with a `logdet` (as well as some extended `\` and `/` capability); the code is the same as shown above.
This makes logdet applicable to more general arguments, while maintaining type stability.
@bsxfan Thank you for the explanation. I see your point now and have also managed to get complex numerical derivatives from my own likelihood function. I'll add your code to the pull request later today.
See: https://groups.google.com/forum/#!topic/julia-users/FagaqWVkBGU
Currently, `factorize(S::SparseMatrixCSC)` does the following (bottom of `cholmod.jl`), which is too simplistic.
```julia
# placing factorize here for now. Maybe add a new file
function factorize(A::SparseMatrixCSC)
    m, n = size(A)
    if m == n
        Ac = cholfact(A)
        Ac.c.minor == m && ishermitian(A) && return Ac
    end
    return lufact(A)
end
```
Instead, we should be doing as Tim Davis suggested in the thread on the mailing list: http://faculty.cse.tamu.edu/davis/publications_files/Factorize_an_object_oriented_linear_system_solver_for_MATLAB.pdf
A quick skim of the cited paper turns up Section 3.2 (digital page 15), which can be translated into code of this form:
```julia
function factorize{T}(A::AbstractSparseArray{T,2})
    n, m = size(A)
    if n != m # A is rectangular
        # Compute a QR factorization of A
        # TODO: return qrfact of A or A', whichever is tall and thin
        return qrfact(A) # ( n > m ? A : A' )
    end
    # At this point A is square
    if issym(A) # A is symmetric
        if all(diag(A) .!= 0) && all(imag(diag(A)) .== 0)
            # A has an all-real, nonzero diagonal
            try
                return cholfact(A)
            catch e
                isa(e, PosDefException) || rethrow(e)
            end
        end
        # cholfact failed or the condition on the diagonal does
        # not hold; do LDLt
        try
            return ldltfact(A)
        catch
            # TODO: catch some specific failure?
        end
    end
    # The above conditions do not hold, or chol and ldl failed
    return lufact(A)
end
```
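A rough Python analogue of the square/rectangular split in this decision tree, using `scipy.sparse` (a toy sketch: SciPy has no sparse Cholesky or LDLt in its standard namespace, so `splu` stands in for the whole symmetric cascade, and `lsqr` plays the role of the sparse QR least-squares solve):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def factorize(A):
    """Toy sparse polyalgorithm following the square/rectangular split."""
    n, m = A.shape
    if n != m:
        # rectangular: least-squares route (lsqr stands in for sparse QR)
        return lambda b: spla.lsqr(A, b)[0]
    # square: build and reuse an LU factorization (splu stands in for
    # the chol/ldlt/lu cascade, which SciPy cannot express directly)
    F = spla.splu(sp.csc_matrix(A))
    return F.solve

rng = np.random.default_rng(4)
A = sp.random(8, 8, density=0.5, random_state=5) + 8 * sp.eye(8)
b = rng.standard_normal(8)
x = factorize(sp.csc_matrix(A))(b)
assert np.allclose(A @ x, b)

R = sp.random(10, 4, density=0.6, random_state=6)
bR = rng.standard_normal(10)
xls = factorize(sp.csc_matrix(R))(bR)
# any least-squares solution satisfies the normal equations R'R x = R'b
assert np.allclose((R.T @ R) @ xls, R.T @ bR, atol=1e-6)
```

The value of returning a factorization object (here a solve closure) rather than a one-shot solution is the same as in the dense case: the expensive analysis and factorization are reusable across right-hand sides.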
The specific implementations of `qrfact`, `cholfact`, `ldltfact`, and `lufact` required are summarized in Table III (digital page 12) of the paper.
Thanks @jiahao.
It is a good idea to follow Tim Davis's design for the `factorize(Sparse)` methods, but I'm not sure it is a good idea to use `factorize` for `Sparse\VecOrMat`. We are not using it for `Dense\VecOrMat` for two reasons, one being that `factorize` is not type stable (for concrete types).
Actually, I'm not sure why `factorize` is used in `Sparse\VecOrMat` right now. Going through the git history, I can see that I introduced the inferior `factorize(Sparse)` method to make our ARPACK wrapper able to efficiently factorize positive definite matrices; it used to `lufact` all matrices. However, `Sparse\VecOrMat` must have had a kind of polyalgorithm before I added `factorize(Sparse)`, and I didn't touch any `\`s in that commit.
Here is our current status:

- `lufact(::Sparse)` is defined and the computation uses UMFPACK, but the documentation of the factors, `F[:L]*F[:U] == F[:Rs] .* A[F[:p], F[:q]]`, is no longer correct since `.*` is now broken (ref #8416, #9664; it was correct as of v0.2).
- `cholfact(::Sparse)` computes either the LL' or the LDL' form of Cholesky, as determined by CHOLMOD, but `F = cholfact(A); F[:L]` does not work.
- `ldltfact(::Sparse)` is not defined. The diagonal LDL' is sometimes returned by CHOLMOD. As @tkelman wrote on julia-users, we do not have the Bunch-Kaufman form (where D is block diagonal with 1x1 or 2x2 blocks), which is useful for symmetric indefinite matrices.
- `qrfact(::Sparse)` is not defined.

`qrfact` still remains, but could be a separate issue.
I'm on that. Actually, I've already wrapped SPQR, but I have to write tests.
Wow, you are amazing!
We need a polyalgorithm for sparse `\`.
This is the discussion from the mailing list. https://groups.google.com/forum/?fromgroups#!topic/julia-users/FagaqWVkBGU
We may need enhancements to the dense `\` as well.
Cc: @dmbates, @andreasnoackjensen