
sparse \ polyalgorithm #3295

Closed. ViralBShah closed this issue 9 years ago.

ViralBShah commented 11 years ago

We need a polyalgorithm for sparse \.

This is the discussion from the mailing list. https://groups.google.com/forum/?fromgroups#!topic/julia-users/FagaqWVkBGU

We may need enhancements to the dense \ as well.

Cc: @dmbates, @andreasnoackjensen

andreasnoack commented 11 years ago

I did some tests for the dense \ and it performs well compared to MATLAB, except for least squares. In #3298 I have changed the default least squares solver from SVD-based to QR-based, which brings the timings in line with MATLAB.
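For reference, a minimal sketch contrasting the two solver routes (the data here is invented, and pinv stands in for the SVD-based solve; this is not the code from #3298):

A = randn(100, 5); b = randn(100)   # overdetermined least-squares problem
x_qr  = qrfact(A)\b                 # QR-based solve, the new default
x_svd = pinv(A)*b                   # SVD route via the pseudoinverse; slower, but rank-revealing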

dmbates commented 11 years ago

Is http://www.cise.ufl.edu/research/sparse/SuiteSparse/current/SuiteSparse/MATLAB_Tools/Factorize/Doc/factorize_demo.html covered by a license and, if so, what license? I haven't read it yet because of the possibility that it is GPL'd or LGPL'd.

bsxfan commented 11 years ago

It would be nice if the intelligence that decides which factorization to use were coded into a factorize method instead of directly in \. As far as I understand, that is how Tim Davis's MATLAB factorize package works. There are many cases where you would want to create a single factorization and later solve for multiple right-hand sides.

Comparing this to Octave (http://www.gnu.org/software/octave/doc/interpreter/Sparse-Linear-Algebra.html#Sparse-Linear-Algebra), I see there is only one place where they reference the properties of the RHS in deciding how to factorize, and that is just whether the RHS is sparse or not.

Finally, it could be a nice feature to allow factorize to accept options that describe properties of the matrix that the programmer might know. If such properties are given, then automatic determination/checking of properties may be skipped. See MATLAB's linsolve, which works like this.
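Purely as a hypothetical illustration (the keyword names are invented here, not an existing API), such an interface might look like:

# Hypothetical: trust caller-asserted structure and skip the automatic checks.
function factorize(A::Matrix; posdef::Bool=false, sym::Bool=false)
    posdef && return cholfact(A)   # caller promises positive definite: no ishermitian probe
    sym    && return bkfact(A)     # caller promises symmetric: Bunch-Kaufman
    return lufact(A)               # no promises: general LU
end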

johnmyleswhite commented 11 years ago

Yes, I think we will want to follow factorize's lead and put the major intelligence into a reusable factorization, rather than a one-time operator.

andreasnoack commented 11 years ago

@bsxfan I think most of your suggestions are already implemented in Julia, but in a different way than in MATLAB. We already have factorizations that are used in the way you are suggesting; see e.g. the GLM package. If I know that the matrix A is symmetric I can do

Hermitian(A)\b

and if I want to reuse a positive definite left-hand side A, I can save it with F=cholfact(A) and do F\b1 and F\b2.
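For concreteness, a minimal sketch of that reuse pattern (the matrices are invented; Julia syntax of the time):

A  = [4.0 1.0; 1.0 3.0]   # symmetric positive definite example
b1 = [1.0, 2.0]; b2 = [0.5, -1.0]
F  = cholfact(A)          # factor once
x1 = F\b1                 # each solve reuses the stored factorization
x2 = F\b2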

We could write a factorize function that chooses the best factorization based on checks similar to those in \, but the return type would depend on the input values, which is generally avoided in Julia. Alternatively, the factorization function could dispatch on the input type, like Hermitian or Triangular, but maybe it is easier to just write exactly which factorization you want to use.

johnmyleswhite commented 11 years ago

I think this is a case where we should violate Julia's constant-return-type convention. Requiring users to fully understand the factorizations to get efficient computation seems like a total failure on our part.

bsxfan commented 11 years ago

I agree, yes. I would vote for a generic factorize that automatically returns the best factorization. Consider, for example, the case where one sometimes wants to send complex data through a function that usually handles real data, e.g. for checking derivatives with complex-step differentiation. If you forced a Cholesky factorization in your implementation of the function, the complex step usually just quietly gives the wrong answer: the complex matrix is not Hermitian, but LAPACK does not notice this because of the very small complex values. But a generic factorize that checks ishermitian() would automatically choose Cholesky for real data and LU for complex data (the complex matrix would not be Hermitian), and everything would just work.

StefanKarpinski commented 11 years ago

I think this may be one of those places where it's ok for the return type to depend on the input values – the return type is part of the essential information that the function returns. We'll need to consider if we are ok with using the very generic name factorize for this – which could, in particular, be mistaken for a function to factorize integers.

ViralBShah commented 11 years ago

As @andreasnoackjensen said, we already have methods for factorizations. Whether one likes it or not, a lot of people are accustomed to doing \ and having it do the right thing, and not doing that is just going to lead to tons of issues being filed about Julia being slow. I am not opposed to the idea of factorize for reusable black-box factorizations, with \ just calling that.

johnmyleswhite commented 11 years ago

I'd be happy with a name that's more targeted -- something like linfactorize.

andreasnoack commented 11 years ago

@bsxfan I am still not convinced by your example with complex differentiation. Why wouldn't the matrix be symmetric when evaluated as a complex matrix? If it were, you would use Bunch-Kaufman for both. However, I can see the convenience in having a factorize function. Please have a look at the pull request.

bsxfan commented 11 years ago

@andreasnoackjensen I have indeed been looking at your factorize, which looks very nice at first glance.

Sorry, my previous explanation may have been confusing. Let me give more details. A typical use case could look like this. In practice, you would have some real-valued rectangular matrix, say D, and you would be interested in the logdet of C=D*D.', as well as expressions of the form C\RHS. This is typical when computing a multivariate Gaussian log-likelihood, say as part of a Gaussian process optimization. You would code your log-likelihood implementation as well as a hand-coded gradient for optimization. To check the correctness of your derivative, you could use complex-step differentiation.

For real D, C would usually be positive definite, so you might be tempted to hard-code Cholesky into your log-likelihood implementation. But if you're doing complex-step differentiation, D would be complex with very small imaginary values, so that C would end up symmetric, but not Hermitian. The complex step, which just executes the same code, would then also do a Cholesky. Since LAPACK only accesses one triangle, it does not see that C is not Hermitian; the Cholesky returns with info==0 and your code quietly continues, supplying the wrong answer.
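As a reminder of the technique itself (a self-contained sketch, not code from this thread): complex-step differentiation evaluates the function at a tiny imaginary perturbation, and the imaginary part of the result recovers the derivative without subtractive cancellation:

f(x) = exp(x)/sqrt(x)           # any smooth function that accepts complex input
x, h = 1.5, 1e-20
deriv = imag(f(x + im*h))/h     # approximates f'(x) to full double precision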

If instead you had originally called an intelligent factorize rather than cholesky, then different factorizations would be used for real and complex C, and everything should work out. Still not implemented in your pull request is some additional wiring of logdet for some of the other factorizations. I have been playing around with supplying LU with a logdet and it seems to be working out nicely (the complex step gives correct results). Here is what I did to wire up LU for logdet, and also to supply \ and / with optional argument transposes:

factorize(X::Matrix) = lufact(X) # a convenient default for tyre kicking

function At_ldiv_B{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('T', A.factors, A.ipiv, copy(B))
end

function At_ldiv_Bt{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('T', A.factors, A.ipiv, transpose(B))
end

function A_ldiv_Bt{T<:BlasFloat}(A::LU{T}, B::StridedVecOrMat{T})
    if A.info > 0; throw(SingularException(A.info)); end
    LAPACK.getrs!('N', A.factors, A.ipiv, transpose(B))
end

(/){T}(B::Matrix{T},A::LU{T}) = At_ldiv_Bt(A,B).'

function logdet2{T<:Real}(A::LU{T})  # return log(abs(det)) and sign(det)
    m, n = size(A); if m!=n error("matrix must be square") end
    dg = diag(A.factors)
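    # ipiv[i] != i marks a row swap; the parity of the swap count is the sign of the permutation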
    s = (bool(sum(A.ipiv .!= 1:n) % 2) ? -one(T) : one(T)) * prod(sign(dg))
    return sum(log(abs(dg))) , s 
end

function logdet{T<:Real}(A::LU{T})
    d,s = logdet2(A)
    if s<0 throw(DomainError("determinant is negative")) end
    return d
end

function logdet{T<:Complex}(A::LU{T})
    m, n = size(A); if m!=n error("matrix must be square") end
    s = sum(log(diag(A.factors))) + (bool(sum(A.ipiv .!= 1:n) % 2) ? complex(0,pi) : 0) 
    r,a = reim(s);  a = a % 2pi; if a>pi a -=2pi elseif a<=-pi a+=2pi end
    return complex(r,a)    
end

logdet{T<:BlasFloat}(A::Matrix{T}) = logdet(factorize(A))
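A hypothetical smoke test of the definitions above (everything here is invented for illustration; analytically, d/dt logdet(C + t*I) at t = 0 equals trace(inv(C))):

D = randn(3, 5)
C = D*D.'                       # real, positive definite
h = 1e-20
f(t) = logdet(C + t*eye(3))     # complex t goes through factorize -> lufact
d = imag(f(im*h))/h             # complex-step derivative; compare with trace(inv(C))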

bsxfan commented 11 years ago

oh, OK, my long reply was not lost; it is getting late this side of the atlantic ...

Thanks @andreasnoackjensen, your factorize looks like it will make my use-case for complex-step differentiation work out very nicely. I have just typed out a long explanation of my use-case, but lost it by navigating away before clicking commit --- aargh!

I could add that I would also need logdet to be wired through factorize. I have experimented in this direction by supplying LU with a logdet (as well as some extended \ and / capability); the code is the same as in my previous comment above. So far this makes my complex-step stuff work out nicely, and it makes logdet applicable to more general arguments while maintaining type stability.

andreasnoack commented 11 years ago

@bsxfan Thank you for the explanation. I see your point now and have also managed to get complex numerical derivatives from my own likelihood function. I'll add your code to the pull request later today.

ViralBShah commented 9 years ago

See: https://groups.google.com/forum/#!topic/julia-users/FagaqWVkBGU

Currently, factorize(S::SparseMatrixCSC) does the following (bottom of cholmod.jl), which is too simplistic.

# placing factorize here for now. Maybe add a new file
function factorize(A::SparseMatrixCSC)
    m, n = size(A)
    if m == n
        Ac = cholfact(A)
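        # keep the Cholesky only if it succeeded to full rank (minor == m) and A is Hermitian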
        Ac.c.minor == m && ishermitian(A) && return Ac
    end
    return lufact(A)
end
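For reference, a hypothetical call against this method (the matrix is invented; its diagonal dominance should make cholfact succeed):

S = speye(100)*4.0 + sprand(100, 100, 0.01)
S = (S + S')/2                # symmetrize
F = factorize(S)              # CHOLMOD Cholesky here; lufact otherwise
x = F\ones(100)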

Instead, we should be doing as Tim Davis suggested in the thread on the mailing list: http://faculty.cse.tamu.edu/davis/publications_files/Factorize_an_object_oriented_linear_system_solver_for_MATLAB.pdf

jiahao commented 9 years ago

A quick skim of the cited paper turns up Section 3.2 (digital page 15), which can be translated into code of this form:

function factorize{T}(A::AbstractSparseArray{T,2})
    n, m = size(A)
    if n != m #A is rectangular
        #Compute QR factorization of A
        #TODO return qrfact of A or A', whichever is tall and thin.
        return qrfact(A) #( n>m ? A, A') 
    end

    #At this point A is square
    if issym(A) #A is symmetric
        if all(diag(A) .!= 0) && all(imag(diag(A)) .== 0)
            #A has all-real nonzero diagonal
            try
                return cholfact(A)
            catch e
                isa(e, PosDefException) || rethrow(e)
            end
        end

        # cholfact failed or condition on the diagonal does 
        # not hold; do LDLt

        try
            return ldltfact(A)
        catch
            #TODO: catch some specific failure?
        end
    end

    #Above conditions do not hold or chol and ldl failed
    return lufact(A)
end

The specific implementations of qrfact, cholfact, ldltfact and lufact required are summarized in Table III, digital page 12:

(screenshot of Table III from the Factorize paper)

ViralBShah commented 9 years ago

Thanks @jiahao.

andreasnoack commented 9 years ago

It is a good idea to follow Tim Davis's approach for the factorize(Sparse) methods, but I'm not sure it is a good idea to use factorize for Sparse\VecOrMat. We are not using it for Dense\VecOrMat for two reasons:

  1. The complete check is too expensive when the right-hand side is only used once
  2. factorize is not type stable (for concrete types); see the sketch below
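A sketch of the second point (the matrices are invented, and the exact factorization types returned depend on the Julia version):

A1 = [2.0 1.0; 1.0 2.0]   # symmetric positive definite -> Cholesky-type result
A2 = [0.0 1.0; 1.0 0.0]   # symmetric indefinite -> Bunch-Kaufman/LU-type result
typeof(factorize(A1)) == typeof(factorize(A2))   # false: the type depends on the values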

Actually, I'm not sure why factorize is used in Sparse\VecOrMat right now. Going through the git history, I can see that I introduced the inferior factorize(Sparse) method to make our ARPACK wrapper able to factorize positive definite matrices efficiently; before that, it would lufact all matrices. However, Sparse\VecOrMat must have had a kind of polyalgorithm before I added factorize(Sparse), and I didn't touch any \s in that commit.

jiahao commented 9 years ago

Here is our current status:

ViralBShah commented 9 years ago

qrfact still remains, but could be a separate issue.

andreasnoack commented 9 years ago

I'm on that. Actually, I've already wrapped SPQR, but I have to write tests.

ViralBShah commented 9 years ago

Wow, you are amazing!