JuliaDiff / DualNumbers.jl

Julia package for representing dual numbers and for performing dual algebra

WIP: Cholesky factorization of dual matrices #11

Closed: c42f closed this 10 years ago

c42f commented 10 years ago

As discussed on the mailing list, here's some code implementing Cholesky factorization of dual matrices in a reasonably efficient manner.

The current implementation doesn't play nicely with the cholfact!() infrastructure, so some additional changes will be required. I can do some digging to figure that out, but suggestions are also very welcome. This is close to my first attempt at production-quality Julia code, so there could well be some strange things I've done in places.
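
For anyone following along, the identity being exploited is roughly the following. This is a dense, unoptimized sketch (not the code in this PR), assuming the dual()/real()/epsilon() helpers work elementwise on matrices:

function dual_chol_sketch(A)
    # For A = A0 + ε*A1 (A0 SPD, A1 symmetric), seek L = L0 + ε*L1 with L*L' = A:
    #   L0*L0' = A0              -- ordinary Cholesky of the real part
    #   L1*L0' + L0*L1' = A1     -- a triangular Lyapunov-type equation for L1
    A0, A1 = real(A), epsilon(A)
    L0 = full(cholfact(A0)[:L])             # LAPACK Cholesky of the real part
    X  = L0 \ (A1 / L0')                    # X = inv(L0)*A1*inv(L0)', symmetric
    L1 = L0*(tril(X) - diagm(diag(X))/2)    # lower triangle of X with halved diagonal
    dual(L0, L1)
end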

papamarkou commented 10 years ago

Great, Chris!

mlubin commented 10 years ago

@c42f, thanks for putting together this PR. @scidom I think this needed some more work before merging. There are no tests and it doesn't yet follow the standard cholfact interface so it may not even be called by default.

papamarkou commented 10 years ago

Ok, I will let you work on the tests and improve it, @mlubin; happy if you can commit to it.

c42f commented 10 years ago

Yes, it's incomplete, I should have been clearer that I didn't expect it to be merged quite yet (that was what the "WIP" was meant to indicate). Never mind though - I'll just create a new pull request for additional changes. I'm happy to do those, just wanted to open the discussion.

@mlubin The current chol() interface for duals was more of a proof of concept than anything. It correctly gets called for chol(A) where A is a dual matrix, but nothing more.

The cholfact!() and cholfact() docs say

Base.cholfact!(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

"cholfact!" is the same as "cholfact()", but saves space by overwriting the input "A", instead of creating a copy.

Base.cholfact(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

Compute the Cholesky factorization of a dense symmetric positive (semi)definite matrix "A" and return either a "Cholesky" if "pivot=false" or "CholeskyPivoted" if "pivot=true". "LU" may be ":L" for using the lower part or ":U" for the upper part. The default is to use ":U". The triangular matrix can be obtained from the factorization "F" with: "F[:L]" and "F[:U]". The following functions are available for "Cholesky" objects: "size", "\", "inv", "det". For "CholeskyPivoted" there is also defined a "rank". If "pivot=false" a "PosDefException" exception is thrown in case the matrix is not positive definite. The argument "tol" determines the tolerance for determining the rank. For negative values, the tolerance is the machine precision.

Seems like quite a bit more work to be done to support all the options. Lower or upper is easy I guess - just transpose after the fact. Pivoting might require a little more work, though I guess it's just a matter of passing the pivot options along to cholfact!(real(A)), and applying the permutation correctly before computing the decomposition of epsilon(A).
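
For the pivoted case I imagine it's roughly the following (untested guess, going by the docs above and a quick read of factorization.jl):

function dual_cholfact_pivoted_sketch(A)
    C  = cholfact(real(A), :L, pivot=true)   # pivot based on the real part only
    p  = C[:p]                               # permutation vector (if I'm reading factorization.jl right)
    L0 = full(C[:L])
    A1 = epsilon(A)[p, p]                    # permute the epsilon part consistently
    X  = L0 \ (A1 / L0')
    L1 = L0*(tril(X) - diagm(diag(X))/2)     # epsilon part of the factor, as in the unpivoted sketch
    dual(L0, L1), p      # a proper CholeskyPivoted-style wrapper would hold these
end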

I'm not yet sure whether any of this can usefully be done in-place. If we could reinterpret Matrix{Dual{Float64}} into a Matrix{Float64} of twice the number of rows we might have a chance. Do either of you know whether julia gives any guarantees on memory layout which would let us do that safely?
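
Concretely, something like this, assuming a Dual{Float64} is laid out as its value field followed by its epsilon field with no padding (which is exactly the guarantee in question):

A = dual(rand(4,4), rand(4,4))                        # Matrix{Dual{Float64}}
B = reinterpret(Float64, A, (2*size(A,1), size(A,2)))
# If the layout holds, B[1:2:end, :] == real(A) and B[2:2:end, :] == epsilon(A),
# and B aliases the same memory as A, so nothing is copied.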

papamarkou commented 10 years ago

I don't know the answer to this question @c42f. Thanks for being willing to work on improving your Cholesky factorization code; I didn't realize this was not ready (I merged without having noticed the WIP). If you could amend your code, possibly with the help of @mlubin, this would be great as I can't give it my immediate attention due to other ongoing coding work.

mlubin commented 10 years ago

@c42f, I've reverted the commit so that you can make a clean PR. What I was thinking about cholfact is that we should re-use the Cholesky parametrized type. So cholfact(Matrix{Dual{Float64}}) returns a Cholesky{Dual{Float64}}. Returning anything else might break code that explicitly stores the factorization object. As you mentioned, the matrix layout is an issue because we want to store two Matrix{Float64} instead of one Matrix{Dual{Float64}} with the factor. @andreasnoackjensen, any suggestions here for what we should do to provide a custom cholesky factorization for the Dual type?

c42f commented 10 years ago

Ok, thanks Miles. I can't reopen this PR since github knows it's merged (unless one of you guys can?) but we can keep discussing it here until I've got some new code to talk about.

It's a good question as to whether we want the factorization to be a pair of Matrix{Float64} vs a single Matrix{Dual{Float64}}. I was assuming we'd want the latter as output (if you look at my test chol() implementation you'll see that's what I produce at the moment). You're probably right that it's better to keep them split apart though: as soon as you want to solve an equation using the factorization, you'll likely want to call the BLAS again for efficiency.

Looking at the code in base/linalg/factorization.jl, I see the Cholesky stuff is rather tied to LAPACK at the moment, and testing shows it doesn't even work with a BigFloat array yet. That probably means we're a bit ahead of ourselves trying to get it all working for duals.

andreasnoack commented 10 years ago

I wonder if it would work if the Dual immutable was loosened such that you could have a Dual{Matrix{Float64}} instead of Matrix{Dual{Float64}}. It would probably require some care in the definitions of the arithmetic operations on Duals, but the BLAS calls would come automatically I guess.

c42f commented 10 years ago

Yes, allowing Dual{Matrix{Float64}} seems mathematically sensible, and would allow operations such as multiplication of dual matrices to be a lot faster for any matrices of reasonable size. My main worry would be whether the extra code complexity was worth it.

mlubin commented 10 years ago

@andreasnoackjensen, that doesn't quite resolve the issue because Cholesky{T} stores a Matrix{T}, so Cholesky{Dual{Matrix{Float64}}} would store the factor as a Matrix{Dual{Matrix{Float64}}}.

The code needed to implement this will also surely double the complexity of the DualNumbers package, and I'm not sure that's worth it just to be able to call BLAS.

andreasnoack commented 10 years ago

@mlubin At first I thought the same, but instead of Cholesky{Dual{Matrix{Float64}}} it should be a Dual{Cholesky{Matrix{Float64}}}. Then if we define

function \(z::Dual, w::Dual)
    # Solving (a + b*ε)*x = c + d*ε: the real parts give a*real(x) = c,
    # and the epsilon parts give a*epsilon(x) + b*real(x) = d.
    r = real(z)\real(w)
    du = real(z)\(epsilon(w) - epsilon(z)*r)
    return Dual(r, du)
end

it almost works for a Dual{Cholesky} lhs. It requires a * method for Cholesky, but that is okay to have anyway.

However, it is still not obvious that we want this. I am a little worried about the number of temporaries in these operations.

c42f commented 10 years ago

Calling BLAS can be a big deal performance-wise, at least vs the implementation of matrix multiplication which Matrix{Dual{Float64}} currently ends up using. Consider this fairly naive expansion in terms of the BLAS:

function dmul(A, B)
    # Dual matrix product via three real (BLAS) multiplies:
    # (rA + ε*eA)*(rB + ε*eB) = rA*rB + ε*(rA*eB + eA*rB)
    rA = real(A)
    rB = real(B)
    dual(rA*rB, rA*epsilon(B) + epsilon(A)*rB)
end

A = dual(rand(1000,1000), rand(1000,1000))
B = dual(rand(1000,1000), rand(1000,1000))

Now we have

julia> @time dmul(A,B)
elapsed time: 0.112314667 seconds (80001120 bytes allocated)

julia> @time A*B
elapsed time: 1.349246384 seconds (16000400 bytes allocated)

It's a lot of extra allocation, but I'd take that for a 10x performance boost!

The question of whether DualNumbers should take on the complexity of making Dual{Cholesky{Matrix{Float64}}} work is tricky. Mainly I don't know whether it's something people actually need, and whether it will generalize usefully into a common pattern for other Cholesky factorizations. Would it be better just to define a new factorization type internal to DualNumbers and return that instead?
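
I'm thinking of something as simple as this (hypothetical name and fields):

immutable DualCholesky{T}
    L0::Matrix{T}    # Cholesky factor of real(A)
    L1::Matrix{T}    # its perturbation, satisfying L1*L0' + L0*L1' == epsilon(A)
end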

andreasnoack commented 10 years ago

I think the best solution would be to relax Dual to allow for Dual{Matrix} and Dual{Factorization}. It wouldn't require that many changes to the code and it wouldn't change anything for Dual{Float64}. The / and \ methods are easily fixed along the line of my example above, some methods restricted to Number or Real could just have the restriction removed and some methods for Dual should be restricted to Dual{T<:Number}.
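
Schematically, something like this (a sketch only, not the actual DualNumbers source; field names assumed):

import Base: +, isless

immutable Dual{T}                # currently restricted along the lines of T<:Real
    re::T
    du::T
end

# Arithmetic stays unrestricted, so Dual{Matrix{Float64}} flows through it too:
+(z::Dual, w::Dual) = Dual(z.re + w.re, z.du + w.du)

# Scalar-only behaviour gets its restriction back at the method level instead:
isless{T<:Real}(z::Dual{T}, w::Dual{T}) = isless(z.re, w.re)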

mlubin commented 10 years ago

It's too bad that BLAS/LAPACK don't accept strides for matrices, since then we could just reinterpret a Matrix{Dual} into two different matrices. The 10x performance boost is definitely compelling. My only doubt is how this works with Dual{Float64} as a drop-in replacement for Float64. This will break any code that tries to store a Cholesky{T}, and there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

mlubin commented 10 years ago

To put it more clearly, the primary use of DualNumbers is by users who don't need to know what Dual is and shouldn't need to make any modifications to their code besides making it type generic.

andreasnoack commented 10 years ago

"This will break any code that tries to store a Cholesky{T}"

That is right, but it is somewhat similar to the requirement that you cannot restrict your function arguments to Float64 when you want to use DualNumbers.

"there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}"

I don't get this part. Can you elaborate?

mlubin commented 10 years ago

@andreasnoackjensen There's no loss of exact typing when you write your function to take a number type T. Currently it's correct to assume that cholfact(::Matrix{T}) returns a Cholesky{T}, and it's reasonable to want to explicitly use this type in the code for storage or type assertions. However, if cholfact(::Matrix{Dual{T}}) returns a Dual{Cholesky{T}}, then there's no way to write down the return type that's correct for both T = Float64 and T = Dual{Float64}, unless I'm mistaken.
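
For example (hypothetical user code, generic over the element type):

function solve_all{T}(A::Matrix{T}, bs::Vector{Vector{T}})
    # Holds for T == Float64; there is nothing valid to write here if
    # cholfact on Matrix{Dual{Float64}} were to return Dual{Cholesky{Float64}}.
    F = cholfact(A)::Cholesky{T}
    [F \ b for b in bs]
end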

andreasnoack commented 10 years ago

@mlubin I think that some of your message is missing.

mlubin commented 10 years ago

Should be edited now.

andreasnoack commented 10 years ago

The idea was to have cholfact(Dual{Matrix{T}}) (not cholfact(Matrix{Dual{T}})) return a Dual{Cholesky{T}}. Then you would have the option to choose between cholfact(Dual{Matrix{T}}) and cholfact(Matrix{Dual{T}}) where the former can take advantage of BLAS, but sometimes your functions don't allow you to use it.

I don't know if it is feasible to differentiate through a Dual{Matrix{T}}, but I think that all the matrix algebra can work for the type with the modifications mentioned earlier.

c42f commented 10 years ago

Interesting conversation guys, though it leaves me unsure about which way is best to proceed. I'm not entirely sure I want to take on the work of rearranging the whole package around allowing Dual{Matrix{T}} just to get Cholesky factorization in. (I don't personally have a use case for this, remember; it was just an interesting problem on the mailing list. This makes me slightly worried I'll make the wrong tradeoffs!)

andreasnoack commented 10 years ago

I agree. I also like to think about problems like this. I tried to compare the two approaches and in my tests the triangular Lyapunov solver consumed the gain from BLAS, so I am not sure that this approach is worth it.

c42f commented 10 years ago

@andreasnoackjensen - by "triangular Lyapunov solver" do you mean that the function tri_ss_solve() consumes the gain from using the BLAS on the real part? When I test the optimized version of chol() from the original PR vs the choldn!() posted on the mailing list, I get a factor of 6x speedup for 1000x1000 matrices.

andreasnoack commented 10 years ago

I am at the sea intensively marking exams without a decent internet connection so I cannot check, but could it be that the generic cholesky in my pull request is faster than the one you have used?

c42f commented 10 years ago

Sure, that could certainly be the case - I took choldn!() from the mailing list without looking much at it. A generic solution for Cholesky factorization will certainly be great to have. I think I found the pull request you refer to... https://github.com/JuliaLang/julia/pull/7236

mlubin commented 10 years ago

I just did some timings on 1000x1000 matrices. cholfact from the generic Cholesky PR is only about 20% faster than lufact, which is a bit surprising. @c42f's specialized Cholesky factorization is almost 4x faster than the generic one, and about 2x faster for solves.

Unless the generic Cholesky can be easily improved, it still seems valuable to have this specialized code. I'd vote for squeezing the specialized factorization result into a Cholesky{Dual{T}} instead of trying to do Dual{Matrix{T}} and Dual{Cholesky{T}}.

c42f commented 10 years ago

Ok cool, thanks for the extra testing @mlubin. Along the same lines, it's probably worth having a specialized version of matrix multiplication for BLAS types, since that's so much faster. It's really too bad that gemm doesn't support arbitrary strides, since without them the BLAS versions have to copy the real and epsilon parts into separate contiguous matrices, which means a lot of extra memory allocation.
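
Something along these lines, mirroring dmul above (a sketch only; assumes real()/epsilon()/dual() work elementwise on matrices as in that example):

import Base: *

typealias BlasReal Union(Float32, Float64)

function *{T<:BlasReal}(A::Matrix{Dual{T}}, B::Matrix{Dual{T}})
    # Three gemm calls instead of the generic triple-loop fallback.
    rA, rB = real(A), real(B)
    dual(rA*rB, rA*epsilon(B) + epsilon(A)*rB)
end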