Closed: c42f closed this pull request 10 years ago
Great, Chris!
@c42f, thanks for putting together this PR. @scidom I think this needs some more work before merging. There are no tests, and it doesn't yet follow the standard cholfact interface, so it may not even be called by default.
Ok, I will let you work on the tests and improve it, @mlubin; I'd be happy for you to commit to it.
Yes, it's incomplete, I should have been clearer that I didn't expect it to be merged quite yet (that was what the "WIP" was meant to indicate). Never mind though - I'll just create a new pull request for additional changes. I'm happy to do those, just wanted to open the discussion.
@mlubin The current chol() interface for duals was more of a proof of concept than anything. It correctly gets called for chol(A) where A is a dual matrix, but nothing more.
The cholfact!() and cholfact() docs say
Base.cholfact!(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky
"cholfact!" is the same as "cholfact()", but saves space by overwriting the input "A", instead of creating a copy.
Base.cholfact(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky
Compute the Cholesky factorization of a dense symmetric positive (semi)definite matrix "A" and return either a "Cholesky" if "pivot=false" or "CholeskyPivoted" if "pivot=true". "LU" may be ":L" for using the lower part or ":U" for the upper part. The default is to use ":U". The triangular matrix can be obtained from the factorization "F" with: "F[:L]" and "F[:U]". The following functions are available for "Cholesky" objects: "size", "\", "inv", "det". For "CholeskyPivoted" there is also defined a "rank". If "pivot=false" a "PosDefException" exception is thrown in case the matrix is not positive definite. The argument "tol" determines the tolerance for determining the rank. For negative values, the tolerance is the machine precision.
Seems like quite a bit more work to be done to support all the options. Lower or upper is easy I guess - just transpose after the fact. Pivoting might require a little more work, though I guess it's just a matter of passing the pivot options along to cholfact!(real(A)), and applying the permutation correctly before computing the decomposition of epsilon(A).
I'm not yet sure whether any of this can usefully be done in-place. If we could reinterpret Matrix{Dual{Float64}} into a Matrix{Float64} of twice the number of rows we might have a chance. Do either of you know whether julia gives any guarantees on memory layout which would let us do that safely?
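For what it's worth, here is a sketch of the reinterpret idea. It assumes Dual{Float64} is an immutable bits type stored inline as consecutive (real, epsilon) Float64 pairs in column-major order, which is not something the language guarantees:

```julia
# Hypothetical sketch only: reinterpret a dual matrix as a Float64
# matrix with twice the rows. This is only valid if duals are laid out
# inline as interleaved (real, epsilon) pairs.
A = [dual(rand(), rand()) for i in 1:3, j in 1:3]
B = reinterpret(Float64, A, (6, 3))
# If the layout assumption holds, B[1:2:end, :] holds the real parts
# and B[2:2:end, :] the epsilon parts, and both views could be handed
# to BLAS if strided inputs were supported.
```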
I don't know the answer to this question, @c42f. Thanks for being willing to work on improving your Cholesky factorization code; I didn't realize it was not ready (I merged without noticing the WIP). If you could amend your code, possibly with the help of @mlubin, that would be great, as I can't give it my immediate attention due to other ongoing coding work.
@c42f, I've reverted the commit so that you can make a clean PR. What I was thinking about cholfact is that we should re-use the Cholesky parametrized type, so that cholfact(Matrix{Dual{Float64}}) returns a Cholesky{Dual{Float64}}. Returning anything else might break code that explicitly stores the factorization object. As you mentioned, the matrix layout is an issue because we want to store two Matrix{Float64} instead of one Matrix{Dual{Float64}} with the factor.
@andreasnoackjensen, any suggestions here for what we should do to provide a custom Cholesky factorization for the Dual type?
Ok, thanks Miles. I can't reopen this PR since github knows it's merged (unless one of you guys can?) but we can keep discussing it here until I've got some new code to talk about.
It's a good question as to whether we want the factorization to be a pair of Matrix{Float64} vs a single Matrix{Dual{Float64}}. I was assuming we'd want the latter as output (if you look at my test chol() implementation you'll see that's what I produce at the moment). You're probably right that it's better to keep them split apart though: as soon as you want to solve an equation using the factorization, you'll likely want to call the blas again for efficiency.
Looking at the code in base/linalg/factorization.jl, I see the Cholesky stuff is rather tied to LAPACK at the moment, and testing shows it doesn't even work with a BigFloat array yet. That probably means we're a bit ahead of ourselves trying to get it all working for duals.
I wonder if it would work if the Dual immutable was loosened so that you could have a Dual{Matrix{Float64}} instead of a Matrix{Dual{Float64}}. It would probably require some care in the definitions of the arithmetic operations on Duals, but the BLAS calls would come automatically, I guess.
Yes, allowing Dual{Matrix{Float64}} seems mathematically sensible, and would allow operations such as multiplication of dual matrices to be a lot faster for any matrices of reasonable size. My main worry would be whether the extra code complexity was worth it.
@andreasnoackjensen, that doesn't quite resolve the issue, because Cholesky{T} stores a Matrix{T}, so Cholesky{Dual{Matrix{Float64}}} would store the factor as a Matrix{Dual{Matrix{Float64}}}.
The code needed to implement this would also surely double the complexity of the DualNumbers package, and I'm not sure that's worth it just to be able to call BLAS.
@mlubin At first I thought the same, but instead of Cholesky{Dual{Matrix{Float64}}} it should be a Dual{Cholesky{Matrix{Float64}}}. Then if we define
function \(z::Dual, w::Dual)
    r = real(z)\real(w)
    du = real(z)\(epsilon(w) - epsilon(z)*r)
    return Dual(r, du)
end
it almost works for a Dual{Cholesky} lhs. It requires a * method for Cholesky, but that is okay to have anyway.
However, it is still not obvious that we want this. I am a little worried about the number of temporaries in these operations.
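For reference, the rule above follows from matching first-order terms: writing z = a + b*eps and w = c + d*eps and solving z*x = w for x = r + u*eps gives a*r = c and a*u + b*r = d, i.e. r = a\c and u = a\(d - b*r), exactly as in the method. A scalar sanity check:

```julia
# Verify the dual division rule z\w for scalar coefficients.
a, b, c, d = 2.0, 3.0, 5.0, 7.0
r = a \ c                 # real part of the quotient
u = a \ (d - b*r)         # epsilon part of the quotient
@assert a*r == c          # zeroth-order term of z*x == w
@assert a*u + b*r == d    # first-order term in eps
```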
Calling BLAS can be a big deal performance-wise, at least compared with the implementation of matrix multiplication that Matrix{Dual{Float64}} currently ends up using. Consider this fairly naive expansion in terms of the BLAS:
function dmul(A, B)
    rA = real(A)
    rB = real(B)
    dual(rA*rB, rA*epsilon(B) + epsilon(A)*rB)
end
A = dual(rand(1000,1000), rand(1000,1000))
B = dual(rand(1000,1000), rand(1000,1000))
Now we have
julia> @time dmul(A,B)
elapsed time: 0.112314667 seconds (80001120 bytes allocated)
julia> @time A*B
elapsed time: 1.349246384 seconds (16000400 bytes allocated)
It's a lot of extra allocation, but I'd take that for a 10x performance boost!
The question of whether DualNumbers should take on the complexity of making Dual{Cholesky{Matrix{Float64}}} work is tricky. Mainly, I don't know whether it's something people actually need, or whether it will generalize usefully into a common pattern for other factorizations. Would it be better just to define a new factorization type internal to DualNumbers and return that instead?
I think the best solution would be to relax Dual to allow for Dual{Matrix} and Dual{Factorization}. It wouldn't require that many changes to the code, and it wouldn't change anything for Dual{Float64}. The / and \ methods are easily fixed along the lines of my example above; some methods restricted to Number or Real could just have the restriction removed, and some methods for Dual should be restricted to Dual{T<:Number}.
It's too bad that BLAS/LAPACK don't accept strides for matrices, since then we could just reinterpret a Matrix{Dual} into two different matrices. The 10x performance boost is definitely compelling. My only doubt is how this works with Dual{Float64} as a drop-in replacement for Float64. This will break any code that tries to store a Cholesky{T}, and there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.
To put it more clearly, the primary use of DualNumbers is by users who don't need to know what Dual is and shouldn't need to make any modifications to their code besides making it type generic.
> This will break any code that tries to store a Cholesky{T},

That is right, but it is somewhat similar to the requirement that you cannot restrict your function arguments to Float64 when you want to use DualNumbers.
> there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

I don't get this part. Can you elaborate?
@andreasnoackjensen There's no loss of exact typing when you write your function to take a number type T. Currently it's correct to assume that cholfact(::Matrix{T}) returns a Cholesky{T}, and it's reasonable to want to explicitly use this type in the code for storage or type assertions. However, if cholfact(::Matrix{Dual{T}}) returns a Dual{Cholesky{T}}, then there's no way to write down the return type that's correct for both T = Float64 and T = Dual{Float64}, unless I'm mistaken.
@mlubin I think that some of your message is missing.
Should be edited now.
The idea was to have cholfact(Dual{Matrix{T}}) (not cholfact(Matrix{Dual{T}})) return a Dual{Cholesky{T}}. Then you would have the option to choose between cholfact(Dual{Matrix{T}}) and cholfact(Matrix{Dual{T}}), where the former can take advantage of BLAS, but sometimes your functions don't allow you to use it.
I don't know if it is feasible to differentiate through a Dual{Matrix{T}}, but I think that all the matrix algebra can work for the type with the modifications mentioned earlier.
Interesting conversation, guys, though it leaves me unsure about which way is best to proceed. I'm not entirely sure I want to take on the work of rearranging the whole package around allowing Dual{Matrix{T}} just to get Cholesky factorization in. (I don't personally have a use case for this, remember; it was just an interesting problem on the mailing list. This makes me slightly worried I'll make the wrong tradeoffs!)
I agree. I also like to think about problems like this. I tried to compare the two approaches, and in my tests the triangular Lyapunov solver consumed the gain from BLAS, so I am not sure that this approach is worth it.
@andreasnoackjensen - by "triangular Lyapunov solver" do you mean that the function tri_ss_solve() consumes the gain from using the BLAS on the real part? When I test the optimized version of chol() from the original PR against the choldn!() posted on the mailing list, I get a 6x speedup for 1000x1000 matrices.
I am at the sea intensively marking exams without a decent internet connection, so I cannot check, but could it be that the generic Cholesky in my pull request is faster than the one you have used?
Sure, that could certainly be the case; I took choldn!() from the mailing list without looking at it much. A generic solution for Cholesky factorization will certainly be great to have. I think I found the pull request you refer to: https://github.com/JuliaLang/julia/pull/7236
I just did some timings on 1000x1000 matrices. cholfact from the generic Cholesky PR is only about 20% faster than lufact, which is a bit surprising. @c42f's specialized Cholesky factorization is almost 4x faster than the generic Cholesky, and 2x faster for solves.
Unless the generic Cholesky can be easily improved, it still seems valuable to have this specialized code. I'd vote for squeezing the specialized factorization result into a Cholesky{Dual{T}} instead of trying to do Dual{Matrix{T}} and Dual{Cholesky{T}}.
Ok cool, thanks for the extra testing @mlubin. Along the same lines, it's probably worth having a specialized version of matrix multiplication for BLAS types, since that's so much faster. It's really too bad that gemm doesn't support striding since the BLAS versions imply a lot of extra memory allocation.
As discussed on the mailing list, here's some code implementing Cholesky factorization of dual matrices in a reasonably efficient manner.
The current implementation doesn't play nicely with the cholfact!() infrastructure, so some additional changes will be required. I can do some digging to figure that out, but suggestions are also very welcome. This is close to my first attempt at production quality julia code, so there could well be some strange things I've done in places.