Open dlfivefifty opened 9 years ago
I think it depends on the scope of this package. For a roots function in a general polynomial package I completely agree with you, but this package was mainly meant as a wrapper for the code in the paper. I think the best thing would be to let the Polynomials package depend on this package and use the AMVW solver when the degree is very high.
I am confused by this observation. In my tests comparing the Fortran code of AMVW with Fortran LAPACK, the LAPACK code is only faster for degrees smaller than 20. Why do the tests in Fortran and in Julia differ so much? And do your LAPACK tests include the time for assembling the matrix?
Hmm, I reran the tests and am getting very different numbers; now the critical point seems to be about 75. I suspect my laptop was in low-battery mode yesterday, which can be unpredictable. The LAPACK routine also has a weird jump in time at n ~ 75, where it goes from being about 30% faster to 30% slower. I suspect this is an OpenBLAS quirk where it switches modes too early, but setting `blas_set_num_threads(1)` didn't seem to have an effect.
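For anyone re-running this today, here is a minimal sketch of the same check in current Julia syntax (in modern Julia the threading call is `BLAS.set_num_threads` from `LinearAlgebra`; the sizes 60/75/90 just bracket the suspected switch point, and the timings are illustrative only):

```julia
using LinearAlgebra

# Pin the BLAS library to a single thread so timings are not affected
# by its own threading heuristics.
BLAS.set_num_threads(1)

# Time the Hessenberg eigenvalue solve around the suspected n ~ 75 jump.
for n in (60, 75, 90)
    A = triu(randn(n, n), -1)   # random upper-Hessenberg test matrix
    t = @elapsed eigvals(A)
    println("n = ", n, ": ", round(t; digits=5), " s")
end
```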
```julia
gc_disable()
p = [1.0:80]
@time for k=1:30 rootsjul(p) end
@time for k=1:30 rootshess(p) end
@time for k=1:30 AMVW.rootsAMVW(p) end
```
```julia
import Base.BLAS: blas_int, BlasChar, BlasInt

for (hseqr, elty) in ((:zhseqr_, :Complex128),)
    @eval function hesseneigvals(M::Matrix{$elty})
        A = vec(M)
        N = size(M, 1)
        info = blas_int(0)
        ilo = blas_int(1); ihi = blas_int(N)
        ldh = blas_int(N); ldz = blas_int(N)
        lwork = blas_int(N*N)       # match the allocated workspace below
        z = zero($elty)
        work = Array($elty, N*N)
        w = Array($elty, N)
        Ec = 'E'                    # eigenvalues only
        Nc = 'N'                    # no Schur vectors
        ccall(($(BLAS.blasfunc(hseqr)), LAPACK.liblapack),
              Void,
              (Ptr{BlasChar}, Ptr{BlasChar},
               Ptr{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$elty},  # A
               Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty},                  # z
               Ptr{BlasInt}, Ptr{$elty}, Ptr{BlasInt}, Ptr{BlasInt}),
              &Ec, &Nc, &N, &ilo, &ihi, A, &ldh, w, &z, &ldz, work, &lwork, &info)
        w
    end
end
```
```julia
for (hseqr, elty) in ((:dhseqr_, :Float64),)
    @eval function hesseneigvals(M::Matrix{$elty})
        A = vec(M)
        N = size(M, 1)
        info = blas_int(0)
        ilo = blas_int(1); ihi = blas_int(N)
        ldh = blas_int(N); ldz = blas_int(N)
        lwork = blas_int(-1)        # lwork = -1 triggers a workspace query
        z = zero($elty)
        work = Array($elty, 1)
        wr = Array($elty, N)        # real parts of the eigenvalues
        wi = Array($elty, N)        # imaginary parts of the eigenvalues
        Ec = 'E'
        Nc = 'N'
        for i = 1:2                 # first pass queries lwork, second pass solves
            ccall(($(BLAS.blasfunc(hseqr)), LAPACK.liblapack),
                  Void,
                  (Ptr{BlasChar}, Ptr{BlasChar},
                   Ptr{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$elty},  # A
                   Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty}, Ptr{$elty},      # z
                   Ptr{BlasInt}, Ptr{$elty}, Ptr{BlasInt}, Ptr{BlasInt}),
                  &Ec, &Nc, &N, &ilo, &ihi, A, &ldh, wr, wi, &z, &ldz, work, &lwork, &info)
            if lwork < 0
                lwork = blas_int(real(work[1]))
                work = Array($elty, lwork)
            end
        end
        wr + im*wi
    end
end
```
```julia
function companion_matrix{T}(c::Vector{T})
    n = length(c) - 1
    if n == 0
        T[]
    else
        A = zeros(T, n, n)
        @simd for k = 1:n
            @inbounds A[k,end] = -c[k]/c[end]
        end
        @simd for k = 2:n
            @inbounds A[k,k-1] = one(T)
        end
        A
    end
end
```
```julia
rootshess(c) = hesseneigvals(companion_matrix(c))
rootsjul(c)  = eigvals(companion_matrix(c))
```
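As a quick sanity check of the companion-matrix route (a sketch in current Julia syntax; the 2×2 matrix below is exactly what `companion_matrix` above would build for x² − 3x + 2, whose roots are 1 and 2):

```julia
using LinearAlgebra

# Companion matrix of p(x) = 2 - 3x + x^2 in the layout used above:
# the last column holds -c[k]/c[end], the subdiagonal holds ones.
A = [0.0 -2.0;
     1.0  3.0]

r = sort(eigvals(A))
# r ≈ [1.0, 2.0], the roots of x^2 - 3x + 2
```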
On 9 Jan 2015, at 2:30 am, Thomas Mach wrote:

> I am confused by this observation. […]
I just tried this on my machine and I also get a crossover at 75. This was the same for OpenBLAS and MKL (which is not surprising because the QR algorithm is not that BLAS heavy).
The dhseqr routine is pretty wild, and it does indeed change algorithm at exactly n = 75: below that threshold LAPACK dispatches to the small-matrix code dlahqr instead of dlaqr0 (the threshold is the NMIN value supplied by ilaenv). See the "further details" section of the dhseqr documentation. From the graph I conjecture that the crossover point could be moved quite a bit (on a log scale) to the right if the algorithm-change point is modified.
I've tried changing the default in LAPACK's dhseqr so that it uses the small-problem algorithm up to size 500. This gives the following plot, and the crossover moves to somewhere around 120-130.
@thomasmach The timings include the construction of the companion matrix.
Encouragement: glad you guys are looking into this. While you're at it, maybe also check whether the answers agree as well as the numerics would lead you to expect?
For some reason LAPACK is a relative factor of two faster here than in my Fortran tests for AMVW. I don't know why and will have a look into it next week.
The scaling coefficients of the last plots are
@andreasnoack Does your last comment mean that dhseqr is not O(n^3)?
For the complex case the crossover point is much smaller, since AMVW's complex single-shift version is only slightly slower, while zhseqr is almost 3 times slower than dhseqr.
@thomasmach No, I don't think so. The asymptotic behavior just takes a little longer to kick in. There is a significant second-order effect in the fit of the dhseqr line, and if I extend the degree of the polynomial, the slope approaches 3.
You are also completely right about the complex case. The plot for the complex case is shown below, and the crossover is around 40. Why is your real solver relatively slower than LAPACK's?
Why is the complex code of AMVW almost as fast as the real code?
Additionally, the real double shift code needs slightly more iterations to converge.
In my tests LAPACK's zhseqr/dhseqr is faster unless the polynomials are of high degree, using my wrapper
https://github.com/ApproxFun/ApproxFun.jl/blob/master/src/LinearAlgebra/hesseneigs.jl
Maybe it's worth wrapping these in AMVW for such cases? Unless I'm missing something.
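If that were done, the degree-based dispatch suggested earlier in the thread might look roughly like this (a hypothetical sketch, not AMVW.jl's actual API: `amvw_roots` is a stand-in for `AMVW.rootsAMVW`, the crossover values 75/40 come from the timings above, and `companion_matrix` is re-spelled for current Julia):

```julia
using LinearAlgebra

# companion_matrix from earlier in the thread, re-spelled for current Julia.
function companion_matrix(c::AbstractVector{T}) where T
    n = length(c) - 1
    A = zeros(T, n, n)
    for k in 1:n
        A[k, end] = -c[k] / c[end]
    end
    for k in 2:n
        A[k, k-1] = one(T)
    end
    A
end

# Hypothetical dispatch: LAPACK's Hessenberg QR below the crossover,
# the structured AMVW solver above it. `amvw_roots` is a placeholder
# defaulting to the dense path so this sketch runs on its own.
function roots(c::AbstractVector; crossover = eltype(c) <: Complex ? 40 : 75,
               amvw_roots = c -> eigvals(companion_matrix(c)))
    if length(c) - 1 < crossover
        eigvals(companion_matrix(c))   # dense LAPACK path for small degrees
    else
        amvw_roots(c)                  # structured fast path for large degrees
    end
end

roots([2.0, -3.0, 1.0])   # degree 2 → dense path; roots of x^2 - 3x + 2
```

The crossover would of course need per-machine (and per-BLAS) tuning, given the n ~ 75 behavior discussed above.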