andreasnoack / FastPolynomialRoots.jl

Fast and backward stable computation of roots of polynomials in Julia
MIT License

Reimplement in pure Julia? #4

Closed dlfivefifty closed 4 years ago

dlfivefifty commented 7 years ago

It would be nice to be able to use this for BigFloat or with interval arithmetic. I believe this should be straightforward (more straightforward than translating LAPACK routines at least) though perhaps @thomasmach or @jaurentz have thoughts on this.

andreasnoack commented 7 years ago

Indeed. I noticed this work, https://github.com/eiscor/Eiscor.jl, recently.

jaurentz commented 7 years ago

This has been on my mind for almost a year now. Eiscor.jl contains some preliminary testing. The short answer is: yes, I want to do this. The long answer includes something like: how much time will it take? Stay tuned.

dlfivefifty commented 7 years ago

Cool! Let me know if I can help. I think the O(N^2) complexity will mean this is usable for BigFloat, whereas an O(N^3) dense QR code would not be.

jverzani commented 7 years ago

I tried my hand at this for the real double shift case: https://gist.github.com/jverzani/035114b20c011dfc23ea31d95e788fba

I can't get the performance I should be getting, though. Does anyone have pointers for improvement?

dlfivefifty commented 7 years ago

@jverzani Can you make a pull request? It'll be easier to clone the branch and test it out.

thomasmach commented 7 years ago

@jverzani: You should normalize the result of the fuse. Just call givensrot on the result u, v. Although theoretically the result should be a rotation, in practice its norm is often slightly different from 1.
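A minimal sketch of the renormalization Thomas describes, assuming the rotators are stored as cosine/sine pairs (the function name here is illustrative, not the gist's actual API):

```julia
# After fusing two rotators, the resulting (u, v) pair can drift off the
# unit circle in floating point. Rescaling by hypot(u, v) restores
# u^2 + v^2 == 1 up to roundoff, just like a fresh Givens rotation would.
function renormalize(u::T, v::T) where T<:Real
    r = hypot(u, v)        # norm of the pair; ideally exactly 1
    return u / r, v / r
end

# A pair perturbed off the unit circle, as a fuse might produce:
u, v = renormalize(0.6 + 1e-14, 0.8 - 1e-14)
```

After this call, `u^2 + v^2` is 1 to machine precision again, which keeps the rotator structure of the factorization intact over many iterations.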

The special treatment of the misfit chasing through B^T and C (which you haven't implemented yet) saves up to 15% of the runtime.

How does the code compare with the runtime of the Fortran version of AMVW?

jverzani commented 7 years ago

Thanks for the feedback.

I can't get the Fortran version to compile, so I couldn't compare. I was only benchmarking against the roots function in Polynomials, which uses Julia's eigvals.

Is the special treatment the case when C and B are the same and passing U and V through them can be simplified, or is there something else that I missed? 15% would be a good improvement.
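For context, the O(n^3) baseline being benchmarked against computes roots as eigenvalues of the companion matrix via a dense eigensolver. A minimal sketch of that approach (illustrative, not Polynomials.jl's actual implementation):

```julia
using LinearAlgebra

# Roots of p[1] + p[2]*x + ... + p[end]*x^n as eigenvalues of the
# companion matrix: ones on the subdiagonal, last column built from the
# (negated, normalized) coefficients. The dense eigvals call is O(n^3),
# which is what the AMVW structured QR iteration avoids.
function companion_roots(p::Vector{T}) where T
    n = length(p) - 1
    C = zeros(T, n, n)
    for i in 2:n
        C[i, i-1] = one(T)          # subdiagonal of ones
    end
    C[:, n] = -p[1:n] ./ p[n+1]     # last column from coefficients
    return eigvals(C)
end

rts = companion_roots([-1.0, 0.0, 1.0])  # x^2 - 1: roots ±1
```

This is the natural reference point for accuracy and timing comparisons, since it is essentially what `roots` does under the hood.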

jverzani commented 7 years ago

I updated the gist with these two improvements. Though they didn't yield enormous savings, they made a difference.

thomasmach commented 7 years ago

@jverzani Yes, the improvement is indeed just the simplification of the pass-through of the misfit through B and C in case B_i^T = C_i. Your implementation looks correct to me. The Fortran code was designed with a focus on high-degree polynomials. We typically tested our improvements with n=1024 or n=4096. The effect for large n is more visible, since in an n=15 example the initial 5-8 iterations before the first deflation destroy most of the B_i^T = C_i structure.

You should also have a look at issue #3, which explains that the crossover point for the double shift code is about n=120 and for the complex single shift code about n=40. Assuming that LAPACK's code is cubic (which it isn't for small n) and that AMVW is quadratic, the code might be 8 times slower for n=15. This seems to be in line with your observations.
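The 8x estimate can be checked with back-of-the-envelope arithmetic: if the dense code costs ~c3·n^3, AMVW costs ~c2·n^2, and the crossover sits at n = 120 (where c2·120^2 = c3·120^3), then the predicted relative slowdown of AMVW at degree n is simply 120/n:

```julia
# Under the stated assumptions, the ratio of AMVW's cost to the dense
# code's cost at degree n is (c2*n^2)/(c3*n^3) = crossover/n.
crossover = 120
slowdown(n) = crossover / n

slowdown(15)   # ≈ 8, matching the observation in the comment above
```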

thomasmach commented 7 years ago

@jverzani I made a comparison of your code with the Fortran version of AMVW for a random polynomial of degree 256: Fortran AMVW: 13 ms, your Julia AMVW: 45 ms, roots: 92 ms.

Surprisingly, your Julia AMVW is allocating a lot of memory:

```
BenchmarkTools.Trial:
  memory estimate:  4.08 MiB
  allocs estimate:  80566
  minimum time:     39.801 ms (0.00% GC)
  median time:      45.303 ms (0.00% GC)
  mean time:        46.447 ms (0.91% GC)
  maximum time:     62.042 ms (0.00% GC)
  samples:          108
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
```

I think that could be the reason why your Julia code is 3 times slower than the Fortran code. Can you please check whether there are any unnecessary allocations?

jverzani commented 7 years ago

I'll look again to see where the allocations are happening. Hopefully I can track that down.

One thing I noticed when trying higher-degree polynomials is that the Ci rotators can become trivial, and so recovering the A matrix is an issue. The bound you have requires the product of the Cs > 1/||x||, but that can be less than eps() for even modest-degree polynomials. Is there a recommendation here? The initial random shift makes a difference, so just restarting can sometimes lead to success, but perhaps there is something to do to catch this earlier.


dlfivefifty commented 7 years ago

https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

Check out the section on --track-allocation=user. You then get a file that lists how much memory is allocated on each line.
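A minimal workflow sketch for that flag (the `workload` function is a hypothetical stand-in for the rootfinding routine under study):

```julia
# Run with: julia --track-allocation=user track_allocs.jl
# After the process exits, each source file gets a sibling *.mem file
# whose lines are prefixed with the bytes allocated on that line.
using Profile

workload() = sum(abs2, rand(10_000))   # stand-in for the routine under study

workload()                    # first call triggers compilation
Profile.clear_malloc_data()   # drop allocations attributed to compilation
workload()                    # only this call's allocations reach the *.mem files
```

Clearing the counters after a warm-up run is the usual trick to keep one-time compilation allocations from drowning out the steady-state ones you actually want to hunt down.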


thomasmach commented 7 years ago

The algorithm in AMVW is only (backward) stable for ||x|| small, as described in the paper. For polynomials with large coefficient norm, one has to use the companion pencil instead of the companion matrix and a QZ instead of a QR algorithm. Details on the mathematics can be found in our recent preprint https://arxiv.org/abs/1611.02435. A Fortran implementation is available in a branch of eiscor (https://github.com/eiscor/eiscor/).

Our vision was to have an automatic decision on which algorithm to use based on ||x||. Implementing the QZ is a relatively easy step once all the building blocks for the QR algorithm work satisfactorily.

jverzani commented 7 years ago

I pushed this a bit further, putting it into a package instead of a gist (https://github.com/jverzani/AMVW.jl).

I'm doing something wrong though, as the code for complex coefficients is much slower to converge. I have a similar problem with both complex-real rotators and complex-complex rotators. (I implemented the latter while I worked out handling the diagonal terms, which I couldn't quite get to match the paper.) Maybe I have a mistake in create_bulge.

Otherwise, the allocations for the real-coefficient case are basically on par with the roots function, though the roots function from PolynomialRoots seems to be just as fast (for larger n) and more accurate.

Once the issue with complex coefficients is sorted out, I can take a crack at pushing this to the companion pencil.

jverzani commented 7 years ago

Okay, the code in https://github.com/jverzani/AMVW.jl seems to be working as expected now. Some benchmarks are here https://github.com/jverzani/AMVW.jl/blob/master/test/benchmark.jl.

dlfivefifty commented 7 years ago

This looks awesome. I just tried it with BigFloat, and it gave the roots to 1E-76 accuracy (tested by evaluating the polynomial at each root with Horner's algorithm).
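A sketch of the kind of residual check described, with an illustrative Horner helper (not part of the package):

```julia
# Evaluate p[1] + p[2]*x + ... + p[end]*x^(length(p)-1) by Horner's scheme.
# Doing this in BigFloat at each computed root gives a cheap residual test:
# a tiny |p(root)| is evidence the root is accurate.
function horner(p::Vector{T}, x) where T
    s = convert(promote_type(T, typeof(x)), p[end])
    for i in length(p)-1:-1:1
        s = s * x + p[i]    # fold in the next-lower coefficient
    end
    return s
end

p = BigFloat[-1, 0, 1]                     # x^2 - 1
resid = maximum(abs, horner.(Ref(p), [big(-1.0), big(1.0)]))  # 0 at exact roots
```

Note that a small residual is a necessary but not sufficient accuracy check for ill-conditioned roots, where |p| can be tiny on a large neighborhood of the root.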

@dpsanders I wonder if you could combine this with ValidatedNumerics.jl to get rigorous bounds on the roots? Right now it errors out:

ERROR: MethodError: no method matching eps(::Type{ValidatedNumerics.Interval{Float64}})
andreasnoack commented 4 years ago

I'll close this one since there is nothing to do in this repo.

dpsanders commented 4 years ago

@dlfivefifty Sorry I don't think I ever saw your comment. That would definitely be very interesting. What's the situation with the pure Julia version?

andreasnoack commented 4 years ago

My impression was that https://github.com/jverzani/AMVW.jl was working but it looks like it hasn't been touched in a while.

jverzani commented 4 years ago

I renamed it: https://github.com/jverzani/AMRVW.jl. I'm sure it could be done better, but it seems to give proper results. I never got the Fortran version to compile to compare performance.

andreasnoack commented 4 years ago

I never got the Fortran version to compile to compare performance.

I've just set up the Fortran version with PackageCompiler and also officially registered the package, see https://github.com/JuliaRegistries/General/pull/13212, so it should now be straightforward to use the Fortran version.