JuliaLinearAlgebra / ArnoldiMethod.jl

The Arnoldi Method with Krylov-Schur restart, natively in Julia.
https://julialinearalgebra.github.io/ArnoldiMethod.jl/dev
MIT License
102 stars 18 forks source link

Incorrect results #93

Closed nrontsis closed 8 months ago

nrontsis commented 5 years ago

I noticed that sometimes the results obtained via ArnoldiMethod.jl are incorrect. I attach a minimal example below:

Download myerror.jld2 and run:

using JLD2
using KrylovKit
using ArnoldiMethod
using Test

@load "myerror.jld2" W
println("Real part of rightmost eigevalue")
println(string("Direct: ", maximum(real(eigvals(W)))))

# ArnoldiMethod.jl sometimes reports incorrect results
@testset "ArnoldiMethod.jl" begin
    for i = 1:30
        decomp, history = partialschur(W, nev=3, which=LR(), tol=1e-13);
        λ, V = partialeigen(decomp);
        @test history.nconverged > 1
        @test abs(maximum(real(λ)) - maximum(real(eigvals(W)))) < 1e-6
    end
end

# An alternative implementation seems to work for this problem
# Occasionally KrylovKit reports LAPACKException(1), but this is discussed in https://github.com/Jutho/KrylovKit.jl/issues/10
@testset "KrylovKit.jl" begin
    for i = 1:30
        λ, V, info = eigsolve(W, 3, :LR, tol=1e-13)
        @test info.converged > 1
        @test abs(maximum(real(λ)) - maximum(real(eigvals(W)))) < 1e-6
    end
end

on

Julia version: 1.0.1 ArnoldiMethod.jl version: 0.0.4.

haampie commented 5 years ago

Thanks for reporting! At the moment I'm a bit busy, so I can't address this very soon.

I should add more tests for the case where eigenvalues as very close!

haampie commented 9 months ago

Can (no longer) reproduce

RalphAS commented 9 months ago

Try more iterations. It currently fails about 1/500 times with spurious eigenvalues and terrible residuals.

haampie commented 9 months ago

On what?

nrontsis commented 9 months ago

Hah, I was not expecting to hear back on this one 😄

Just to note that this eigenproblem is (obviously?) ill-conditioned, and originated by an iterative method that required the solution of thousands of eigenproblems. Similar issues were opened in other eigensolvers (though they were not returning silently incorrect results like this package used to do).

I think this eigenproblem could be useful as an ill-conditioned test case - e.g. if a collection of ill-conditioned problems is used to test this package.

In trying to reproduce, perhaps you could try running from several many random initial starting vectors, as discussed here.

RalphAS commented 9 months ago

Increasing the iteration count in the script above to a few thousand reliably gets a few failures on my system with the current master branch. Note that ArnoldiMethod normally starts with a new random draw each time.

For me, calling Random.seed!(2779) before partialschur gives a bad run; you might need a different value.

haampie commented 9 months ago

I initially could not download the test case, but noticed now the link points to my fork. Updated the URL so it works again.

Very quick check:

Spectrum looks like this:

Screenshot from 2024-02-15 10-31-48

About 26 eigenvalues with real part < -1, a large cluster (about 650) eigenvalues at 0, and then this single eigenvalue with real path approx 0.15.

With regard to ArnoldiMethod's convergence, quite hand-wavy: it's like the power method where the speed of convergence is determined by the ratio of the second and most dominant eigenvalue: abs(λ₂) / abs(λ₁), but slightly better cause the Krylov subspace is shift invariant, so you can pick any shift τ and look at abs(λ₂ - τ) / abs(λ₁ - τ), and on top of that because we're keeping track of a full subspace instead of a single vector, you can look at ratios between the center of clusters of eigenvalues, provided the cluster size <= maxdim. The cluster at 0 is larger than maxdim so slow to converge, the method would have a much easier time finding the eigenvalues with smallest real part.

Once the smallest real eigenvalues have converged, the largest real eigenvalue becomes relatively dominant. That's all to say: the target eigenvalue may start to converge only relatively late.

However, the stopping criterion of the Arnoldi method is a bit questionable in cases: it stops once nev eigenvalues closest to the target in its search subspace have converged. If however the actually closest eigenvalue to the target is not in the search subspace yet because it starts to converge relatively late, the method may return too early. The assumption is basically that the eigenvalues the user is interested in are sufficiently dominant and separated to immediately show up in the search subspace.

One way to work around this is to bump up nev, then I only get 1 failure in 100'000. (Presumably increasing maxdim could also help -- I think KrylovKit uses 30 vecs, ArnoldiMethod only 20):

@testset "ArnoldiMethod.jl" begin
    tgt = maximum(real(eigvals(W)))
    @show tgt
    for i = 1:100_000
        decomp, history = partialschur(W, nev=5, which=LR(), tol=1e-13);
        @test history.nconverged >= 1
        if abs(maximum(real(decomp.eigenvalues)) - tgt) > 1e-6
          @show decomp.eigenvalues
          @test false
        end
    end
end

So, I'm inclined to say this is not a bug, just a caveat of the method.

haampie commented 9 months ago

I will have a look into why sometimes the number of iterations needed for ArnoldiMethod.jl is rather excessive while the median number is OK.

And also why just bumping maxdim=30 seems to make the dense eigensolver fail to converge sometimes... that's certainly new.


Further partialschur(W, nev=1, tol=1e-13, which=:LR) never fails in 100K runs; so the cluster at 0 is really the issue, which is to be expected.

Further I think ArnoldiMethod is not hitting "exactly singular" issues thanks to e.g. #135.

haampie commented 9 months ago

Regarding partialschur(W, maxdim=30, nev=5, tol=1e-12), the problematic Hessenberg matrix of which the eigenvalues need to be computed looks like this:

[0.05791270343988066 1.5089167428008903 -0.6969823948072562 -0.3171043693823104 -0.1056790349638593 -0.0270028525164318 -0.03592926007475403 -0.09145477821544445 0.02749365368168674 -0.08055535081070868 -0.027466435871367982 -0.080033452422037 0.052276086051236244 0.0009315507214651274 0.026288735611728772 -0.01808489535531256 0.01499524433830044 0.004664674110328446 -0.011796333991592788 -0.04549617445772696 0.0059224301960711465 0.016692211510987338 -0.005062351467374356 -0.0037093796684796697 -0.007184930745284205 -0.01891448566953889 -0.004160333445196241 -0.0045378638039314 0.022458695944268338 -0.010727629218368978; 1.7933519409793208 -7.907371529004059 2.6878312056855527 0.3871551880601951 0.09127502544537036 -0.15761386430249014 -0.07870824682801375 0.2045258332944687 -0.34495936543234595 0.2829543971624766 -0.10515065185298803 0.2788652659401409 -0.2596769277825952 0.1006166487808873 -0.07017760040257308 0.018602532274564848 -0.08440842184620848 -0.028959037610192927 0.05703216432792135 0.2142778890536297 0.00984548709193813 0.00489971278129498 0.025851941503145712 0.01438975465924415 0.02802446891664933 0.07527897784180171 -0.002647074312117464 -0.0028897021336410918 -0.11578022694138082 0.0497763036978028; 0.0 2.974916638527389 -2.3857119095833346 2.7283049203702907 0.04810604024614914 -0.0677488469611963 0.08396616457398057 -0.1140233823468953 0.18344790531704325 -0.17207347660577088 0.16700601811727375 0.008817874353493446 0.05672042959015135 -0.05381071408153465 -0.07530149693314647 0.014497666106336113 0.0343861138402868 0.02527922547239361 -0.00879442767954778 -0.007289704547807615 0.02296972400521998 0.00447927196015066 -0.007486064656018023 0.00818338728164272 0.020300685555312302 0.04358247857994663 -0.004871294212621282 -0.00536094868622352 0.03563045647634484 0.005675899857340394; 0.0 0.0 2.9437918864274226 -10.038911313530177 1.7869500810822432 0.3066026486057063 -0.31043320131792546 0.16324910006353122 -0.1954330296626003 0.24605269523120898 -0.07765022757597634 -0.11650237603293917 0.1414251351561431 -0.06752321102391595 0.11129921135178443 -0.026810629271970025 0.004342081699149514 -0.038289070558279806 -0.03281555636464557 -0.2047886535253535 0.007401986969571927 0.0011324604052667928 -0.008708945829576771 -0.029394270919283344 -0.0867972709728772 -0.18763204571410314 -0.001511149626529857 -0.0016672462433964747 0.04018241457375961 -0.08489577474303717; 0.0 0.0 0.0 2.221301810079435 -8.998046155257857 3.22235516340822 0.28551268127852464 -0.3662104166894975 0.1590354127788442 0.10910756135891857 -0.2036777810381381 -0.28333742524585187 0.1515019288456211 -0.03453238665812315 0.35548611835041294 -0.08989717499120935 -0.04501488175675925 -0.03399255020430894 -0.03291874413669105 -0.1516123455401014 0.002632588056575653 0.0001876503905701778 0.004329644373919098 -0.05728997836769961 -0.05765467156186154 -0.1873601107590985 -0.0004977357178851231 -0.0005424844313214585 -0.04869767875924026 0.012558726304959631; 0.0 0.0 0.0 0.0 3.5979632314280208 -6.663239399247186 2.777389381538671 -0.0636037311009417 -0.0334839225129111 0.10440402466207857 -0.352416634676149 -0.22251559155842443 -0.1272203146148792 0.16054381737981055 0.38632154090041854 -0.11800311340098377 -0.008426632121162438 0.022016101094704203 0.0102144881210854 0.09542197234090401 0.0020641830886548498 -3.367003172277265e-5 0.0029996219192264337 0.008547242307109695 0.04664380309460358 0.08792632333673808 -0.00035649368928861776 -0.0003908830985869161 -0.019320062907593187 0.054042842554511576; 0.0 0.0 0.0 0.0 0.0 3.275953431763694 -7.848631122731064 3.3439790595894716 0.024649556090671917 -0.3422509736067912 -0.194093101807592 -0.07091069478981066 -0.3044322052760918 0.28436993981974334 -0.05136198819360503 -0.02462016483673448 0.0404226379881155 0.011680198723546337 0.039977457693763194 0.1176671167156832 0.0014436092064404533 -0.00014450972339383775 -0.006577640284641484 0.06664048857375723 0.02490817721312305 0.15220580479805895 -0.0002261143609133525 -0.00025844110059322113 0.07455282566032435 -0.07343406609078004; 0.0 0.0 0.0 0.0 0.0 0.0 3.724218238048316 -5.264657439517733 2.2089606869944496 -0.25463133032530294 -0.2339308542015861 -0.2273446712542719 0.05331868839511896 0.12820972402602293 -0.2578079596460878 0.02264865111020224 -0.014711758866401211 -0.01947285344417686 -0.019019731221480894 -0.10622998316863566 0.0013319449170718623 -0.00022266199839778532 0.002243117233341788 -0.029536403273968295 -0.04025506338705781 -0.11310866604028456 -0.0001924533488199872 -0.0002102408620800316 -0.022459417923219826 -0.007429220466391549; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.770145785150411 -6.708551168672895 2.8392908229040006 -0.357365162764837 -0.12733140027737772 0.3170646747334376 -0.12432438814526553 0.09276761669671267 -0.03509886602952713 -0.021956630106516335 0.021309829249439215 -0.040290744409005076 -0.023947944028854935 0.0007063079916141048 -0.0001630081620386378 0.0037689593559375907 -0.05634269174251361 0.026448961887018143 -0.055220684997319046 -9.383813784756903e-5 -9.616175705694772e-5 -0.06690422052252622 0.12336710712415062; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.2330804333415566 -7.08179000070135 1.539909872113376 0.2799539464837651 -0.19457375867263538 0.08472922108076425 0.06109008608610269 -0.05378590934812107 0.029970160626213743 -0.018749468961908336 0.056153072481522355 0.11139638940006852 0.0003552834282125848 -0.00011709429536668596 -0.006874555415212862 0.08249834098943343 -0.010790109540410884 0.12438623317813434 -4.005600215118213e-5 -5.584527898975513e-5 0.09707133655284722 -0.145568071978358; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.045770632125816 -4.007660248716421 1.3837102790441371 -0.022358709375845637 0.15233093330501518 -0.2795784502825249 -0.07761599679279783 -0.019975235748125084 0.020171791569975922 -0.005606816106820043 -0.05766646478484208 0.00012083251680050297 -0.0001420869682485326 0.0070966214355969795 -0.014184891141260113 0.01748051542529177 0.003284825804913802 4.838674053311495e-6 1.0507173679301572e-5 -0.04800382234990216 0.05529695548144365; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.7760074596958602 -5.788010767069059 2.805185356588569 0.01126887223483597 0.3832092299365627 -0.37513493828273387 -0.014599105233391373 -0.048332600811108586 -0.05754049861623606 0.04033225853019902 -0.00016488584739220245 -0.0001748903327620185 -0.01645357989877612 -0.04793373822633345 -0.06008085846432854 -0.17689973680673846 6.285101109537294e-5 6.229650944269442e-5 0.05501255538390161 -0.03636966783527142; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.957596719829789 -5.407125358415526 2.9448356527476816 0.2625869567899582 -0.5992936605314703 -0.06492856389671932 0.0463149773376485 0.0655194662823086 -0.023754758601775977 -0.0004239568099857801 -0.00029856656925449413 0.01694915767280278 0.06096291928282961 0.06587878018737615 0.20836334462445583 0.00012904152323281372 0.00014709375508973528 -0.04890623499179663 0.025854158997431403; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.051300201538155 -1.9278250517430482 -0.06196991974734687 -0.6478664901770008 -0.038301795584448906 -0.023725083953109457 -0.02490480683131525 0.004245035146342803 -0.0006345141174892673 -0.00038038037067375416 -0.006863463080716819 -0.02625896289231182 -0.027323233891041118 -0.08808861932693224 0.00018305815568247716 0.00019843953609946442 0.018776654443368325 -0.009029245787181011; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4743591467837972 -4.480788403515731 0.796412167226133 -0.0037890059256047715 -0.007458041795458059 0.011981008101435302 -0.00555649056403998 -0.0001408744908267365 -6.899149934982523e-6 0.0006570725131172846 0.003421907891268037 0.002861920374544506 0.01038721007159706 2.59847458291447e-5 2.8765438908863938e-5 -0.001172365567510069 -0.00014737146731408444; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8748898652235407 -0.5207377164132949 0.724075518811872 0.05038876562857095 -0.04276426911442398 0.02269799950898405 -0.000378275880468825 0.0001420186810495324 0.00023322896143625552 -0.004106046170421364 -0.00023124293363690942 -0.007424034628020593 3.987715007950179e-5 4.488463901068108e-5 -0.004130675300134573 0.006087920692100608; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7252515265329652 -10.354952886093773 -0.21411155402332033 0.6210905753317623 -0.2784512594660191 -2.6853387485475125e-5 1.2033757330055274e-5 -0.005206821782122694 0.046227660579228294 -0.006146379583910083 0.0696011486749071 2.8184709541008305e-6 -3.9353288639422585e-6 0.06067186299075732 -0.08403965665270216; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4876987323498727 -9.115949014755229 0.20516572921195814 -0.5094080881678525 -1.337743471795539e-6 -7.31860273130722e-6 0.02083347222572451 0.08009245968532182 0.08615074059416504 0.2771933683934974 7.811291017842132e-8 6.840784743665729e-6 -0.05763464609031092 0.02531219866527759; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.491887779746838 -9.230777963735944 -0.4962509875353187 -5.35492183052419e-8 -1.01391883189761e-6 0.0014037496531440208 0.09421242510641283 0.09674129144933759 0.3137294682549301 3.1578850578703054e-8 -4.647807024454335e-6 0.04017045300102949 -0.007560686622823428; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1008587503269951 -9.244350444283215 6.620002914597967e-8 -2.4643419020451367e-6 0.006208480064929932 0.04261741357551742 0.039114259579314956 0.13283879083401903 -1.2101753162508527e-8 7.326451379940441e-7 -0.00632177629888857 0.002238110029357466; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.801760108519137e-8 0.19967056072632966 -0.26591365306915704 -0.00014846014397978165 -5.443867297593276e-6 3.335929038455671e-6 -7.906777714355627e-8 0.036505692115822705 0.006924245037570334 2.57106096230959e-6 3.3967113515333616e-8; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.14992960911195477 -0.1996718473858975 0.003596336704954895 0.00021923886493620823 -0.00024042885441973727 5.494992915284897e-6 0.034006814963714285 0.0026671827057688784 -0.00012651654285523222 -2.8298825196845513e-6; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0036017138606995788 -10.271147548376504 -0.47249618517903547 0.6741982657112929 -0.015086549355991472 1.2305589652330325e-5 -4.0465517649062896e-5 0.3568249977061375 0.0094488115876474; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.14145342433873137 -9.30841505432732 0.30665207501863184 -0.017962775799553916 6.219068191763016e-8 1.46388375470453e-5 -0.1258859529805637 -0.10050884141374516; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.38282531044625157 -9.433426668199937 -0.0025972386177150935 9.803485199588041e-8 -1.0812390410638018e-5 0.09312243655841991 0.05588501564408238; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01589708787421915 -8.98700944243664 4.2162653075307477e-7 1.1510257967048204e-7 -0.001083683964504522 0.02142734600939302; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.2259185608062783e-7 0.03215520934153351 -0.012345494246141975 -1.1143142487629819e-5 -2.784451086784847e-7; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.08375180235430589 -0.0321553443800285 0.0011642770828508924 3.0520645996608026e-5; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0011644364978552278 -10.072456032304526 -0.24144220035563782; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.021740744304687802 -9.925807632065508]

The QR-algorithm needs 27608 iterations for this 30x30 matrix... Pretty crazy? GenericLinearAlgebra.jl seems to do better. Will have a look.

haampie commented 9 months ago

Reduced the problem to a 4x4 matrix

H = [-9.000000046596169 9.363971416904122e-6 0.6216202324428521 0.783119615978767; -3.1249216068055166e-10 -9.000000125049475 -0.005030734831215954 0.026538692060151765; 0.0 2.5838932886290116e-12 -8.999999884550379 -4.118678562647915e-7; 0.0 0.0 5.499735555858365e-9 -8.99999994380397]

for which local_schurfact! needs 27489 iterations. That sounds like something that should be trivial to debug...

haampie commented 9 months ago

Ugh, after #136 I'm getting 12 out of 10000 failures with

partialschur(W, nev=3, which=:LR, maxdim=20, tol=1e-13)

and no failures with

partialschur(W, nev=5, which=:LR, maxdim=20, tol=1e-13)
partialschur(W, nev=5, which=:LR, maxdim=30, tol=1e-13)

The particular matrix hits pretty much all edge cases one can think of.

ArnoldiMethod.jl seems to behave much better than ARPACK though.

haampie commented 8 months ago

Inclined to close this now that #135 and #136 are merged.

(But will leave it open until I've checked what's up in those 12 out of 10'000 cases)

haampie commented 8 months ago

Actually #139 and #140 together reduce the failure rate to 0 out of 100K also for

partialschur(W, nev=3, which=:LR, maxdim=20, tol=1e-13)

previously it failed 12 out of 10K.

The spurious eigenvalues can be explained by instability of the Krylov-Schur restart in case of converged eigenpairs, so with the tiniest residuals. Those residuals are used to build a reflector, and the quality of that reflector is really poor then.

Solution is (a) to use Given's rotations instead of a Householder reflector on that vec that has entries of vastly different orders of magnitude, and (b) to purge converged, but non-desired eigenvalues.

For (b) I have to do more testing, cause maybe it slows down convergence because those vectors are no longer deflated and may re-appear due to rounding errors.