PetrKryslUCSD / Sparspak.jl

Direct solution of large sparse systems of linear algebraic equations in pure Julia
MIT License
37 stars 7 forks source link

Make generic blas/lapack routines allocation-free #10

Closed j-fu closed 2 years ago

j-fu commented 2 years ago

Before:

generic003._test(N=[10,100,1000,10_000, 100_000])
n=10:
                                 Float64:  10.007 μs (113 allocations: 14.02 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  19.844 μs (346 allocations: 49.84 KiB)
      MultiFloats.MultiFloat{Float64, 2}:  24.761 μs (378 allocations: 33.75 KiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  24.175 μs (430 allocations: 38.52 KiB)

n=100:
                                 Float64:  85.403 μs (113 allocations: 82.17 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  222.388 μs (3650 allocations: 336.25 KiB)
      MultiFloats.MultiFloat{Float64, 2}:  306.651 μs (4300 allocations: 527.41 KiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  272.310 μs (4300 allocations: 669.56 KiB)

n=1000:
                                 Float64:  1.299 ms (132 allocations: 800.73 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  2.956 ms (46817 allocations: 15.19 MiB)
      MultiFloats.MultiFloat{Float64, 2}:  3.462 ms (41737 allocations: 7.95 MiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  3.126 ms (44044 allocations: 10.52 MiB)

n=10000:
                                 Float64:  38.397 ms (201 allocations: 9.45 MiB)
      MultiFloats.MultiFloat{Float64, 1}:  172.454 ms (969317 allocations: 1.10 GiB)
      MultiFloats.MultiFloat{Float64, 2}:  640.790 ms (974944 allocations: 1.12 GiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  199.685 ms (966803 allocations: 1.10 GiB)

n=100000:
                                 Float64:  17.629 s (203 allocations: 279.91 MiB)
      MultiFloats.MultiFloat{Float64, 1}:  73.733 s (72530297 allocations: 137.63 GiB)
      MultiFloats.MultiFloat{Float64, 2}:  590.859 s (74948908 allocations: 139.45 GiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  100.656 s (65286991 allocations: 120.63 GiB)

This PR:

generic003._test(N=[10,100,1000,10_000, 100_000])
n=10:
                                 Float64:  10.139 μs (113 allocations: 14.02 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  8.184 μs (114 allocations: 15.03 KiB)
      MultiFloats.MultiFloat{Float64, 2}:  10.080 μs (114 allocations: 16.41 KiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  9.360 μs (114 allocations: 17.14 KiB)
n=100:
                                 Float64:  82.238 μs (113 allocations: 82.17 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  70.719 μs (114 allocations: 88.59 KiB)
      MultiFloats.MultiFloat{Float64, 2}:  88.428 μs (114 allocations: 102.77 KiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  79.392 μs (114 allocations: 104.25 KiB)
n=1000:
                                 Float64:  1.318 ms (132 allocations: 800.73 KiB)
      MultiFloats.MultiFloat{Float64, 1}:  942.338 μs (134 allocations: 814.84 KiB)
      MultiFloats.MultiFloat{Float64, 2}:  1.145 ms (134 allocations: 952.12 KiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  1.059 ms (137 allocations: 1012.48 KiB)
n=10000:
                                 Float64:  38.081 ms (201 allocations: 9.45 MiB)
      MultiFloats.MultiFloat{Float64, 1}:  30.215 ms (203 allocations: 9.82 MiB)
      MultiFloats.MultiFloat{Float64, 2}:  148.908 ms (203 allocations: 13.11 MiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  54.052 ms (203 allocations: 12.93 MiB)
n=100000:
                                 Float64:  17.567 s (203 allocations: 279.91 MiB)
      MultiFloats.MultiFloat{Float64, 1}:  17.181 s (205 allocations: 290.91 MiB)
      MultiFloats.MultiFloat{Float64, 2}:  119.715 s (205 allocations: 496.59 MiB)
   ForwardDiff.Dual{Float64, Float64, 1}:  45.105 s (205 allocations: 487.82 MiB)

With this PR, the generic implementation is on par with the binary blas/lapack wrt. allocations. Before, their number was prohibitively large for any larger sized problem.

After all, I just rewrote the code from netlib. It is BSD licensed (see https://netlib.org/lapack), so there seems to be no problem including this here.

The results conform to the mantra regarding Julia performance: "In doubt, write your own loop" which worked quite well for me in several situations before.

What is puzzling is that the generic routines are faster than the standard ones (e.g. compare between Float64 and MultiFloat{Float64,1}). However I would not yet jump to conclusions here before seeing more tests with relevant problems.

I think this PR should be implemented before any announcement...

codecov-commenter commented 2 years ago

Codecov Report

Base: 91.01% // Head: 90.61% // Decreases project coverage by -0.40% :warning:

Coverage data is based on head (8ca68c5) compared to base (92835b5). Patch coverage: 89.72% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #10 +/- ## ========================================== - Coverage 91.01% 90.61% -0.41% ========================================== Files 13 13 Lines 1436 1641 +205 ========================================== + Hits 1307 1487 +180 - Misses 129 154 +25 ``` | [Impacted Files](https://codecov.io/gh/PetrKryslUCSD/Sparspak.jl/pull/10?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Petr+Krysl) | Coverage Δ | | |---|---|---| | [src/Utilities/GenericBlasLapackFragments.jl](https://codecov.io/gh/PetrKryslUCSD/Sparspak.jl/pull/10/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Petr+Krysl#diff-c3JjL1V0aWxpdGllcy9HZW5lcmljQmxhc0xhcGFja0ZyYWdtZW50cy5qbA==) | `89.22% <89.72%> (-4.26%)` | :arrow_down: | | [src/SparseSpdMethod/SpkMMD.jl](https://codecov.io/gh/PetrKryslUCSD/Sparspak.jl/pull/10/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Petr+Krysl#diff-c3JjL1NwYXJzZVNwZE1ldGhvZC9TcGtNTUQuamw=) | `96.57% <0.00%> (+0.38%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Petr+Krysl). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Petr+Krysl)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

PetrKryslUCSD commented 2 years ago

Nice work, Juergen! Many thanks.

j-fu commented 2 years ago

Dear Petr, could you tag a release ? I would like to announce this when the better performing version becomes available.

PetrKryslUCSD commented 2 years ago

Super, will do.

PetrKryslUCSD commented 2 years ago

Done.

On Tue, Sep 27, 2022, 10:40 AM Jürgen Fuhrmann @.***> wrote:

Dear Petr, could you tag a release ? I would like to announce this when the better performing version becomes available.

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/PetrKryslUCSD/Sparspak.jl/pull/10*issuecomment-1259840692__;Iw!!Mih3wA!D0JSEZKYAtAbqObrLt7EU3ziEQOBGMC-69i70HwAfz1xJtKkbWEnjDd6TpFWie122YoSsrcSTap0QLI-IVk2K56_$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ACLGGWCI4TRVDQEHODZYNILWAMWRFANCNFSM6AAAAAAQUTFMJY__;!!Mih3wA!D0JSEZKYAtAbqObrLt7EU3ziEQOBGMC-69i70HwAfz1xJtKkbWEnjDd6TpFWie122YoSsrcSTap0QLI-IXrwaEd0$ . You are receiving this because you modified the open/close state.Message ID: @.***>