ORNL / ReSolve

Library of GPU-resident linear solvers

simple example using lusol #177

Closed superwhiskers closed 2 months ago

superwhiskers commented 4 months ago

this adds an example using the lusol linear solver, derived from the klu example. from what i can tell, the implementation works as it should, but the source code still needs some cleanup (removing the specializedMatvec code duplicated from the lusol integration tests branch, among other things)

superwhiskers commented 4 months ago

here are some results from some benchmarking i did using hyperfine with a single warmup run

Benchmark 1: examples/lusol_lusol.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02
  Time (mean ± σ):     15.865 s ±  0.036 s    [User: 15.835 s, System: 0.026 s]
  Range (min … max):   15.794 s … 15.917 s    10 runs

Benchmark 2: examples/klu_klu.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02
  Time (mean ± σ):     749.2 ms ±   2.6 ms    [User: 730.2 ms, System: 19.1 ms]
  Range (min … max):   746.4 ms … 752.6 ms    10 runs

Summary
  examples/klu_klu.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02 ran
   21.18 ± 0.09 times faster than examples/lusol_lusol.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02   

some profiling would help if we want to figure out why the lusol example is so much slower than the klu one. tweaking the strategy lusol uses might also help here
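for reference, the "times faster" figure in the summary can be reproduced from the reported means and standard deviations. this is a minimal sketch that assumes ordinary first-order error propagation for a ratio of two independent measurements; the function name `relative_speed` is made up for illustration, and the numbers are copied from the run above (converted to seconds)

```python
import math

def relative_speed(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (ratio, uncertainty) for slow_mean / fast_mean.

    Uses first-order error propagation for a quotient of two
    independent quantities (an assumption, not taken from the thread).
    """
    ratio = slow_mean / fast_mean
    rel_err = math.sqrt((slow_sd / slow_mean) ** 2 + (fast_sd / fast_mean) ** 2)
    return ratio, ratio * rel_err

# lusol example: 15.865 s ± 0.036 s, klu example: 749.2 ms ± 2.6 ms
ratio, err = relative_speed(15.865, 0.036, 0.7492, 0.0026)
print(f"{ratio:.2f} ± {err:.2f}")
```

under that assumption the result rounds to the same 21.18 ± 0.09 shown in the summary, so the gap is far outside run-to-run noise.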

superwhiskers commented 4 months ago

here are some updated benchmarking results with the newer code. the biggest remaining change we could make (from what i can tell) is to adjust the approximate bounds used so we don't need to reallocate the scratch space lusol uses. it should be ready to merge now, though

Benchmark 1: examples/lusol_lusol.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02
  Time (mean ± σ):     12.092 s ±  0.108 s    [User: 12.072 s, System: 0.020 s]
  Range (min … max):   12.007 s … 12.337 s    10 runs

Benchmark 2: examples/klu_klu.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02
  Time (mean ± σ):     751.1 ms ±   2.8 ms    [User: 720.4 ms, System: 30.9 ms]
  Range (min … max):   747.2 ms … 755.5 ms    10 runs

Summary
  examples/klu_klu.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02 ran
   16.10 ± 0.16 times faster than examples/lusol_lusol.exe ../tests/functionality/data/matrix_ACTIVSg2000_AC_ ../tests/functionality/data/rhs_ACTIVSg2000_AC_ 3 00 00 01 01 02 02
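
to illustrate why the initial bounds on the scratch space matter, here is a toy model of a grow-and-retry loop. it assumes that an undersized workspace means growing the buffer and redoing the factorization, which may not match what the example actually does. `factor_with_retries`, `required`, `initial_guess` and `growth` are all hypothetical names for illustration and are not lusol or ReSolve API

```python
def factor_with_retries(required, initial_guess, growth=2.0):
    """Toy model: count factorization attempts until the workspace fits.

    `required` stands in for the storage the factors actually need, which
    a real solver would only discover while factorizing. Nothing here is
    LUSOL or ReSolve API; it only models the retry pattern.
    """
    size = initial_guess
    attempts = 1
    while size < required:
        # workspace too small: grow it and redo the factorization
        size = int(size * growth) + 1
        attempts += 1
    return attempts, size

# a loose initial bound costs several wasted attempts
print(factor_with_retries(required=1_000_000, initial_guess=50_000))
# a bound that already covers the need succeeds on the first try
print(factor_with_retries(required=1_000_000, initial_guess=1_200_000))
```

in this model every extra attempt repeats the whole factorization, so a tighter initial estimate trades a bit of memory for fewer repeated passes.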

superwhiskers commented 4 months ago

i've just altered the branch history so that it's based off #175 instead of #164, since we've added coo2coo there now