aprokop closed 3 years ago
Does this work on GPU?
@Rombur Thanks for the review! It does. Tpetra uses CudaUVM, so it worked on GPU even without switching to Kokkos kernels. I'm still unsure whether we need CUDA_LAUNCH_BLOCKING=1; it seems to work without it.
The container has to be rebuilt and reuploaded, as we don't build it from the Dockerfile in .jenkins. I think the Dockerfile currently has some leftovers from @masterleinad. @masterleinad Can you please clean it up?
The only thing that remains to be done here (from my point of view) is to add the comparison example between MLS and Spline. I don't think it's worth kokkosifying the setup phase at this point (though we may need to do this, depending on the cost).
For me, it's not necessary to add the comparison here but I agree that we don't need to optimize the setup here for now.
That's fine with me.
@Rombur @dalg24 Please review the latest version.
Some results on Ascent with 6 MPI ranks (the reported error is the maximum over the ranks):
MLS 0 (2,1) setup: 22 ms, apply: 3 ms, error: 6.498728e-01
MLS 1 (2,1) setup: 20 ms, apply: 3 ms, error: 7.934971e-01
MLS 2 (2,1) setup: 138 ms, apply: 4 ms, error: 4.890796e-02
Spline 1 (2,1) setup: 160 ms, apply: 824 ms, error: 4.286823e-01
MLS 0 (4,2) setup: 20 ms, apply: 3 ms, error: 3.799103e-02
MLS 1 (4,2) setup: 20 ms, apply: 3 ms, error: 2.580187e-03
MLS 2 (4,2) setup: 23 ms, apply: 3 ms, error: 8.017628e-04
Spline 1 (4,2) setup: 46 ms, apply: 24 ms, error: 7.187830e-02
MLS 0 (8,4) setup: 20 ms, apply: 3 ms, error: 1.750967e-03
MLS 1 (8,4) setup: 20 ms, apply: 3 ms, error: 6.840655e-05
MLS 2 (8,4) setup: 25 ms, apply: 3 ms, error: 1.677768e-07
Spline 1 (8,4) setup: 58 ms, apply: 34 ms, error: 7.109034e-03
MLS 0 (16,8) setup: 27 ms, apply: 3 ms, error: 6.829721e-05
MLS 1 (16,8) setup: 38 ms, apply: 3 ms, error: 1.447533e-06
MLS 2 (16,8) setup: 49 ms, apply: 4 ms, error: 1.471431e-09
Spline 1 (16,8) setup: 112 ms, apply: 131 ms, error: 5.311297e-04
MLS 0 (32,16) setup: 69 ms, apply: 4 ms, error: 2.405030e-06
MLS 1 (32,16) setup: 73 ms, apply: 5 ms, error: 2.664813e-08
MLS 2 (32,16) setup: 79 ms, apply: 6 ms, error: 1.144884e-11
Spline 1 (32,16) setup: 167 ms, apply: 3655 ms, error: 3.785170e-05
MLS 0 (64,32) setup: 81 ms, apply: 7 ms, error: 7.997656e-08
MLS 1 (64,32) setup: 91 ms, apply: 9 ms, error: 4.535505e-10
MLS 2 (64,32) setup: 238 ms, apply: 14 ms, error: 8.570428e-14
Spline 1 (64,32) setup: 524 ms, apply: 55848 ms, error: 2.563109e-06
MLS 0 (128,64) setup: 158 ms, apply: 18 ms, error: 2.579835e-09
MLS 1 (128,64) setup: 224 ms, apply: 25 ms, error: 7.403294e-12
MLS 2 (128,64) setup: 1544 ms, apply: 41 ms, error: 7.098678e-16
Spline 1 (128,64) setup: 2456 ms, apply: 492049 ms, error: 1.670313e-07
MLS 0 (1,2) setup: 33 ms, apply: 3 ms, error: 9.178816e-02
MLS 1 (1,2) setup: 134 ms, apply: 4 ms, error: 3.641472e-02
MLS 2 (1,2) setup: 39 ms, apply: 5 ms, error: 4.281880e-02
Spline 1 (1,2) setup: 64 ms, apply: 352 ms, error: 5.383252e-02
MLS 0 (2,4) setup: 20 ms, apply: 3 ms, error: 8.895947e-03
MLS 1 (2,4) setup: 20 ms, apply: 3 ms, error: 1.863394e-03
MLS 2 (2,4) setup: 38 ms, apply: 4 ms, error: 1.308338e-03
Spline 1 (2,4) setup: 46 ms, apply: 23 ms, error: 8.227705e-03
MLS 0 (4,8) setup: 20 ms, apply: 3 ms, error: 9.060434e-04
MLS 1 (4,8) setup: 33 ms, apply: 3 ms, error: 9.689737e-05
MLS 2 (4,8) setup: 47 ms, apply: 6 ms, error: 6.669907e-05
Spline 1 (4,8) setup: 58 ms, apply: 23 ms, error: 1.072147e-03
MLS 0 (8,16) setup: 55 ms, apply: 5 ms, error: 6.950580e-05
MLS 1 (8,16) setup: 60 ms, apply: 5 ms, error: 4.058349e-06
MLS 2 (8,16) setup: 79 ms, apply: 8 ms, error: 2.768902e-06
Spline 1 (8,16) setup: 104 ms, apply: 30 ms, error: 1.485510e-04
MLS 0 (16,32) setup: 76 ms, apply: 6 ms, error: 4.852920e-06
MLS 1 (16,32) setup: 90 ms, apply: 8 ms, error: 1.486039e-07
MLS 2 (16,32) setup: 262 ms, apply: 15 ms, error: 1.001527e-07
Spline 1 (16,32) setup: 200 ms, apply: 132 ms, error: 1.952588e-05
MLS 0 (32,64) setup: 120 ms, apply: 13 ms, error: 3.213365e-07
MLS 1 (32,64) setup: 181 ms, apply: 19 ms, error: 5.044098e-09
MLS 2 (32,64) setup: 1764 ms, apply: 35 ms, error: 3.369217e-09
Spline 1 (32,64) setup: 436 ms, apply: 3655 ms, error: 2.501322e-06
MLS 0 (64,128) setup: 292 ms, apply: 31 ms, error: 2.068489e-08
MLS 1 (64,128) setup: 778 ms, apply: 72 ms, error: 1.644314e-10
MLS 2 (64,128) setup: 12308 ms, apply: 171 ms, error: 1.092495e-10
Spline 1 (64,128) setup: 2222 ms, apply: 56062 ms, error: 3.164574e-07
Cases with more points time out or abort for different reasons.
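To make the error columns above easier to compare, here is a minimal sketch of how one could estimate an observed convergence order from consecutive rows of the same method. It assumes that each step in a series, e.g. (4,2) to (8,4), doubles the resolution; the log does not state what the pairs mean, so that is my reading of it. The helper name `observedOrders` is hypothetical and not part of this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the PR): estimate the observed
// convergence order between consecutive refinement levels.
// ASSUMPTION: the resolution doubles from one level to the next, so
//   order_k = log2(error_k / error_{k+1}).
std::vector<double> observedOrders(std::vector<double> const &errors)
{
  std::vector<double> orders;
  for (std::size_t k = 0; k + 1 < errors.size(); ++k)
  {
    // The logarithm is only defined for strictly positive errors.
    if (errors[k] <= 0. || errors[k + 1] <= 0.)
      throw std::invalid_argument("errors must be strictly positive");
    orders.push_back(std::log2(errors[k] / errors[k + 1]));
  }
  return orders;
}
```

For instance, passing the MLS 0 errors for (4,2) and (8,4) from the log, `observedOrders({3.799103e-02, 1.750967e-03})`, returns a single value of about 4.4 under the doubling assumption. Errors close to machine precision (such as the 7.098678e-16 entry) are dominated by round-off, so orders computed from them are probably not meaningful.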
Fix #577.
@Rombur @dalg24 Looking for quick early feedback to see if there are any parts that have to be significantly changed.