ORNL-CEES / DataTransferKit

A library for multiphysics solution transfer. ARCHIVED
https://datatransferkit.readthedocs.io/en/dtk-3.0/
BSD 3-Clause "New" or "Revised" License

Add Spline interpolation #581

Closed. aprokop closed this 3 years ago

aprokop commented 3 years ago

Fix #577.

@Rombur @dalg24 Looking for quick early feedback to see if there are any parts that have to be significantly changed.

aprokop commented 3 years ago

> Does this work on GPU?

@Rombur Thanks for the review! It does. Tpetra uses CudaUVM, so it worked on the GPU even without switching to Kokkos kernels. I'm still unsure whether we need CUDA_LAUNCH_BLOCKING=1; it seems to work without it.

aprokop commented 3 years ago

The container has to be rebuilt and re-uploaded, as we don't build it from the Dockerfile in .jenkins. I think the Dockerfile currently has some leftovers from @masterleinad. @masterleinad Can you please clean it up?

aprokop commented 3 years ago

The only thing that remains to be done here (from my point of view) is to add the comparison example between MLS and Spline. I don't think it's worth kokkosifying the setup phase at this point (though we may need to, depending on the cost).

masterleinad commented 3 years ago

For me, it's not necessary to add the comparison here but I agree that we don't need to optimize the setup here for now.

aprokop commented 3 years ago

> For me, it's not necessary to add the comparison here but I agree that we don't need to optimize the setup here for now.

That's fine with me.

aprokop commented 3 years ago

@Rombur @dalg24 Please review the latest version.

masterleinad commented 3 years ago

Some results on Ascent with 6 MPI ranks (the error is the maximum over the ranks):

MLS 0 (2,1) setup: 22 ms, apply: 3 ms, error: 6.498728e-01
MLS 1 (2,1) setup: 20 ms, apply: 3 ms, error: 7.934971e-01
MLS 2 (2,1) setup: 138 ms, apply: 4 ms, error: 4.890796e-02
Spline 1 (2,1) setup: 160 ms, apply: 824 ms, error: 4.286823e-01
MLS 0 (4,2) setup: 20 ms, apply: 3 ms, error: 3.799103e-02
MLS 1 (4,2) setup: 20 ms, apply: 3 ms, error: 2.580187e-03
MLS 2 (4,2) setup: 23 ms, apply: 3 ms, error: 8.017628e-04
Spline 1 (4,2) setup: 46 ms, apply: 24 ms, error: 7.187830e-02
MLS 0 (8,4) setup: 20 ms, apply: 3 ms, error: 1.750967e-03
MLS 1 (8,4) setup: 20 ms, apply: 3 ms, error: 6.840655e-05
MLS 2 (8,4) setup: 25 ms, apply: 3 ms, error: 1.677768e-07
Spline 1 (8,4) setup: 58 ms, apply: 34 ms, error: 7.109034e-03
MLS 0 (16,8) setup: 27 ms, apply: 3 ms, error: 6.829721e-05
MLS 1 (16,8) setup: 38 ms, apply: 3 ms, error: 1.447533e-06
MLS 2 (16,8) setup: 49 ms, apply: 4 ms, error: 1.471431e-09
Spline 1 (16,8) setup: 112 ms, apply: 131 ms, error: 5.311297e-04
MLS 0 (32,16) setup: 69 ms, apply: 4 ms, error: 2.405030e-06
MLS 1 (32,16) setup: 73 ms, apply: 5 ms, error: 2.664813e-08
MLS 2 (32,16) setup: 79 ms, apply: 6 ms, error: 1.144884e-11
Spline 1 (32,16) setup: 167 ms, apply: 3655 ms, error: 3.785170e-05
MLS 0 (64,32) setup: 81 ms, apply: 7 ms, error: 7.997656e-08
MLS 1 (64,32) setup: 91 ms, apply: 9 ms, error: 4.535505e-10
MLS 2 (64,32) setup: 238 ms, apply: 14 ms, error: 8.570428e-14
Spline 1 (64,32) setup: 524 ms, apply: 55848 ms, error: 2.563109e-06
MLS 0 (128,64) setup: 158 ms, apply: 18 ms, error: 2.579835e-09
MLS 1 (128,64) setup: 224 ms, apply: 25 ms, error: 7.403294e-12
MLS 2 (128,64) setup: 1544 ms, apply: 41 ms, error: 7.098678e-16
Spline 1 (128,64) setup: 2456 ms, apply: 492049 ms, error: 1.670313e-07
MLS 0 (1,2) setup: 33 ms, apply: 3 ms, error: 9.178816e-02
MLS 1 (1,2) setup: 134 ms, apply: 4 ms, error: 3.641472e-02
MLS 2 (1,2) setup: 39 ms, apply: 5 ms, error: 4.281880e-02
Spline 1 (1,2) setup: 64 ms, apply: 352 ms, error: 5.383252e-02
MLS 0 (2,4) setup: 20 ms, apply: 3 ms, error: 8.895947e-03
MLS 1 (2,4) setup: 20 ms, apply: 3 ms, error: 1.863394e-03
MLS 2 (2,4) setup: 38 ms, apply: 4 ms, error: 1.308338e-03
Spline 1 (2,4) setup: 46 ms, apply: 23 ms, error: 8.227705e-03
MLS 0 (4,8) setup: 20 ms, apply: 3 ms, error: 9.060434e-04
MLS 1 (4,8) setup: 33 ms, apply: 3 ms, error: 9.689737e-05
MLS 2 (4,8) setup: 47 ms, apply: 6 ms, error: 6.669907e-05
Spline 1 (4,8) setup: 58 ms, apply: 23 ms, error: 1.072147e-03
MLS 0 (8,16) setup: 55 ms, apply: 5 ms, error: 6.950580e-05
MLS 1 (8,16) setup: 60 ms, apply: 5 ms, error: 4.058349e-06
MLS 2 (8,16) setup: 79 ms, apply: 8 ms, error: 2.768902e-06
Spline 1 (8,16) setup: 104 ms, apply: 30 ms, error: 1.485510e-04
MLS 0 (16,32) setup: 76 ms, apply: 6 ms, error: 4.852920e-06
MLS 1 (16,32) setup: 90 ms, apply: 8 ms, error: 1.486039e-07
MLS 2 (16,32) setup: 262 ms, apply: 15 ms, error: 1.001527e-07
Spline 1 (16,32) setup: 200 ms, apply: 132 ms, error: 1.952588e-05
MLS 0 (32,64) setup: 120 ms, apply: 13 ms, error: 3.213365e-07
MLS 1 (32,64) setup: 181 ms, apply: 19 ms, error: 5.044098e-09
MLS 2 (32,64) setup: 1764 ms, apply: 35 ms, error: 3.369217e-09
Spline 1 (32,64) setup: 436 ms, apply: 3655 ms, error: 2.501322e-06
MLS 0 (64,128) setup: 292 ms, apply: 31 ms, error: 2.068489e-08
MLS 1 (64,128) setup: 778 ms, apply: 72 ms, error: 1.644314e-10
MLS 2 (64,128) setup: 12308 ms, apply: 171 ms, error: 1.092495e-10
Spline 1 (64,128) setup: 2222 ms, apply: 56062 ms, error: 3.164574e-07

Cases with more points time out or abort for different reasons.
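
One rough way to read the numbers above is to estimate an observed convergence order from consecutive runs of the same method. The sketch below is only illustrative: it assumes that the pair in parentheses is a resolution parameter that doubles from one run to the next (the thread does not say what the pair means), and the names `parse` and `observed_orders` are hypothetical helpers, not part of DataTransferKit. It parses a few of the lines listed above and computes log(e1/e2) / log(n2/n1) between successive runs.

```python
import math
import re

# A few "MLS 2" and "Spline 1" lines copied verbatim from the results above.
LOG = """\
MLS 2 (8,4) setup: 25 ms, apply: 3 ms, error: 1.677768e-07
Spline 1 (8,4) setup: 58 ms, apply: 34 ms, error: 7.109034e-03
MLS 2 (16,8) setup: 49 ms, apply: 4 ms, error: 1.471431e-09
Spline 1 (16,8) setup: 112 ms, apply: 131 ms, error: 5.311297e-04
MLS 2 (32,16) setup: 79 ms, apply: 6 ms, error: 1.144884e-11
Spline 1 (32,16) setup: 167 ms, apply: 3655 ms, error: 3.785170e-05
"""

# One result line: method label, "(a,b)" pair, setup/apply times, error.
LINE = re.compile(
    r"(?P<method>\w+ \d+) \((?P<a>\d+),(?P<b>\d+)\) "
    r"setup: (?P<setup>\d+) ms, apply: (?P<apply>\d+) ms, "
    r"error: (?P<error>\S+)"
)


def parse(log):
    """Group (first number of the pair, error) tuples by method label."""
    runs = {}
    for line in log.splitlines():
        m = LINE.fullmatch(line.strip())
        if m:
            runs.setdefault(m["method"], []).append(
                (int(m["a"]), float(m["error"]))
            )
    return runs


def observed_orders(points):
    """Observed order between consecutive runs.

    ASSUMPTION: the first number of the pair acts as a resolution
    parameter n, so the order is log(e1 / e2) / log(n2 / n1).
    """
    points = sorted(points)
    return [
        math.log(e1 / e2) / math.log(n2 / n1)
        for (n1, e1), (n2, e2) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    for method, points in parse(LOG).items():
        print(method, [round(p, 2) for p in observed_orders(points)])
```

The same two helpers can be run on the full listing to compare all four configurations; the timings are parsed by the regular expression but deliberately left unused here.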