eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License

Fix performance of local version of bt_band_to_tridiagonal #1144

Closed: rasolca closed this 5 months ago

rasolca commented 5 months ago

Panels were not indexed correctly, which led to over-constrained dependencies between tasks.

Closing #1136.

rasolca commented 5 months ago

cscs-ci run

rasolca commented 5 months ago

distributed:

[0]
[0] 0.329912s 6509.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 0.314647s 6825.07GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 0.317663s 6760.25GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 0.313616s 6847.49GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 0.316185s 6791.85GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU

[0]
[0] 2.51927s 6819.4GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 2.52648s 6799.91GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 2.52548s 6802.63GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 2.51763s 6823.82GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 2.5293s 6792.34GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU

local:

[0]
[0] 0.312804s 6865.26GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 0.322055s 6668.06GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 0.320392s 6702.68GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 0.318858s 6734.93GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 0.320701s 6696.23GFlop/s d (10240, 10240) (1024, 1024) 128 (1, 1) 64 GPU

[0]
[0] 2.4803s 6926.53GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 2.5081s 6849.76GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2]
[2] 2.50762s 6851.07GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3]
[3] 2.51417s 6833.21GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4]
[4] 2.51185s 6839.54GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
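To compare the local and distributed runs above without reading each line by hand, the output can be averaged per matrix size. The following is only a sketch: the function name `summarize` and the regular expression are hypothetical helpers, and the column layout (`[run] time GFlop/s type (m, n) ...`) is assumed from the lines shown in this thread rather than taken from the miniapp's documentation.

```python
import re
from statistics import mean

# Assumed layout of a result line (inferred from the logs in this thread):
#   "[run] <time>s <rate>GFlop/s <type> (m, n) (mb, nb) ..."
LINE_RE = re.compile(
    r"^\[(\d+)\]\s+([0-9.]+)s\s+([0-9.]+)GFlop/s\s+\S+\s+\((\d+),\s*(\d+)\)"
)


def summarize(log_text):
    """Return {(m, n): (mean_time_s, mean_gflops)} over all result lines."""
    groups = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skips bare "[0]" label lines, headers and blank lines
        _, time_s, gflops, m, n = match.groups()
        groups.setdefault((int(m), int(n)), []).append(
            (float(time_s), float(gflops))
        )
    return {
        size: (mean(t for t, _ in runs), mean(g for _, g in runs))
        for size, runs in groups.items()
    }


# Two of the "local" lines from this thread, used as sample input.
sample = """
[0]
[0] 2.4803s 6926.53GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1]
[1] 2.5081s 6849.76GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
"""

for size, (avg_time, avg_rate) in summarize(sample).items():
    print(size, f"{avg_time:.4f}s", f"{avg_rate:.2f}GFlop/s")
```

Pass the "distributed" and "local" sections to `summarize` separately, since the result lines themselves do not record which variant produced them. As a sanity check on the assumed columns, the reported rates appear consistent with a flop count of 2·n³ for an n×n matrix: 2·10240³ flops in 0.329912 s gives roughly 6509 GFlop/s, matching the first distributed line.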