eth-cscs / DLA-Future

https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License

Investigate `--local` slowdown with `miniapp_bt_band_to_tridiag` on santis #1136

Closed (rasolca closed this issue 3 months ago)

rasolca commented 3 months ago
Without `--local`:

srun ... miniapp_bt_band_to_tridiag --type d --m 20480 --n 20480 --mb 1024 --nb 1024 --b 128 --grid-rows 1 --grid-cols 1 --nruns 5 --dlaf:bt-band-to-tridiag-hh-apply-group-size=128 --pika:ini=pika.stacks.small_size=0x40000
[0] 2.52489s 6804.2GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1] 2.51619s 6827.74GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2] 2.95567s 5812.5GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3] 2.54294s 6755.9GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4] 2.55006s 6737.05GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU

With `--local`:

srun ... miniapp_bt_band_to_tridiag --type d --m 20480 --n 20480 --mb 1024 --nb 1024 --b 128 --grid-rows 1 --grid-cols 1 --nruns 5 --dlaf:bt-band-to-tridiag-hh-apply-group-size=128 --pika:ini=pika.stacks.small_size=0x40000 --local
[0] 7.36236s 2333.47GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[1] 7.40477s 2320.11GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[2] 7.06307s 2432.35GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[3] 7.63042s 2251.5GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
[4] 7.61308s 2256.62GFlop/s d (20480, 20480) (1024, 1024) 128 (1, 1) 64 GPU
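
To put a number on the slowdown, the two sets of timings above can be compared with a small script. The following is only a sketch: the regular expression assumes the line layout seen in the output above (`[run] <time>s <rate>GFlop/s ...`), and the names `parse_runs`, `DEFAULT_OUTPUT` and `LOCAL_OUTPUT` are hypothetical helpers, not part of DLA-Future. The final check, that every run implies roughly 2·n³ floating-point operations for m = n = 20480, is inferred from the reported numbers and not taken from the miniapp source.

```python
import re
from statistics import mean

# Assumed line layout, e.g. "[0] 2.52489s 6804.2GFlop/s d (20480, 20480) ..."
RUN_LINE = re.compile(r"^\[(\d+)\]\s+([\d.]+)s\s+([\d.]+)GFlop/s")


def parse_runs(output: str) -> list[tuple[float, float]]:
    """Return (seconds, gflops_per_second) for each run line in the output."""
    runs = []
    for line in output.splitlines():
        match = RUN_LINE.match(line.strip())
        if match:
            runs.append((float(match.group(2)), float(match.group(3))))
    return runs


# Timings copied from the issue (trailing columns omitted for brevity).
DEFAULT_OUTPUT = """
[0] 2.52489s 6804.2GFlop/s
[1] 2.51619s 6827.74GFlop/s
[2] 2.95567s 5812.5GFlop/s
[3] 2.54294s 6755.9GFlop/s
[4] 2.55006s 6737.05GFlop/s
"""

LOCAL_OUTPUT = """
[0] 7.36236s 2333.47GFlop/s
[1] 7.40477s 2320.11GFlop/s
[2] 7.06307s 2432.35GFlop/s
[3] 7.63042s 2251.5GFlop/s
[4] 7.61308s 2256.62GFlop/s
"""

default_runs = parse_runs(DEFAULT_OUTPUT)
local_runs = parse_runs(LOCAL_OUTPUT)

default_mean = mean(t for t, _ in default_runs)
local_mean = mean(t for t, _ in local_runs)
slowdown = local_mean / default_mean

print(f"mean without --local: {default_mean:.3f} s")
print(f"mean with --local:    {local_mean:.3f} s")
print(f"slowdown:             {slowdown:.2f}x")

# Work implied by each run (time * rate), in GFlop. Both variants should
# report the same amount of work; only the time differs.
implied_gflop = [t * g for t, g in default_runs + local_runs]
expected_gflop = 2 * 20480**3 / 1e9  # inferred 2*n^3 model, an assumption
```

On the numbers above this gives a mean of about 2.62 s without `--local` and about 7.41 s with it, a slowdown of roughly 2.8x. Since time multiplied by rate is the same for every run, the lower GFlop/s figures reflect the longer runtime alone and not a change in the amount of work counted.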
rasolca commented 3 months ago

Closed by #1144