I'm seeing the following weak scaling performance when using COSMA configured with OpenBLAS:

| Num Nodes | Avg Exec Time (ms) |
|-----------|--------------------|
| 1         | 2867               |
| 2         | 3026               |
| 4         | 3208               |
| 8         | 3090               |
| 16        | 5659               |
| 32        | 5730               |
| 64        | 5741               |
| 128       | 5893               |
| 256       | 5889               |

On each node, I'm using 20 threads for OpenMP. The initial problem size / command line is `env OMP_NUM_THREADS=20 COSMA_OVERLAP_COMM_AND_COMP=ON jsrun -b none -c ALL_CPUS -g ALL_GPUS -r 1 -n 1 /g/g15/yadav2/cosma/build/miniapp/cosma_miniapp -r 10 -m 8192 -n 8192 -k 8192`. I'm on commit c7bdab95bac9d1175e5e58cb95efcb07a51157b1.
I'm not sure what happened at 16 nodes that caused the performance dip -- is something like this expected?
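
To put a number on the dip, here is a minimal sketch that converts the measured times into weak-scaling efficiency, taken as the single-node time divided by the N-node time. It assumes the work per node is held constant as nodes are added, which is what weak scaling implies but which the numbers alone do not confirm. The function name `weak_scaling_efficiency` is just an illustrative helper, not part of COSMA.

```python
# Average execution times (ms) from the measurements above, keyed by node count.
times_ms = {
    1: 2867, 2: 3026, 4: 3208, 8: 3090,
    16: 5659, 32: 5730, 64: 5741, 128: 5893, 256: 5889,
}


def weak_scaling_efficiency(times):
    """Return {nodes: T(smallest run) / T(nodes)}.

    Assumes constant work per node, so ideal weak scaling gives 1.0 everywhere.
    """
    base = times[min(times)]  # time of the smallest (here: 1-node) run
    return {n: base / t for n, t in sorted(times.items())}


efficiency = weak_scaling_efficiency(times_ms)

for nodes, eff in efficiency.items():
    print(f"{nodes:>4} nodes: {times_ms[nodes]:>5} ms  efficiency {eff:.2f}")
```

Under that assumption, efficiency stays at roughly 0.9 or above up to 8 nodes, then drops to about 0.5 at 16 nodes and stays near that level through 256 nodes.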