eth-cscs / COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
BSD 3-Clause "New" or "Revised" License

unexpected performance when using COSMA with CPU #95

Closed (rohany closed this issue 3 years ago)

rohany commented 3 years ago

I'm seeing the following weak scaling performance when using COSMA configured with OpenBLAS:

| Num Nodes | Avg Exec Time (ms) |
|-----------|--------------------|
| 1         | 2867               |
| 2         | 3026               |
| 4         | 3208               |
| 8         | 3090               |
| 16        | 5659               |
| 32        | 5730               |
| 64        | 5741               |
| 128       | 5893               |
| 256       | 5889               |
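
To make the dip easier to see, here is a small sketch that turns the timings above into weak scaling efficiency, using the usual definition T(1) / T(N), and finds the step with the largest slowdown. The function names are just for illustration and are not part of COSMA.

```python
# Average execution times (ms) copied from the measurements above,
# keyed by node count.
times_ms = {
    1: 2867, 2: 3026, 4: 3208, 8: 3090,
    16: 5659, 32: 5730, 64: 5741, 128: 5893, 256: 5889,
}


def weak_scaling_efficiency(times):
    """Return T(1) / T(N) for every node count N (1.0 is ideal)."""
    base = times[1]
    return {n: base / t for n, t in times.items()}


def largest_jump(times):
    """Return the consecutive pair of node counts with the largest slowdown."""
    nodes = sorted(times)
    ratios = {(a, b): times[b] / times[a] for a, b in zip(nodes, nodes[1:])}
    return max(ratios, key=ratios.get)


efficiency = weak_scaling_efficiency(times_ms)
for n in sorted(times_ms):
    print(f"{n:>4} nodes: {times_ms[n]:>5} ms, efficiency {efficiency[n]:.2f}")

print("largest slowdown between node counts:", largest_jump(times_ms))
```

Efficiency stays above roughly 0.9 through 8 nodes, drops to about 0.5 at 16 nodes, and stays near that level up to 256 nodes.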

On each node, I'm using 20 threads for OpenMP. The initial problem size / command line is `env OMP_NUM_THREADS=20 COSMA_OVERLAP_COMM_AND_COMP=ON jsrun -b none -c ALL_CPUS -g ALL_GPUS -r 1 -n 1 /g/g15/yadav2/cosma/build/miniapp/cosma_miniapp -r 10 -m 8192 -n 8192 -k 8192`. I'm on commit `c7bdab95bac9d1175e5e58cb95efcb07a51157b1`.
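
For reference, a rough estimate of the single-node throughput implied by these numbers. This is a sketch under two assumptions that I have not verified against the miniapp: that the reported average covers one multiplication, and that the work is counted as the usual 2 * m * n * k floating-point operations for a dense multiply.

```python
# Single-node problem size from the command line (-m, -n, -k).
m = n = k = 8192

# Average execution time reported for 1 node, in milliseconds.
avg_time_ms = 2867


def gemm_gflops(m, n, k, time_ms):
    """Estimate GFLOP/s for one dense m x k by k x n multiplication.

    Assumes 2 * m * n * k floating-point operations (one multiply and
    one add per inner-product term) and that time_ms covers one multiply.
    """
    flops = 2 * m * n * k
    seconds = time_ms / 1000.0
    return flops / seconds / 1e9


single_node_gflops = gemm_gflops(m, n, k, avg_time_ms)
print(f"approx. {single_node_gflops:.1f} GFLOP/s on one node")
```

Comparing this figure with what OpenBLAS achieves on its own for the same size on one node would show whether the single-node baseline is already reasonable.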

I'm not sure what happened at 16 nodes that caused the performance dip -- is something like this expected?