calab-ntu / gpu-cluster

Eureka and Spock GPU clusters

Optimal `GPU/CPU` overlapping and `GPU` performance rely heavily on using many `MPI` ranks #39

Open koarakawaii opened 2 years ago

koarakawaii commented 2 years ago

Issue

GAMER GPU tests on the new cluster (caler) suggest that performance (measured by Perf_Overall in Record__Performance) keeps increasing without saturating as the number of MPI ranks increases. The best performance is achieved with 16 MPI ranks; using 32 MPI ranks exceeds the GPU memory limit. The time-step-averaged Perf_Overall values are summarized below (TIMING_SOLVER off):

1 MPI rank , GPU_NSTREAM = -1 (16): 4.3045e+07
2 MPI ranks, GPU_NSTREAM = -1 (16): 5.6315e+07
4 MPI ranks, GPU_NSTREAM = 10     : 6.4590e+07
8 MPI ranks, GPU_NSTREAM =  4     : 6.9180e+07
16 MPI ranks, GPU_NSTREAM = 1     : 7.3055e+07
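
To make the trend easier to read, here is a minimal sketch that turns the Perf_Overall values above into a speedup relative to the 1-rank run and a gain relative to the previous entry. The names `perf_by_ranks` and `relative_gains` are hypothetical helpers for illustration, not part of GAMER. GPU_NSTREAM also differs between runs, so this does not isolate the effect of the rank count alone.

```python
# Perf_Overall values copied from the summary above (time-step-averaged).
# Keys are the number of MPI ranks; the dict name is a hypothetical helper.
perf_by_ranks = {
    1: 4.3045e07,
    2: 5.6315e07,
    4: 6.4590e07,
    8: 6.9180e07,
    16: 7.3055e07,
}


def relative_gains(perf):
    """Return (ranks, speedup vs. fewest ranks, gain vs. previous entry) rows."""
    ranks = sorted(perf)
    base = perf[ranks[0]]
    rows = []
    prev = None
    for n in ranks:
        speedup = perf[n] / base
        # Fractional gain over the previous (smaller) rank count, if any.
        step_gain = None if prev is None else perf[n] / perf[prev] - 1.0
        rows.append((n, speedup, step_gain))
        prev = n
    return rows


rows = relative_gains(perf_by_ranks)
for n, speedup, step_gain in rows:
    step = "n/a" if step_gain is None else f"{step_gain:+.1%}"
    print(f"{n:2d} ranks: {speedup:.3f}x vs 1 rank, {step} vs previous")
```

With these numbers the gain per doubling shrinks but stays positive through 16 ranks, which is the "no saturation" behavior described in the issue.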

Performance test