Reference-ScaLAPACK / scalapack

ScaLAPACK development repository

Degradation of Performance when using 1xP grids #100

Open swathy3 opened 1 week ago

swathy3 commented 1 week ago

The cost of communication seems more pronounced when using the symmetric eigensolvers (PXSYEVD) with 1xP grids. How should the grid be laid out when NPROCS is prime? How can I force the system to keep some nodes idle in this case? Which of the layers would need modifications: ScaLAPACK, PBLAS, or BLACS? Any insights in this area would be helpful.

zerothi commented 1 week ago

You have to create a communicator with a size that can be gridded. Then redistribute the matrices...
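
To make "a size that can be gridded" concrete, here is a minimal sketch of how one might pick that size. It is not part of ScaLAPACK, BLACS, or MPI: the helper names `most_square_grid` and `pick_grid` and the aspect-ratio threshold `max_aspect` are assumptions made for illustration. The sketch only does the arithmetic; building the smaller communicator and redistributing the matrices would still be up to the application.

```python
import math


def most_square_grid(n):
    """Return (p, q) with p * q == n, p <= q, and p as large as possible."""
    for p in range(math.isqrt(n), 0, -1):
        if n % p == 0:
            return p, n // p


def pick_grid(nprocs, max_aspect=4.0):
    """Largest count <= nprocs whose most-square grid has q / p <= max_aspect.

    max_aspect is an illustrative threshold, not a ScaLAPACK parameter.
    """
    for used in range(nprocs, 0, -1):
        p, q = most_square_grid(used)
        if q / p <= max_aspect:
            return used, p, q


print(most_square_grid(191))  # (1, 191): a prime count only admits a 1xP grid
print(pick_grid(191))         # (190, 10, 19): one process stays idle
print(pick_grid(189))         # (189, 9, 21)
```

Under these assumptions, leaving a single process idle turns the 1x191 grid into a 10x19 one.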

swathy3 commented 6 days ago

What if I am forced to create a 1xP grid for maximum utilization of the system's hardware? The grid parameters are set by the application and I cannot tweak them. I'm trying to launch the application under two scenarios:

  1. -np 189, with grid params 9x21 (PxQ)
  2. -np 191 (max possible in the target system), with grid params 1x191 (PxQ)

I see considerable degradation in performance in the second case, although I would expect an improvement as the number of processes increases. Is there any layer of ScaLAPACK I can exploit to resolve this case when I encounter a prime?

zerothi commented 6 days ago

Well, you could choose block sizes that are so large that the last proc does not hold anything, but that also seems sub-optimal. I would suggest you reduce your comm-size. If 189 performs much better than 191, then that is obviously using your resources much better.
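
The block-size suggestion above can be checked with a little arithmetic. The sketch below is a Python transcription of the counting rule used by ScaLAPACK's `NUMROC` tool routine, which reports how many rows or columns of a block-cyclically distributed dimension a given process owns. The matrix order 1890 and block size 10 are made-up values chosen for illustration, not numbers from this thread.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows/columns of an n-long dimension owned by process iproc.

    n: global extent, nb: block size, isrcproc: process holding the first
    block, nprocs: number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source proc
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # full rounds of the cycle
    extra = nblocks % nprocs                       # leftover whole blocks
    if mydist < extra:
        count += nb                                # gets one extra whole block
    elif mydist == extra:
        count += n % nb                            # gets the partial last block
    return count


# Illustrative values: columns of an 1890-column matrix on a 1x191 grid.
n, nb, q = 1890, 10, 191
cols = [numroc(n, nb, j, 0, q) for j in range(q)]
idle = [j for j, c in enumerate(cols) if c == 0]
print(sum(cols))  # 1890
print(idle)       # [189, 190]
```

With these values there are only 189 column blocks for 191 process columns, so the last two processes own no data. They are still members of the grid, though, which is consistent with the remark that this approach seems sub-optimal compared with shrinking the communicator.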