StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Legion: Slowdown in modified Stencil #1658

Closed rupanshusoi closed 3 months ago

rupanshusoi commented 3 months ago

I'm running a modified version of Stencil on Perlmutter. I'm seeing a weird slowdown; here is a profile.

In this configuration, there are two wrapper tasks, and the first one executes twice. There are consequently two plateaus of GPU utilization due to the first wrapper task in the profile; those are fine.

The issue is that they should have been followed by a third plateau corresponding to the second wrapper task. Instead, GPU utilization drops markedly in the second wrapper task: the gap between successive GPU kernel invocations increases from 2 ms (in the first wrapper task) to about 40 ms (in the second). The utility processors and channel are not overloaded, so I don't know what is causing this slowdown.
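For anyone who wants to quantify this kind of gap from their own timing data, here is a minimal sketch. It assumes the kernel intervals have already been extracted as (start, stop) pairs in milliseconds, and it treats the "gap" as the idle time between one kernel finishing and the next one starting. The function names and the sample timings are hypothetical, chosen only to mirror the 2 ms and 40 ms figures above; they are not taken from the actual profile.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, stop_ms) of one GPU kernel


def kernel_gaps(intervals: List[Interval]) -> List[float]:
    """Return the idle time between each pair of successive kernels."""
    ordered = sorted(intervals)  # sort by start time
    gaps = []
    for (_, prev_stop), (next_start, _) in zip(ordered, ordered[1:]):
        # Overlapping kernels are counted as a zero gap.
        gaps.append(max(0.0, next_start - prev_stop))
    return gaps


def mean_gap(intervals: List[Interval]) -> float:
    """Average idle gap; 0.0 when there are fewer than two kernels."""
    gaps = kernel_gaps(intervals)
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical timings (ms), not measured data.
first_wrapper = [(0.0, 5.0), (7.0, 12.0), (14.0, 19.0)]
second_wrapper = [(0.0, 5.0), (45.0, 50.0), (90.0, 95.0)]

print(mean_gap(first_wrapper))
print(mean_gap(second_wrapper))
```

Comparing the mean gap per wrapper task makes the drop in GPU utilization easy to spot without reading it off the profile by eye.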

Note that the three spikes in CPU utilization are just the copies; they are expected.

rupanshusoi commented 3 months ago

This problem did not reproduce on 2 nodes. Here is a profile. Note it has three equal plateaus of GPU utilization, as expected.

elliottslaughter commented 3 months ago

Rupanshu and I noticed that the 4-node profile has a very strange mapping during the third wrapper task invocation (that is, the second wrapper task), which results in excessive copies. This appears to be what is slowing that part of the program down.

rupanshusoi commented 3 months ago

Enabling index launches fixed the mapping, which in turn fixed this issue.