Overlapping communication in different directions

In the current design, each MPI rank can have multiple compute regions. This means that a rank may send multiple messages that look identical to the receiver when transferring halos between regions.

Currently, this is handled by sending each transfer kind in a separate wave, with a barrier between waves. This requires multiple MPI barriers, which may slow down communication.

The other option is to overlap all messages, but then we have to make the messages unique so they can be disambiguated at recv. To make these messages unique, we need to completely specify the communication type in the tag: the tag has to encode dstGPU, direction, data field, and transfer kind.

For example, if we supported 8 GPUs and 64 data fields:

dstGPU: 3 bits (0-7)
dataIdx: 6 bits (0-63)
direction: 1 bit (pos/neg)

3D transfer kinds:
  +x face, -x face
  +y face, -y face
  +z face, -z face
  +x/+z edge, +x/-z edge
  +x/+y edge, +x/-y edge
  -x/+z edge, -x/-z edge
  -x/+y edge, -x/-y edge
  +y/+z edge, +y/-z edge
  -y/+z edge, -y/-z edge
  +x/+y/+z, +x/+y/-z corner
  +x/-y/+z, +x/-y/-z corner
  -x/+y/+z, -x/+y/-z corner
  -x/-y/+z, -x/-y/-z corner
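
As a rough check on the list above, the 3D kinds can be enumerated as every nonzero direction vector with components in -1, 0, +1. That gives 3^3 - 1 = 26 kinds: 6 faces, 12 edges, and 8 corners. The sketch below is Python; `transfer_kinds` and `classify` are names made up for illustration and are not part of the library.

```python
from itertools import product


def transfer_kinds(ndim=3):
    """Return all nonzero direction vectors with components in {-1, 0, +1}."""
    return [d for d in product((-1, 0, 1), repeat=ndim) if any(d)]


def classify(direction):
    """Name a 3D kind by how many dimensions take part in the transfer."""
    names = {1: "face", 2: "edge", 3: "corner"}
    return names[sum(c != 0 for c in direction)]


kinds = transfer_kinds(3)

# Tally how many kinds fall into each class.
counts = {}
for d in kinds:
    name = classify(d)
    counts[name] = counts.get(name, 0) + 1

print(len(kinds), counts)
```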

encode kind with a value for each dimension:
  0: not present
  1: negative
  2: positive
  3: reserved
This requires 2 bits per dimension.

examples:              MSB  LSB
3D, +x face:         0b00_00_10
3D, -x/-y/+z corner: 0b10_01_01
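
A minimal sketch of the per-dimension encoding in Python, assuming x occupies the least significant pair of bits, as the two examples above suggest. `encode_kind` and `decode_kind` are hypothetical helper names.

```python
# Per-dimension codes from the scheme above (3 is reserved).
NOT_PRESENT, NEGATIVE, POSITIVE = 0, 1, 2
BITS_PER_DIM = 2

_TO_CODE = {0: NOT_PRESENT, -1: NEGATIVE, 1: POSITIVE}
_FROM_CODE = {code: c for c, code in _TO_CODE.items()}


def encode_kind(direction):
    """Pack a direction vector (x, y, z, ...) of -1/0/+1 into 2 bits per dimension.

    Assumption: dimension 0 (x) lands in the least significant pair.
    """
    kind = 0
    for dim, c in enumerate(direction):
        if c not in _TO_CODE:
            raise ValueError(f"component {c!r} must be -1, 0, or +1")
        kind |= _TO_CODE[c] << (BITS_PER_DIM * dim)
    return kind


def decode_kind(kind, ndim=3):
    """Recover the direction vector; the reserved code 3 raises KeyError."""
    return tuple(
        _FROM_CODE[(kind >> (BITS_PER_DIM * dim)) & 0b11] for dim in range(ndim)
    )


# The two examples from the text.
assert encode_kind((1, 0, 0)) == 0b00_00_10      # 3D, +x face
assert encode_kind((-1, -1, 1)) == 0b10_01_01    # 3D, -x/-y/+z corner
```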

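The fields above use 10 bits, so a 32-bit tag has 22 bits left for the transfer kind, which at 2 bits per dimension would support up to 11-dimensional stencils. MPI tags must be non-negative and are bounded by MPI_TAG_UB, which is only guaranteed to be at least 32767, so the width actually available depends on the MPI implementation.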
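
One possible way to pack everything into a single tag, sketched in Python. The field order (dstGPU in the lowest bits, then dataIdx, direction, and kind) is an assumption, since nothing above fixes a layout. `pack_tag`, `unpack_tag`, and the `tag_ub` parameter are hypothetical names; in real code the bound would be read from the communicator's MPI_TAG_UB attribute.

```python
# Field widths from the example: 8 GPUs, 64 data fields, one direction bit.
GPU_BITS, DATA_BITS, DIR_BITS = 3, 6, 1
KIND_BITS_PER_DIM = 2

# Assumed layout, least significant field first.
GPU_SHIFT = 0
DATA_SHIFT = GPU_SHIFT + GPU_BITS
DIR_SHIFT = DATA_SHIFT + DATA_BITS
KIND_SHIFT = DIR_SHIFT + DIR_BITS  # 10 bits are used before the kind field


def _mask(width):
    return (1 << width) - 1


def pack_tag(dst_gpu, data_idx, direction, kind, ndim=3, tag_ub=2**31 - 1):
    """Combine the four fields into one non-negative integer tag."""
    kind_bits = KIND_BITS_PER_DIM * ndim
    fields = (
        (dst_gpu, GPU_BITS),
        (data_idx, DATA_BITS),
        (direction, DIR_BITS),
        (kind, kind_bits),
    )
    for value, width in fields:
        if not 0 <= value <= _mask(width):
            raise ValueError(f"{value} does not fit in {width} bits")
    tag = (
        (dst_gpu << GPU_SHIFT)
        | (data_idx << DATA_SHIFT)
        | (direction << DIR_SHIFT)
        | (kind << KIND_SHIFT)
    )
    if tag > tag_ub:
        raise ValueError(f"tag {tag} exceeds the upper bound {tag_ub}")
    return tag


def unpack_tag(tag, ndim=3):
    """Split a tag back into (dst_gpu, data_idx, direction, kind)."""
    return (
        (tag >> GPU_SHIFT) & _mask(GPU_BITS),
        (tag >> DATA_SHIFT) & _mask(DATA_BITS),
        (tag >> DIR_SHIFT) & _mask(DIR_BITS),
        (tag >> KIND_SHIFT) & _mask(KIND_BITS_PER_DIM * ndim),
    )


tag = pack_tag(dst_gpu=5, data_idx=42, direction=1, kind=0b10_01_01)
assert unpack_tag(tag) == (5, 42, 1, 0b10_01_01)
```

With this layout a 3D tag occupies 16 bits, so its largest value (44031) is above 32767, and the check against `tag_ub` is there to catch that case.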
