IPPL-framework / ippl

IPPL is a C++ library to develop performance portable code for fully Eulerian, Lagrangian or hybrid Eulerian-Lagrangian methods.
https://ippl-framework.github.io/ippl/
GNU General Public License v3.0

Bug in exchangeBoundaries (HaloCells.hpp) #302

Open s-mayani opened 4 months ago

s-mayani commented 4 months ago

Certain domain decomposition configurations of fields in IPPL cause the fillHalo() routine to hang. The cause lies in the exchangeBoundaries routine (found in src/Field/HaloCells.hpp), which is called to send data among neighbouring ranks.

The problem occurs when there are ranks whose local domain has length 1 in one of the directions. In that case edges and vertices are confounded (in the 2D case, for example), and the neighboring ranks compute the neighbor index incorrectly.

Attached are some slides detailing the issue: ippl_halocells_bug.pdf

Arc676 commented 2 months ago

Could you share the post-decomposition local domain boundaries for the problematic case? I can't run IPPL locally.

TL;DR if $|I_{2,y}| = 1$ and $D_2 \cap D_0$ is in the middle of $I_{0,y}$ (where the domains are products of intervals $D_r = \prod_{i = x,y} I_{r,i}$) then the neighbor identification algorithm will falsely identify rank 2 as a northeast vertex neighbor instead of an eastern edge neighbor. From what I remember about the ORB implementation, this shouldn't be possible, but it's my best guess with the available information.

Based on the information in the slides, the only explanation that comes to mind at the moment is that the domain intersection isn't aligned with the local domain. As shown in the figures on either of the last two slides, if rank 0 tries to receive tag 5001 from rank 2, it means that it thinks that the local domain on rank 2 only intersects its own domain in the northeast vertex (i.e. $11_3 \equiv 4_{10}$). Based on the domain decomposition shown on slide 1, this is the wrong side of the domain (it's actually touching the south side). If the rank 2 domain is one-dimensional, it should actually be identified as a southeast vertex neighbor (index 1).

The neighbor identification assumes that the domain intersection is aligned with the local domain boundaries. There is no mechanism in place for a length-1 neighbor domain whose intersection lies in the middle of an edge (from what I remember, this shouldn't be possible).
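
To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not IPPL's code: the names `Interval`, `Domain2D`, `naiveDigit` and `naiveIndex` are hypothetical, and the classification rule is an assumption chosen only because it reproduces the indices quoted above (a base-3 index with one digit per axis, 0 = lower side, 1 = upper side, 2 = overlap runs along the axis, so $11_3 = 4$ is the northeast vertex and index 1 is the southeast vertex).

```cpp
#include <algorithm>
#include <array>

// Closed integer interval [lo, hi]; hypothetical stand-in for an index range.
struct Interval {
    int lo, hi;
    int length() const { return hi - lo + 1; }
};

using Domain2D = std::array<Interval, 2>;  // [0] = x, [1] = y

// Assumed per-axis rule: an overlap thicker than the ghost layer is taken to
// run along the axis (digit 2); otherwise it is taken to sit at one end.
int naiveDigit(const Interval& grown, const Interval& overlap, int nghost) {
    if (overlap.length() > nghost) return 2;
    if (overlap.lo == grown.lo) return 0;  // lower (west / south) side
    return 1;  // anything else is assumed to be the upper side: the flaw
}

// Base-3 neighbor index, x as the least significant digit.
int naiveIndex(const Domain2D& local, const Domain2D& neighbor, int nghost) {
    int index = 0;
    int weight = 1;
    for (int d = 0; d < 2; ++d) {
        // Local domain grown by the ghost layer, intersected with the neighbor.
        Interval grown{local[d].lo - nghost, local[d].hi + nghost};
        Interval overlap{std::max(grown.lo, neighbor[d].lo),
                         std::min(grown.hi, neighbor[d].hi)};
        index += weight * naiveDigit(grown, overlap, nghost);
        weight *= 3;
    }
    return index;
}
```

With one ghost layer and a local domain $[0,3] \times [0,7]$, an eastern neighbor $[4,7] \times [0,7]$ gets index 7 ($21_3$, east edge) as expected. A neighbor $[4,7] \times [3,3]$, which has length 1 in $y$ and sits in the middle of the eastern edge, gets index 4 ($11_3$, northeast vertex) under this rule, because a thin overlap that does not start at the lower end is assumed to be at the upper end.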

aaadelmann commented 2 months ago

We will have a paid TA working on this issue. @Arc676 your input, such as above, would be very important! However, this is an issue that we need to have resolved asap, and given that you can only work on it sporadically, I feel it is better to hand this over to Jonas. Would you be available for discussions?

Arc676 commented 2 months ago

Sounds good to me. I'm generally home in the evenings; we could chat about the issue in the usual channels.

aaadelmann commented 2 months ago

Awesome!