We used the wrong direction in one of the memcpys in the host-transfer alltoallv.
Curiously, this never showed up in testing. Even with the wrong direction, the tests pass, and manual verification indicates the outputs are in fact correct (at least on Lassen). I hypothesize this is because unified memory eliminates the distinction between device and host pointers (from a correctness, not a performance, point of view). Unfortunately, I don't have a great way to test this hypothesis.