Closed TobyFlynn closed 1 year ago
@reguly I've been trying to launch the gather kernels in a separate stream, as the scatter kernels currently are, but I'm running into some issues. I'll revisit it next week; in the meantime, I'll merge this pull request.
My previous fix for syncing the gather kernels before the GPUDirect MPI halo exchange call ends up waiting for the OP2 kernel to finish executing over the core set as well. This pull request fixes that, so the halo exchange waits only on the gather kernels.
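For anyone following along, here is a minimal standalone sketch of the general idea, not the actual OP2 code. It assumes the over-waiting came from a sync that covers all outstanding device work, and it uses a CUDA event recorded right after the gather kernel so the host waits on that kernel alone. All names (`gather_halo`, `core_kernel`, `gather_stream`, `core_stream`, `gather_done`, `export_idx`) are hypothetical, and the MPI call is left as a comment.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Error check helper; only used inside main(), so it can return from it.
#define CHECK(call)                                                 \
  do {                                                              \
    cudaError_t e_ = (call);                                        \
    if (e_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
              cudaGetErrorString(e_));                              \
      return 1;                                                     \
    }                                                               \
  } while (0)

// Hypothetical gather kernel: packs exported elements into a send buffer.
__global__ void gather_halo(const double *data, const int *export_idx,
                            double *send_buf, int n_export) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n_export) send_buf[i] = data[export_idx[i]];
}

// Hypothetical stand-in for the (long-running) kernel over the core set.
__global__ void core_kernel(double *out, int n_core) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n_core) {
    double v = i;
    for (int k = 0; k < 1000; ++k) v = v * 0.999 + 1.0;
    out[i] = v;
  }
}

int main() {
  const int n = 1 << 16, n_export = 1024;
  std::vector<double> h_data(n);
  std::vector<int> h_idx(n_export);
  for (int j = 0; j < n; ++j) h_data[j] = j;
  for (int i = 0; i < n_export; ++i) h_idx[i] = n - n_export + i;

  double *data, *out, *send_buf, *h_recv;
  int *export_idx;
  CHECK(cudaMalloc(&data, n * sizeof(double)));
  CHECK(cudaMalloc(&out, n * sizeof(double)));
  CHECK(cudaMalloc(&send_buf, n_export * sizeof(double)));
  CHECK(cudaMalloc(&export_idx, n_export * sizeof(int)));
  CHECK(cudaMallocHost(&h_recv, n_export * sizeof(double)));  // pinned
  CHECK(cudaMemcpy(data, h_data.data(), n * sizeof(double),
                   cudaMemcpyHostToDevice));
  CHECK(cudaMemcpy(export_idx, h_idx.data(), n_export * sizeof(int),
                   cudaMemcpyHostToDevice));

  // Non-blocking streams so neither implicitly syncs with the default stream.
  cudaStream_t core_stream, gather_stream;
  cudaEvent_t gather_done;
  CHECK(cudaStreamCreateWithFlags(&core_stream, cudaStreamNonBlocking));
  CHECK(cudaStreamCreateWithFlags(&gather_stream, cudaStreamNonBlocking));
  CHECK(cudaEventCreateWithFlags(&gather_done, cudaEventDisableTiming));

  core_kernel<<<(n + 255) / 256, 256, 0, core_stream>>>(out, n);
  gather_halo<<<(n_export + 255) / 256, 256, 0, gather_stream>>>(
      data, export_idx, send_buf, n_export);
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(gather_done, gather_stream));

  // Wait for the gather only. cudaDeviceSynchronize() here would also
  // wait for core_kernel, which is the behaviour to avoid.
  CHECK(cudaEventSynchronize(gather_done));
  // A GPUDirect send would go here, e.g. MPI_Isend(send_buf, ...).

  // Verification only: copy the packed buffer back on the gather stream.
  CHECK(cudaMemcpyAsync(h_recv, send_buf, n_export * sizeof(double),
                        cudaMemcpyDeviceToHost, gather_stream));
  CHECK(cudaStreamSynchronize(gather_stream));
  bool ok = true;
  for (int i = 0; i < n_export; ++i)
    if (h_recv[i] != static_cast<double>(n - n_export + i)) ok = false;

  CHECK(cudaStreamSynchronize(core_stream));  // core work is awaited later
  printf("%s\n", ok ? "gather ok" : "gather mismatch");
  return ok ? 0 : 1;
}
```

The sketch puts the gather in its own stream to keep the two dependencies visibly separate. Whether the two kernels actually overlap depends on the device and on resource usage, so this only shows which work the host waits on, not a guaranteed speedup.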