bcaddy closed this 7 months ago.
Yep, every place where the implicit sync seemed relevant was in a function that already contains an implicit sync (such as data movement or allocation). Since we only use one GPU stream, and kernels in a single stream execute in order, there is an implicit sync between all kernels.
GPU Error Checking
I replaced the 3 macros and 7 functions for GPU error checking with a single overloaded function: one overload for CUDA/HIP errors and one for CUFFT/HIPFFT errors. The function can either wrap a CUDA call or be called with no arguments to check the most recent error.
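As a rough illustration of the single-function design, here is a minimal C++ sketch. It is not the code from this PR. So that it compiles without a GPU toolkit, `DemoGpuError`, `DemoFftResult`, and `demo_peek_last_error()` are hypothetical stand-ins for `cudaError_t`/`hipError_t`, `cufftResult`/`hipfftResult`, and `cudaPeekAtLastError()`. The name `gpu_error_check` is also hypothetical. Overload resolution on the argument's type picks the runtime or FFT overload, and a defaulted first argument provides the no-argument form.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins so this sketch compiles without a GPU toolkit. In real
// code these would be cudaError_t/hipError_t and cufftResult/hipfftResult.
enum DemoGpuError { demoSuccess = 0, demoErrorMemoryAllocation = 2 };
enum DemoFftResult { DEMO_FFT_SUCCESS = 0, DEMO_FFT_EXEC_FAILED = 6 };

// Hypothetical stand-in for cudaPeekAtLastError(): reports the most recent error.
static DemoGpuError demo_last_error = demoSuccess;
inline DemoGpuError demo_peek_last_error() { return demo_last_error; }

// Overload 1: runtime (CUDA/HIP-style) errors. A default argument is evaluated
// at every call that omits it, so gpu_error_check() checks the latest error.
// Returns true on success; on failure prints the code and exits unless
// abort_on_error is false, in which case it returns false.
inline bool gpu_error_check(DemoGpuError code = demo_peek_last_error(),
                            bool abort_on_error = true) {
  if (code == demoSuccess) return true;
  std::fprintf(stderr, "GPU runtime error: code %d\n", static_cast<int>(code));
  if (abort_on_error) std::exit(EXIT_FAILURE);
  return false;
}

// Overload 2: FFT-library (CUFFT/HIPFFT-style) errors. It is selected because
// the argument has a different enum type; it has no no-argument form, so a
// bare gpu_error_check() always resolves to overload 1.
inline bool gpu_error_check(DemoFftResult code, bool abort_on_error = true) {
  if (code == DEMO_FFT_SUCCESS) return true;
  std::fprintf(stderr, "GPU FFT error: code %d\n", static_cast<int>(code));
  if (abort_on_error) std::exit(EXIT_FAILURE);
  return false;
}
```

Assuming the real overloads take `cudaError_t` and `cufftResult`, call sites would look like `gpu_error_check(cudaMalloc(&ptr, bytes));` when wrapping a call, or a bare `gpu_error_check();` after a kernel launch.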
I also added error checking to some `cudaMalloc` calls that were missing it or used it in a non-standard way.
The other major change is the deprecation of the `CUDA_ERROR_CHECK` macro. Error checking is now on by default and can be disabled with the new `DISABLE_GPU_ERROR_CHECKING` macro.
This should resolve Issue #286 and possibly #296 as well, subject to discussion in that issue.
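One way to make checking default-on with an opt-out is a preprocessor guard inside the check itself. In this minimal sketch only the `DISABLE_GPU_ERROR_CHECKING` name comes from the PR. `DemoStatus`, `demo_malloc` (a host-memory stand-in for `cudaMalloc`), `check_status`, and `kGpuErrorCheckingEnabled` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical status type standing in for cudaError_t / hipError_t.
enum DemoStatus { demoOk = 0, demoOutOfMemory = 2 };

// Hypothetical stand-in for cudaMalloc: allocates host memory and reports
// failure through its return value, as the CUDA/HIP allocators do.
inline DemoStatus demo_malloc(void** ptr, std::size_t bytes) {
  *ptr = std::malloc(bytes);
  return (*ptr != nullptr || bytes == 0) ? demoOk : demoOutOfMemory;
}

// Records whether checking is compiled in. It is on unless the build defines
// DISABLE_GPU_ERROR_CHECKING (e.g. with -DDISABLE_GPU_ERROR_CHECKING).
#ifdef DISABLE_GPU_ERROR_CHECKING
constexpr bool kGpuErrorCheckingEnabled = false;
#else
constexpr bool kGpuErrorCheckingEnabled = true;
#endif

// Wraps a call's status. With checking enabled, any failure is reported and the
// program exits; with it disabled, the status is returned without inspection.
inline DemoStatus check_status(DemoStatus code) {
#ifndef DISABLE_GPU_ERROR_CHECKING
  if (code != demoOk) {
    std::fprintf(stderr, "GPU error: code %d\n", static_cast<int>(code));
    std::exit(EXIT_FAILURE);
  }
#endif
  return code;
}
```

Because the guard sits inside the check rather than around the call, defining the macro still executes the wrapped allocation; only the status inspection is compiled out.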