nbnxn_ocl_init decides how command queue synchronization is implemented: by memory polling or by clFinish. For CUDA, memory polling is sometimes much more efficient.
Check whether this also applies to OpenCL, and whether NVIDIA devices should be identified and handled similarly to how nbnxn_cuda_init handles them.
The current nbnxn_ocl_init implementation always chooses clFinish and never memory polling.
Low priority issue.
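A minimal sketch of what the vendor-based decision could look like, for discussion only. All names here (sync_mode_t, contains_nocase, choose_sync_mode, allow_polling) are hypothetical and not taken from the GROMACS source. The vendor string is assumed to be the one obtained via clGetDeviceInfo with CL_DEVICE_VENDOR, which NVIDIA's OpenCL platform commonly reports as "NVIDIA Corporation". The allow_polling flag stands in for whatever extra conditions nbnxn_cuda_init applies before enabling polling. Whether polling actually helps on OpenCL is exactly what this issue asks to check, so the sketch only shows the selection logic.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum: the two synchronization strategies discussed above. */
typedef enum { SYNC_CL_FINISH, SYNC_MEM_POLLING } sync_mode_t;

/* Case-insensitive substring search, so that "NVIDIA" and "nvidia" both match. */
static bool contains_nocase(const char *haystack, const char *needle)
{
    for (; *haystack != '\0'; haystack++)
    {
        size_t i = 0;
        while (needle[i] != '\0' && haystack[i] != '\0'
               && tolower((unsigned char)haystack[i]) == tolower((unsigned char)needle[i]))
        {
            i++;
        }
        if (needle[i] == '\0')
        {
            return true; /* matched the whole needle */
        }
    }
    return needle[0] == '\0'; /* an empty needle matches anything */
}

/* Hypothetical decision helper.
 * vendor:        device vendor string (assumed to come from CL_DEVICE_VENDOR).
 * allow_polling: placeholder for any further conditions (assumption).
 * Falls back to clFinish for every non-NVIDIA device. */
static sync_mode_t choose_sync_mode(const char *vendor, bool allow_polling)
{
    assert(vendor != NULL);
    if (allow_polling && contains_nocase(vendor, "nvidia"))
    {
        return SYNC_MEM_POLLING;
    }
    return SYNC_CL_FINISH;
}
```

With this shape, the current behavior (always clFinish) corresponds to calling choose_sync_mode with allow_polling set to false, so the change could be introduced without altering the default.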