Closed by ancahamuraru 9 years ago
The difference is explained by the device properties.
For CUDA devices, check the multiProcessorCount field of the cudaDeviceProp structure and the nbnxn_cuda_min_ci_balanced function.
For OpenCL devices, check the computeUnits field of the ocl_gpu_info_t structure and the nbnxn_ocl_min_ci_balanced function.
The total number of threads and the thread configuration change depending on the selected OpenCL device.
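To illustrate why the device properties matter, here is a minimal sketch of a "minimum balanced list size" that scales with the number of compute units. It assumes the value is simply a constant factor times the compute unit count. The names `EXAMPLE_MIN_CI_PER_UNIT` and `example_min_ci_balanced` are hypothetical, and the factor 40 is made up for the example; the actual logic is in nbnxn_cuda_min_ci_balanced and nbnxn_ocl_min_ci_balanced.

```c
/* Hypothetical factor: target number of list entries per compute unit.
 * The value used by the real code is not reproduced here. */
#define EXAMPLE_MIN_CI_PER_UNIT 40

/* Sketch only: assumes the minimum list size needed for good load
 * balance grows linearly with the device's compute unit count
 * (multiProcessorCount on CUDA, computeUnits on OpenCL).
 * Returns 0 when no usable device is present. */
int example_min_ci_balanced(int compute_units)
{
    return compute_units > 0 ? EXAMPLE_MIN_CI_PER_UNIT * compute_units : 0;
}
```

Under this assumption, a device reporting 2 compute units and one reporting 8 would ask for different minimum list sizes (80 and 320 here), so the pair list may be split differently for the same input.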
On the same machine and for the same input data, plist->nsci can have different values depending on the selected OpenCL device. Moreover, the input data for the OpenCL kernel is arranged differently depending on the device.
As an example, for the GTX 660M, plist->nsci is 395, while for the Intel Core i7-3610QM CPU it is 395.