Cibiv / NextGenMap

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime. This allows analysing large scale datasets even with increased SNP rates or higher error rates (e.g. caused by specialized experimental protocols) and avoids biases caused by highly variable regions in the genome.

OPENCL Couldn't create sub-devices. Error #34

Open maolun opened 6 years ago

maolun commented 6 years ago

Hello,

I was trying NGM at the Texas Advanced Computing Center (https://portal.tacc.utexas.edu/user-guides/stampede2). However, the error below occurs every time. I compiled NGM with CMake. I wonder if anyone has insight on how to solve this issue. Thanks. Any suggestion is greatly appreciated.

[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] Couldn't create sub-devices. Error: Error: Invalid value (-30)
terminate called without an active exception

Best, Mao-Lun

maolun commented 6 years ago

Hello,

I changed line 117 of the file OclHost.cpp from "if (ciErrNum == -18)" to "if (ciErrNum == -30)" and then recompiled the sources using CMake. Now NGM works.
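
For readers who want to see what this change amounts to, here is a minimal sketch. In the OpenCL headers, -18 is CL_DEVICE_PARTITION_FAILED and -30 is CL_INVALID_VALUE, which matches the "Invalid value (-30)" in the log above. The helper name `resolveDeviceCount` and the two constants are hypothetical and are redefined locally so the sketch compiles without an OpenCL SDK; it is not the actual NGM code. Unlike the one-line change described above, it accepts both codes as a reason to fall back to a single, unpartitioned device, which is an assumption about what behaviour is wanted.

```cpp
#include <cassert>

// Same numeric values as in CL/cl.h; redefined here only so the sketch
// builds without OpenCL headers installed.
constexpr int kSuccess = 0;                  // CL_SUCCESS
constexpr int kDevicePartitionFailed = -18;  // CL_DEVICE_PARTITION_FAILED
constexpr int kInvalidValue = -30;           // CL_INVALID_VALUE

// Hypothetical helper: given the error code and the device count reported
// by a clCreateSubDevices-style call, decide how many devices to use.
// Returns 0 when the error is not one we know how to recover from.
unsigned resolveDeviceCount(int err, unsigned reportedCount) {
    if (err == kSuccess) {
        return reportedCount;  // partitioning worked, use every sub-device
    }
    if (err == kDevicePartitionFailed || err == kInvalidValue) {
        return 1;  // fall back to the single, unpartitioned device
    }
    return 0;  // any other error: the caller should report it and stop
}

// Usage example: the three cases the helper distinguishes.
void checkResolveDeviceCount() {
    assert(resolveDeviceCount(kSuccess, 68) == 68);
    assert(resolveDeviceCount(kInvalidValue, 0) == 1);
    assert(resolveDeviceCount(-5, 0) == 0);  // -5 is CL_OUT_OF_RESOURCES
}
```

Note that the fallback means the device is used as one unit rather than being split into sub-devices.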

Mao-Lun

fritzsedlazeck commented 6 years ago

Thanks Fritz

hermannschwaerzlerUIBK commented 5 years ago

Hi @maolun, hi @fritzsedlazeck,

I had the very same problem on this machine: https://www3.risc.jku.at/projects/mach2/. As far as I can tell the problem occurs as soon as a computer has more than 256 cores/hardware threads. MACH2 has more than 1700 cores and the Knights Landing (KNL) compute nodes of Stampede 2 have 272 hardware threads (if I read the documentation correctly).

My solution for the problem was this change to the code:

--- lib/mason/opencl/OclHost.cpp.orig   2019-05-14 16:33:09.313712490 +0200
+++ lib/mason/opencl/OclHost.cpp        2019-05-14 16:30:00.601698181 +0200
@@ -111,8 +111,8 @@
                        props[1] = 1; // 4 compute units per sub-device
                        props[2] = 0;

-                       devices = (cl_device_id *) malloc(256 * sizeof(cl_device_id));
-                       ciErrNum = clCreateSubDevices(device_id, props, 256, devices,
+                       devices = (cl_device_id *) malloc(2560 * sizeof(cl_device_id));
+                       ciErrNum = clCreateSubDevices(device_id, props, 2560, devices,
                                        &ciDeviceCount);
                        if (ciErrNum == -18) {
                                ciDeviceCount = 1;

This works for me but will fail as soon as there is a machine with more than 2560 cores (per node). A better solution might be to first find the core count (maybe like this: https://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine) and use this number in the malloc and the clCreateSubDevices calls.
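
The idea of sizing the buffer from the core count could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested patch for NGM: `subDeviceCapacity`, `allocateSubDeviceSlots` and `DeviceHandle` are hypothetical names, `DeviceHandle` stands in for `cl_device_id` so the sketch compiles without OpenCL headers, and it assumes the number of host hardware threads is a usable upper bound for the number of sub-devices. `std::thread::hardware_concurrency()` may return 0 when the count is unknown, so the sketch never goes below the original fixed size of 256.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for cl_device_id so the sketch builds without OpenCL headers.
using DeviceHandle = void*;

// Hypothetical helper: how many sub-device slots to allocate.
// hardware_concurrency() is only a hint and may return 0, so keep the
// original fixed size (256) as a lower bound.
std::size_t subDeviceCapacity() {
    const unsigned hwThreads = std::thread::hardware_concurrency();
    return std::max<std::size_t>(hwThreads, 256);
}

// Allocate the output array for the sub-devices. A std::vector replaces
// the malloc call and releases its memory automatically.
std::vector<DeviceHandle> allocateSubDeviceSlots() {
    std::vector<DeviceHandle> slots(subDeviceCapacity(), nullptr);
    assert(!slots.empty());
    return slots;
}
```

In OclHost.cpp, `slots.size()` and `slots.data()` would then take the place of the hard-coded 256 and the malloc'ed pointer in the clCreateSubDevices call. An alternative that stays within OpenCL would be to ask the device itself for CL_DEVICE_MAX_COMPUTE_UNITS via clGetDeviceInfo and size the array from that value.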

What do you think?

BTW: that comment "4 compute units per sub-device" you see above is most probably wrong, isn't it?

Greetings Hermann

fritzsedlazeck commented 5 years ago

Hi Hermann, thanks for digging in. I think that comment was left over from the GPU code.... Thanks for looking at this. Fritz