clCreateKernel error -48 for gradient_minAD & gradient_minFire on Apple Silicon

stevendbrown commented 2 years ago

Building with CONFIG=FDEBUG on macOS 12.0.1 on an Apple M1 Max, with either DEVICE=GPU or DEVICE=OCLGPU, succeeds, but executing a job fails with Error: clCreateKernel() gradient_minFire -48. If I reorder the kernels on lines 73-79 of Makefile.OpenCL, I also get Error: clCreateKernel() gradient_minAD -48. Other kernels seem fine.

To reproduce, clone current master branch, modify Makefile.OpenCL lines 144 & 145 to enable FDEBUG (for some reason it does not work when specified in the make command for me), and run make DEVICE=GPU NUMWI=64. Then run a test job:

./bin/autodock_gpu_64wi -L ./input/1stp/derived/1stp_ligand.pdbqt -M ./input/1stp/derived/1stp_protein.maps.fld

Next, try reordering the kernels in lines 73-79 of Makefile.OpenCL to see if gradient_minAD in K1 will succeed, or show other kernel setup debug information. It fails as soon as it gets to either gradient_minAD or gradient_minFire.

Expected behavior: the job executes.

After editing lines 78 & 79 of Makefile.OpenCL to assign gradient_minSD to K6 and K7, recompiling, and rerunning, the example docking job completes successfully and produces poses similar to those in the original 1STP structure.

Information to help narrow down the bug

master branch commit ecb261d (current as of 2021-12-04)
macOS 12.0.1
compiler: g++ (Apple clang version 13.0.0 (clang-1300.0.29.3))
building with DEVICE=GPU or DEVICE=OCLGPU produces the same result.
CUDA driver/version: n.a.
neither GPU_INCLUDE_PATH nor GPU_LIBRARY_PATH is set
first attempt running on macOS on Apple Silicon; no prior working versions known.

stevendbrown commented 2 years ago

With the line 78/79 hack in place, the M1 Max GPU is heavily utilized, which seems promising.

atillack commented 2 years ago

@stevendbrown Thank you for reporting this bug. I'll look into it. In the interim, using -l fire should have the same effect as changing the kernel assignments.

P.S. The OpenCL kernels are generally not interchangeable in the Makefile - AD and Fire kernels just happen to have the same set of arguments which is why this worked. To change the algorithm the -l or --lsmet command line argument is needed.

stevendbrown commented 2 years ago

Compiling from the same (unaltered) commit and running with the -l sd switch on the command line still fails. It seems to be an issue with the FIRE and AD kernels failing to initialize during the program startup phase. I haven't tracked down where in the source code it's happening, but with CONFIG=FDEBUG, I see getKernelInfo called successfully for every kernel in order (K1, K2, ... K7) through gradient_minSD, and then the -48 error before either gradient_minFire or gradient_minAD is shown.

atillack commented 2 years ago

@stevendbrown I can reproduce the bug on an M1 Mac and see the following output: "UNSUPPORTED (log once): createKernel: newComputePipelineState failed". That's an internal failure in Metal (OpenCL gets converted into Metal on M1 Macs). Unfortunately, there's nothing I can do to really fix this, as the bug is buried rather deeply on Apple's side (and since OpenCL is officially marked as deprecated on M1 Macs, it is likely not one to be fixed anytime soon).

Also, changing the compile order, which seems to have worked in your case, did not work in mine. Here, I needed to switch off kernels 5, 6, and 7 (by commenting out the respective kernel creation in performdocking.cpp) to get the simple make DEVICE=GPU NUMWI=32 test case to run.

Looking at the state of OpenCL on M1 Macs online, it is very likely that many factors affect when and why things work, and crashing or failing to run at random for anything but the simplest kernels currently looks like the norm.
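For readers decoding the number in the error message: in the Khronos CL/cl.h header, -48 is CL_INVALID_KERNEL. As far as I can tell from the OpenCL specification, that code is not among the errors documented for clCreateKernel(), which would fit the explanation above of an internal failure surfacing through the OpenCL layer. The lookup below is a minimal sketch; the function name clErrorName is made up for illustration and is not part of AutoDock-GPU or of OpenCL.

```cpp
#include <string>

// Hypothetical helper (not from AutoDock-GPU): maps a few OpenCL status codes
// to their symbolic names. Values are the ones defined in Khronos CL/cl.h.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        case -47: return "CL_INVALID_KERNEL_DEFINITION";
        case -48: return "CL_INVALID_KERNEL";
        default:  return "UNKNOWN_CL_ERROR";
    }
}
```

Calling clErrorName(-48) on the code from the report yields the symbolic name, which is easier to search for than the bare number.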

atillack commented 2 years ago

@stevendbrown I did file a bug report with Apple of course and am hopeful that as the ARM systems evolve these things will eventually be fixed.

There is a likelihood that the bug, and how many kernels can be created, is related to the number of registers etc. available on the GPU, so on your M1 Max you may be able to compile more than I could on the M1. For AD-GPU to work as intended, kernels 1-4 (essentials and Solis-Wets) as well as 7 (Adadelta) are what I would consider the bare minimum. This could be achieved by commenting out the lines for creation and deletion of tdata.kernel5 (SD) and tdata.kernel6 (Fire), and of course by not using either -l sd or -l fire:
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L270
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L271
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L344
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L345

This is also rather hacky but I hope it can at least make things work on your M1 Max until there is a better solution :-)
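The same idea can be expressed in code rather than by commenting lines out: treat the SD and Fire kernels as optional, keep going when their creation fails, and refuse a -l method whose kernel is missing. The following is only a sketch under stated assumptions. It is not taken from the AutoDock-GPU source or from any fix to it; KernelSlot, CreateFn, createKernels and methodUsable are hypothetical names; the create callback stands in for clCreateKernel() so that the example runs without OpenCL; and "ad" as the -l value for Adadelta is an assumption, since only sd and fire are named above.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one kernel slot (not an AutoDock-GPU type).
struct KernelSlot {
    std::string name;   // kernel function name, e.g. "gradient_minSD"
    std::string lsmet;  // -l value that needs this kernel ("ad" is assumed)
    bool required;      // true: a creation failure aborts the run
};

// Stand-in for clCreateKernel(): returns an OpenCL-style status, 0 on success.
using CreateFn = std::function<int(const std::string&)>;

// Tries to create every kernel. A failing required kernel throws; a failing
// optional kernel is only recorded as unavailable.
std::map<std::string, bool> createKernels(const std::vector<KernelSlot>& slots,
                                          const CreateFn& create) {
    std::map<std::string, bool> available;
    for (const KernelSlot& slot : slots) {
        const int status = create(slot.name);
        if (status != 0 && slot.required) {
            throw std::runtime_error("clCreateKernel() " + slot.name + " " +
                                     std::to_string(status));
        }
        available[slot.name] = (status == 0);
    }
    return available;
}

// True if the method requested via -l maps to a kernel that was created.
bool methodUsable(const std::vector<KernelSlot>& slots,
                  const std::map<std::string, bool>& available,
                  const std::string& lsmet) {
    for (const KernelSlot& slot : slots) {
        if (slot.lsmet == lsmet) {
            const auto it = available.find(slot.name);
            return it != available.end() && it->second;
        }
    }
    return false;  // unknown method name
}
```

With gradient_minSD and gradient_minFire marked optional and gradient_minAD marked required, a creation callback that fails only for gradient_minFire leaves the sd and ad methods usable and fire rejected. Whether the driver lets the remaining kernels be created on a given machine is a separate question that this sketch does not address.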

stevendbrown commented 2 years ago

I couldn't find the "UNSUPPORTED (log once)..." text in the ADGPU source code so I feared it was coming from the Apple framework...

I commented out the lines you referenced, recompiled, and ran the test docking command, but sadly it failed the same way as before. I tried decreasing the NUMWI parameter as low as 1, but regardless I get the same failure.

Thanks for your help @atillack !

atillack commented 2 years ago

@stevendbrown I was able to work around the issue and have a PR (#169) up to fix the Apple M1 (and up) issues :-)

atillack commented 2 years ago

@stevendbrown On a related note, after some testing - at least for the default test case on an M1 - NUMWI=128 had the fastest runtime.

stevendbrown commented 2 years ago

@atillack I can confirm that your M1_fix branch executes the ADADELTA algorithm on my hardware using the test described in the original issue report! Woot!

stevendbrown commented 2 years ago

Closing this issue with #169 merged into the main trunk.

ccsb-scripps / AutoDock-GPU

clCreateKernel error -48 for gradient_minAD & gradient_minFire on Apple Silicon #168