Closed stevendbrown closed 2 years ago
With the line 78/79 hack in place, the M1 Max GPU is heavily utilized, which seems promising.
@stevendbrown Thank you for reporting this bug. I'll look into it. In the interim, using `-l fire` should have the same effect as changing the kernel assignments.
P.S. The OpenCL kernels are generally not interchangeable in the Makefile - AD and Fire kernels just happen to have the same set of arguments, which is why this worked. To change the algorithm, the `-l` or `--lsmet` command line argument is needed.
Compiling from the same (unaltered) commit still fails despite using the `-l sd` switch on the command line. It seems to be an issue with the FIRE and AD kernels failing to initialize during the program startup phase. I haven't tracked down where in the source code it's happening, but with `DEBUG=FDEBUG`, I see `getKernelInfo` called successfully for every kernel in order (K1, K2,...K7) through `gradient_minSD`, but then it fails with the -48 error before either `gradient_minFire` or `gradient_minAD` is shown.
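For reference, the -48 in that message is the raw OpenCL status code. In the Khronos `CL/cl.h` header that value is `CL_INVALID_KERNEL`, which, as far as I can tell, is not among the errors the OpenCL specification lists for `clCreateKernel()`. That would be consistent with the failure coming from a layer below the application. A minimal sketch of a lookup helper follows; the function name `cl_error_name` is hypothetical (not part of AutoDock-GPU), and the constants are redeclared by value so the snippet compiles without an OpenCL SDK.

```cpp
#include <string>

// Hypothetical helper, not part of AutoDock-GPU: translate a raw OpenCL
// status code into its symbolic name. Only a subset of codes is covered;
// the numeric values are the ones defined in the Khronos CL/cl.h header.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        case -47: return "CL_INVALID_KERNEL_DEFINITION";
        case -48: return "CL_INVALID_KERNEL";
        default:  return "unknown OpenCL error (" + std::to_string(code) + ")";
    }
}
```

Calling `cl_error_name(-48)` yields `CL_INVALID_KERNEL`; codes outside the subset fall through to the generic message.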
@stevendbrown I can reproduce the bug on an M1 Mac and see the following output:
UNSUPPORTED (log once): createKernel: newComputePipelineState failed
That's an internal failure in Metal (OpenCL gets converted into Metal on M1s). Unfortunately, there's nothing I can really do to fix this, as the bug is buried rather deeply on Apple's side (and since OpenCL is officially marked as deprecated on M1s, it is likely not one to be fixed anytime soon).
Also, changing the compile order, which seems to have worked in your case, did not work in mine. Here, I needed to switch off kernels 5, 6, and 7 (by commenting out the respective kernel creation in performdocking.cpp) to get the simple `make DEVICE=GPU NUMWI=32 test` case to run.
Looking at the state of OpenCL on M1 Macs online, it is very likely that many factors affect when and why things work, and crashing or failing to run at random for anything but the simplest kernels currently seems to be the norm.
@stevendbrown I did file a bug report with Apple of course and am hopeful that as the ARM systems evolve these things will eventually be fixed.
There is a likelihood that the bug, and how many kernels can be created, is related to the number of registers etc. available on the CPU, so on your M1 Max you may be able to compile more than I could on the M1. For AD-GPU to work as intended, kernels 1-4 (essentials and Solis-Wets) as well as 7 (Adadelta) are what I would consider the bare minimum. This could be achieved by commenting out the lines for creation and deletion of tdata.kernel5 (SD) and tdata.kernel6 (Fire), and of course by not using either `-l sd` or `-l fire`:
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L270
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L271
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L344
https://github.com/ccsb-scripps/AutoDock-GPU/blob/ecb261d8012a20b66fb1fa9e52b84fb4ede5d107/host/src/performdocking.cpp.OpenCL#L345
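The kernel-to-method mapping described above (kernels 1-4 always, 5 for SD, 6 for Fire, 7 for Adadelta) can be sketched as a small selection function. This is only an illustration of the idea of creating just the kernels a run needs; it is not the code in performdocking.cpp, and it is not necessarily how the later fix works. The function name `kernels_needed` is hypothetical, and the method spellings `sw` and `ad` are assumptions (only `sd` and `fire` appear in this thread).

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical sketch, not AutoDock-GPU code: return the kernel numbers
// that must be created for a given local-search method.
// Numbering follows the thread: 1-4 essentials and Solis-Wets,
// 5 = SD, 6 = Fire, 7 = Adadelta.
// Assumption: "sw" and "ad" are the spellings for Solis-Wets and Adadelta.
std::set<int> kernels_needed(const std::string& lsmet) {
    std::set<int> kernels = {1, 2, 3, 4};  // always required
    if (lsmet == "sw") {
        return kernels;                    // Solis-Wets is covered by kernels 1-4
    } else if (lsmet == "sd") {
        kernels.insert(5);
    } else if (lsmet == "fire") {
        kernels.insert(6);
    } else if (lsmet == "ad") {
        kernels.insert(7);
    } else {
        throw std::invalid_argument("unknown local-search method: " + lsmet);
    }
    return kernels;
}
```

With this mapping, a run using Adadelta would need kernels 1, 2, 3, 4 and 7, so a failure to create kernel 5 or 6 would not have to abort it.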
This is also rather hacky but I hope it can at least make things work on your M1 Max until there is a better solution :-)
I couldn't find the "UNSUPPORTED (log once)..." text in the ADGPU source code so I feared it was coming from the Apple framework...
I commented out the lines you referenced, recompiled, and ran the test docking command, but sadly it failed the same way as before. I tried decreasing the `NUMWI` parameter to as low as 1, but regardless I get the same failure.
Thanks for your help @atillack!
@stevendbrown I was able to work around the issue and have a PR (#169) up to fix the Apple M1 (and up) issues :-)
@stevendbrown On a related note, after some testing - at least for the default test case on an M1 - `NUMWI=128` had the fastest runtime.
@atillack I can confirm that your `M1_fix` branch executes the ADADELTA algorithm on my hardware using the test described in the original issue report! Woot!
Closing this issue with #169 merged into the main trunk.
Building with `CONFIG=FDEBUG` on macOS 12.0.1 & Apple M1 Max with `DEVICE=GPU` or `DEVICE=OCLGPU` is successful but when executing a job, fails with `Error: clCreateKernel() gradient_minFire -48`. If I reorder the kernels on lines 73-79 of Makefile.OpenCL, `Error: clCreateKernel() gradient_minAD -48` also fails. Other kernels seem fine.

To reproduce, clone current `master` branch, modify Makefile.OpenCL lines 144 & 145 to enable FDEBUG (for some reason it does not work when specified in the `make` command for me), and run `make DEVICE=GPU NUMWI=64`. Then run a test job:

Next, try reordering the kernels in lines 73-79 of `Makefile.OpenCL` to see if `gradient_minAD` in K1 will succeed, or show other kernel setup debug information. It fails as soon as it gets to either `gradient_minAD` or `gradient_minFire`.

Expected behavior: the job executes.

Editing lines 78 & 79 of `Makefile.OpenCL` to assign `gradient_minSD` to K6 and K7, recompiling and rerunning the example docking job completes successfully and produces poses similar to that in the original 1STP structure.

Information to help narrow down the bug
ecb261d (current as of 2021-12-04)
`DEVICE=GPU` or `DEVICE=OCLGPU` produces the same result.
`GPU_INCLUDE_PATH` and `GPU_LIBRARY_PATH` are set