ccsb-scripps / AutoDock-GPU

AutoDock for GPUs and other accelerators
https://ccsb.scripps.edu/autodock
GNU General Public License v2.0

Problems with AutoDock-GPU compiled on a cluster #241

Open xavgit opened 10 months ago

xavgit commented 10 months ago

Hi, I have compiled the latest version of AutoDock-GPU from source on the Leonardo cluster, in two variants: the first with `DEVICE=CUDA` and `NUMWI=256`, and the second with `DEVICE=OCLGPU` and `NUMWI=256`.

I have used the following commands:

```
module load cuda
module load python
```

and `module list` returns:

```
Currently Loaded Modulefiles:
 1) profile/base   2) python/3.10.8--gcc--11.3.0   3) cuda/11.8

Key: default-version
```

Then:

```
export GPU_INCLUDE_PATH=/leonardo/prod/opt/compilers/cuda/11.8/none/include
export GPU_LIBRARY_PATH=/leonardo/prod/opt/compilers/cuda/11.8/none/lib64
make DEVICE= ............................
```

If I run either executable without any arguments, there is no problem. I then tested the compiled versions with a single ligand and got errors. I first ran prepare_gpf4.py and ADFR's autogrid4, and then the AutoDock-GPU executables.
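As a quick sanity check (a hypothetical sketch, not part of AutoDock-GPU), one can verify that the exported paths actually contain the files the Makefile links against before building:

```python
import os

# Hypothetical sanity check: confirm the exported CUDA paths contain the
# header and runtime library an AutoDock-GPU CUDA build needs.
checks = {
    "GPU_INCLUDE_PATH": "cuda.h",        # main CUDA header
    "GPU_LIBRARY_PATH": "libcudart.so",  # CUDA runtime library
}
for var, filename in checks.items():
    root = os.environ.get(var, "")
    ok = bool(root) and os.path.isfile(os.path.join(root, filename))
    print(f"{var}={root!r}: {filename} {'found' if ok else 'MISSING'}")
```

If either line prints `MISSING`, the exports point at the wrong CUDA installation.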

For the CUDA version I run:

```
$ python3 test_ad4gpu.py
sh: line 1: 3637611 Aborted (core dumped) /leonardo/home/userexternal/slemme00/sources/AutoDock-GPU/bin_256wi/autodock_gpu_256wi -x 0 --ffile receptor.maps.fld --lfile DB16260.pdbqt --nrun 100 -N ./docking_res/DB16260.pdbqt_docking_res --gbest 1 > ./docking_res/DB16260.pdbqt_docking_res.log 2>&1
$ less docking_res/DB16260.pdbqt_docking_res.log
autodock_gpu_256wi: ./host/src/performdocking.cpp:128: void setup_gpu_for_docking(GpuData&, GpuTempData&): Assertion `0' failed.
```

I have also run the same Python script with SLURM directives and 40 ligands, but I get the same errors.

For the OCLGPU version I run:

```
$ python3 test_ad4gpu_ocl.py
$ less docking_res_ocl/DB16260.pdbqt_docking_res.log
AutoDock-GPU version: v1.5.3-54-g41083c5e1224d54ad043b62ca53f6618d5e8325d-dirty

Running 1 docking calculation

Kernel source used for development:      ./device/calcenergy.cl
Kernel string used for building:         ./host/inc/stringify.h
Kernel compilation flags:                -I ./device -I ./common -DN256WI -cl-mad-enable
Error: clGetPlatformIDs(): -1001
```

The system I log in to reports:

Atos Bull Sequana XH21355 "Da Vinci" Blade - Red Hat Enterprise Linux 8.6 (Ootpa)

3456 compute nodes with:

test_ad4gpu.py:

```python
import os

os.system(
    '/leonardo/home/userexternal/slemme00/sources/AutoDock-GPU/bin_256wi/autodock_gpu_256wi'
    ' -x 0 --ffile receptor.maps.fld'
    ' --lfile DB16260.pdbqt'
    ' --nrun 100'
    ' -N ./docking_res/DB16260.pdbqt_docking_res'
    ' --gbest 1'
    ' > ./docking_res/DB16260.pdbqt_docking_res.log 2>&1'
)
```

test_ad4gpu_ocl.py:

```python
import os

os.system(
    '/leonardo/home/userexternal/slemme00/sources/AutoDock-GPU/bin_oclgpu_256/autodock_gpu_256wi'
    ' -x 0 --ffile receptor.maps.fld'
    ' --lfile DB16260.pdbqt'
    ' --nrun 100'
    ' -N ./docking_res_ocl/DB16260.pdbqt_docking_res'
    ' --gbest 1'
    ' > ./docking_res_ocl/DB16260.pdbqt_docking_res.log 2>&1'
)
```

The Python scripts above were adapted from code that works on a PC with a single RTX 2080 Ti.
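As a side note, the single shell string passed to `os.system()` is easy to get wrong; a sketch of the same invocation built as an argument list (paths and options taken from the scripts above) avoids quoting bugs and exposes the exit status:

```python
import shlex

# Sketch: build the CUDA-version command from test_ad4gpu.py as an argv list
# instead of one shell string. Paths are the ones used in the scripts above.
exe = "/leonardo/home/userexternal/slemme00/sources/AutoDock-GPU/bin_256wi/autodock_gpu_256wi"
ligand = "DB16260.pdbqt"
cmd = [
    exe, "-x", "0",
    "--ffile", "receptor.maps.fld",
    "--lfile", ligand,
    "--nrun", "100",
    "-N", f"./docking_res/{ligand}_docking_res",
    "--gbest", "1",
]
print(shlex.join(cmd))
# To execute it and capture the log, one could do:
#   with open(f"./docking_res/{ligand}_docking_res.log", "w") as fh:
#       rc = subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT).returncode
```

Running through `subprocess.run` would also have surfaced the non-zero exit code of the aborted process directly in Python.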

What can I do?

Thanks.

Saverio

atillack commented 10 months ago

@xavgit The CUDA runtime error should be resolved by compiling with TARGETS="80" (plus any other desired compute capabilities if other architectures are present). The OpenCL error you are seeing usually means no OpenCL platform is registered (installed) on the system.
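For reference, the `-1001` returned by `clGetPlatformIDs()` in the log is not one of the core OpenCL error codes but comes from the ICD loader; a small sketch decoding it (error names taken from the OpenCL headers, `CL_PLATFORM_NOT_FOUND_KHR` from the `cl_khr_icd` extension):

```python
# Sketch: decode the OpenCL error code seen in the log. -1001 is
# CL_PLATFORM_NOT_FOUND_KHR (cl_khr_icd extension): the ICD loader found no
# registered OpenCL platform, i.e. no vendor .icd file is installed/visible.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}

def cl_error_name(code: int) -> str:
    return CL_ERRORS.get(code, f"unknown OpenCL error {code}")

print(cl_error_name(-1001))  # CL_PLATFORM_NOT_FOUND_KHR
```

On a cluster this typically means the NVIDIA OpenCL ICD is not visible on the compute node where the job runs.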

xavgit commented 10 months ago

Hi, all is fine now with your help.

Thanks.

Saverio