ccsb-scripps / AutoDock-GPU

AutoDock for GPUs and other accelerators
https://ccsb.scripps.edu/autodock
GNU General Public License v2.0

cudaGetDeviceCount failed initialization error #192

Open jrcodina opened 2 years ago

jrcodina commented 2 years ago

When I run:

[jr@login1 AutoDock-GPU-develop]$ make DEVICE=CUDA NUMWI=128

[jr@login1 AutoDock-GPU-develop]$ ./bin/autodock_gpu_128wi --ffile ./input/1stp/derived/1stp_protein.maps.fld --lfile ./input/1stp/derived/1stp_ligand.pdbqt --nrun 1

I get:

AutoDock-GPU version: v1.5-release

Running 1 docking calculation

cudaGetDeviceCount failed initialization error
autodock_gpu_128wi: ./host/src/performdocking.cpp:128: void setup_gpu_for_docking(GpuData&, GpuTempData&): Assertion `0' failed.
Aborted

My cluster info is:

[jr@login1 AutoDock-GPU-develop]$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                160
On-line CPU(s) list:   0-159
Thread(s) per core:    4
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          6
Model:                 2.2 (pvr 004e 1202)
Model name:            POWER9, altivec supported
CPU max MHz:           3800.0000
CPU min MHz:           2300.0000
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              10240K
NUMA node0 CPU(s):     0-79
NUMA node8 CPU(s):     80-159
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):

[jr@login1 AutoDock-GPU-develop]$ nvidia-smi
Thu Jun  2 10:48:48 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000004:04:00.0 Off |                    0 |
| N/A   38C    P0    35W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000035:03:00.0 Off |                    0 |
| N/A   37C    P0    35W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

[jr@login1 AutoDock-GPU-develop]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Thu_Oct_24_17:58:26_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

Any idea what could be causing this?

atillack commented 2 years ago

@jrcodina96 This looks like a driver issue: "cudaGetDeviceCount failed initialization error" means the CUDA runtime itself could not initialize when AutoDock-GPU queried the device count.

This IBM page on the NVIDIA GPU driver failing to initialize may help: https://www.ibm.com/docs/en/cloud-paks/cp-management/1.1.0?topic=upgrade-nvidia-gpu-driver-fails-initialize
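
To narrow this down, one quick check is whether the device nodes the CUDA runtime opens actually exist and are accessible to your user. The snippet below is only a sketch: the function name check_nvidia_nodes is made up for illustration, and the node names (nvidiactl, nvidia-uvm, nvidia0) are assumed from the usual Linux driver layout, so adjust them for your cluster. As far as I know, nvidia-smi can work without /dev/nvidia-uvm while CUDA programs usually cannot, which would fit nvidia-smi succeeding and cudaGetDeviceCount failing.

```shell
# Hypothetical helper: report NVIDIA device nodes that are missing or
# not readable/writable by the current user.
# Usage: check_nvidia_nodes [DEV_DIR]   (DEV_DIR defaults to /dev)
check_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    problems=0
    # Assumed node names; a node with more GPUs would also have nvidia1, ...
    for node in nvidiactl nvidia-uvm nvidia0; do
        if [ ! -e "$dev_dir/$node" ]; then
            echo "missing: $dev_dir/$node"
            problems=1
        elif [ ! -r "$dev_dir/$node" ] || [ ! -w "$dev_dir/$node" ]; then
            echo "no access: $dev_dir/$node"
            problems=1
        fi
    done
    if [ "$problems" -eq 0 ]; then
        echo "all expected nodes present in $dev_dir"
    fi
    # Exit status 0 means every node was found and accessible.
    return "$problems"
}
```

Running check_nvidia_nodes with no argument inspects /dev. If it reports a missing or inaccessible node, that points at the driver setup on the node rather than at AutoDock-GPU; if everything is present, the driver initialization problem described in the IBM link is the next thing to look at.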