How do I maximize my GPU use?

IHurley725 commented 3 weeks ago

Hello, I'm trying to use AutoDock-GPU to dock a protein with several thousand very short (4nt) nucleic acid strands, and I want to make sure I'm using all of my GPU's resources to get through them quickly. Can anyone suggest some parameters or options that would increase my efficiency?

Following issue #240, I tried providing batch inputs, but it looks like my jobs are running sequentially rather than in parallel: if one ligand takes ~10 seconds to converge, then 5 ligands take ~50 seconds.

Ligand: 88 atoms, 23 rotatable bonds, 7 atom types

Docking: grid size 0.375 Å; grid points (x, y, z) 81, 107, 75; nrun 20

System: Intel Core i9-14900 × 32, NVIDIA GeForce RTX 4090, Ubuntu 24.04 LTS, CUDA Toolkit 12.2

I compiled using DEVICE=CUDA NUMWI=256 TARGETS=89

I'm running using one of the following commands:

/home/AutoDock_GPU/AutoDock-GPU-develop/bin/89 --filelist batchfile.txt --heuristics 0 --nrun 20 --nev 50000000
/home/AutoDock_GPU/AutoDock-GPU-develop/bin/89 --lfile sequence1.pdbqt --ffile thrombin.maps.fld --heuristics 0 --nrun 20 --nev 50000000

The contents of my batch file are as follows:

thrombin.maps.fld
Sequence1.pdbqt
Ligand 1
Sequence2.pdbqt
Ligand 2
Sequence3.pdbqt
Ligand 3
Sequence4.pdbqt
Ligand 4
Sequence5.pdbqt
Ligand 5
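
With several thousand ligands, a batch file in the layout above is easier to generate than to write by hand. The following is a minimal sketch that assumes the same layout as this example (the .fld file first, then a ligand file and a job name for each ligand); the function name `build_filelist` and the `ligands` directory are hypothetical, and the job name is taken from the file name rather than "Ligand 1", "Ligand 2", and so on.

```python
from pathlib import Path


def build_filelist(fld_file: str, ligand_files: list[str]) -> str:
    """Return batch-file text: the .fld map file first, then a
    (ligand file, job name) pair of lines for each ligand."""
    lines = [fld_file]
    for ligand in ligand_files:
        lines.append(ligand)
        # Assumption: use the file name without its extension as the job name.
        lines.append(Path(ligand).stem)
    return "\n".join(lines) + "\n"


# Hypothetical usage: collect every ligand in a "ligands" directory.
# ligands = sorted(str(p) for p in Path("ligands").glob("*.pdbqt"))
# Path("batchfile.txt").write_text(build_filelist("thrombin.maps.fld", ligands))
```

The resulting file can then be passed with --filelist as in the first command above.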

Please let me know if there's anything I've missed! Thank you in advance for your help :)

atillack commented 1 week ago

@IHurley725 AD-GPU does not dock multiple ligands concurrently on the GPU; what it runs in parallel are the multiple runs (--nrun) of the same ligand. What lowers efficiency (average GPU occupancy) is the time spent in CPU-land processing results (file IO) and setting up the next ligand. If you compile with OVERLAP=ON, ligand preparation and results processing happen concurrently with docking when using the batch file.
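
To get a feel for what overlapping can and cannot buy, here is a rough back-of-the-envelope sketch. It is an idealized model, not a description of AD-GPU internals: it assumes each ligand needs a fixed GPU docking time and a fixed CPU time, and that with overlap the CPU work hides behind docking. The ~10 s docking time comes from this thread; the 1 s of CPU time per ligand is a made-up placeholder that would need to be measured.

```python
def estimate_batch_seconds(n_ligands: int, gpu_s: float, cpu_s: float,
                           overlap: bool) -> float:
    """Rough wall-clock estimate for ligands docked one after another."""
    if n_ligands <= 0:
        return 0.0
    if not overlap:
        # CPU work and GPU work alternate, so their times add up.
        return n_ligands * (gpu_s + cpu_s)
    # Idealized overlap: the slower stage sets the pace, plus one CPU step
    # that cannot be hidden behind docking.
    return cpu_s + n_ligands * max(gpu_s, cpu_s)


# gpu_s=10.0 is from the thread; cpu_s=1.0 is an assumed placeholder.
for overlap in (False, True):
    total = estimate_batch_seconds(5000, gpu_s=10.0, cpu_s=1.0, overlap=overlap)
    print(f"overlap={overlap}: about {total / 3600:.1f} h")
```

Under this model, ligands are still docked one at a time, so overlap can at best remove the CPU share of the total, not the docking time itself.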

Additionally, if you have multiple GPUs (even different types) you could run with -D all (as a command line option) to automatically let AD-GPU use all of them.

One more note: NUMWI=256 could be a bit too large for small ligands, so I would recommend also testing with NUMWI=64 or NUMWI=128 (for us, 128 is usually the sweet spot). Also, we found that even on Nvidia cards, OpenCL (DEVICE=OCLGPU) can sometimes be a bit faster because CUDA is pre-compiled while OpenCL is compiled at runtime.
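
One way to compare the NUMWI builds is to time the same docking command against each binary. The sketch below assumes the binaries follow the autodock_gpu_<NUMWI>wi naming that appears later in this thread and that all of them sit in one `bin_dir`; the helper names `time_command`, `benchmark`, and `fastest` are hypothetical.

```python
import subprocess
import time


def time_command(cmd: list[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(bin_dir: str, numwi_values: list[int],
              args: list[str]) -> dict[int, float]:
    """Time the same docking arguments once per NUMWI build."""
    # Assumption: one binary per NUMWI value, named autodock_gpu_<NUMWI>wi.
    return {n: time_command([f"{bin_dir}/autodock_gpu_{n}wi", *args])
            for n in numwi_values}


def fastest(timings: dict[int, float]) -> int:
    """Return the NUMWI value with the shortest measured time."""
    return min(timings, key=timings.get)


# Hypothetical usage with the arguments from this thread:
# args = ["--filelist", "batchfile.txt", "--heuristics", "0",
#         "--nrun", "20", "--nev", "50000000"]
# print(fastest(benchmark("bin", [64, 128, 256], args)))
```

A single timing per build is noisy, so repeating each measurement a few times would give a more reliable comparison.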

IHurley725 commented 6 days ago

Hi @atillack thank you for the response! I wanted to test your suggestions, but I'm having an issue now with running AutoDock-GPU. I had to uninstall and reinstall my CUDA driver and toolkit last week, which may be related.

I compiled using DEVICE=CUDA NUMWI=128 TARGETS=89 (I also tried NUMWI=64 and 256)

When I run AD-GPU I get a Buffer Overflow error immediately. Here is the exact terminal output:

/home/user/AutoDock_GPU/AutoDock-GPU-develop/bin/autodock_gpu_128wi --filelist batchfile.txt --heuristics 0 --nrun 20 --nev 50000000
AutoDock-GPU version: v1.5-release

Running 5 docking calculations

*** buffer overflow detected ***: terminated
Aborted (core dumped)

The error happens immediately after launching; it doesn't even print the CUDA device. Any thoughts? I'm going to try compiling with OpenCL tomorrow.

atillack commented 6 days ago

I would recommend using the current develop version (we are working on a new version, which will hopefully come out soon), as it has a lot of bug fixes since 1.5.
