Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html

CUDA not visible? #990

Closed: phisanti closed this issue 3 years ago

phisanti commented 3 years ago

Hi, I am trying to run the equilibration protocol and I get two messages, one of them a critical error. Here is the code I am running:

from htmd.protocols.equilibration_v3 import Equilibration
from htmd.ui import *

md = Equilibration()
md.runtime = 1000
md.timeunits = 'fs'
md.temperature = 300
md.useconstantratio = False  # only for membrane sims
# # Add a 10A flat bottom potential to prevent the ligand from diffusing from original position during equilibration
# width = np.array([10, 10, 10])
# flatbot = GroupRestraint('segname L and noh', width, [(5, '0ns')])
# md.restraints = [flatbot] + md.defaultEquilRestraints('20ns')
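# Write the equilibration input files for the built system into ./equil
md.write('./myoutputcharmm', './equil')
# Run the simulation on the local GPU(s) and block until it finishes
local = LocalGPUQueue()
local.submit('./equil/')
local.wait()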
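
With `runtime = 1000` and `timeunits = 'fs'`, the script above requests only 1 ps of simulation, which is fine as a quick test. The hypothetical helpers below (not part of HTMD) make that unit arithmetic explicit. The 4 fs timestep in the usage line is an illustrative assumption, not the protocol's documented default.

```python
# Hypothetical helpers, not part of HTMD: convert an MD runtime to nanoseconds
# and to a number of integration steps for a given timestep.
FS_PER_UNIT = {"fs": 1, "ps": 1_000, "ns": 1_000_000, "us": 1_000_000_000}


def runtime_to_ns(runtime: float, timeunits: str) -> float:
    """Return the runtime expressed in nanoseconds."""
    return runtime * FS_PER_UNIT[timeunits] / FS_PER_UNIT["ns"]


def runtime_to_steps(runtime: float, timeunits: str, timestep_fs: float) -> int:
    """Return how many whole steps of `timestep_fs` femtoseconds fit in the runtime."""
    return int(runtime * FS_PER_UNIT[timeunits] // timestep_fs)


print(runtime_to_ns(1000, "fs"))  # 0.001 ns, i.e. 1 ps
print(runtime_to_steps(1000, "fs", timestep_fs=4))  # 250 steps at an assumed 4 fs timestep
```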

The myoutputcharmm directory is the output of the CHARMM system-building tutorial. When I run the script, I first get this warning: "UserWarning: As of HTMD 1.21 support for ACEMD v2 has stopped. Please use ACEMD3 instead as well as the corresponding equilibration and production protocols. To disable this warning run once: from htmd import _disableWarnings; _disableWarnings('1.21');". However, conda says I have ACEMD 3.3.0 installed. Then, after a few minutes, the critical error appears:

2021-02-09 15:52:32,392 - jobqueues.util - INFO - Trying to determine all GPU devices
2021-02-09 15:52:32,448 - jobqueues.localqueue - INFO - Using GPU devices 0
2021-02-09 15:52:32,449 - jobqueues.util - INFO - Trying to determine all GPU devices
2021-02-09 15:52:32,502 - jobqueues.localqueue - INFO - Queueing /notebooks/equil
2021-02-09 15:52:32,503 - jobqueues.localqueue - INFO - Running /notebooks/equil on device 0
2021-02-09 15:52:32,528 - jobqueues.localqueue - ERROR - Error in simulation /notebooks/equil. Command '/equil/job.sh' returned non-zero exit status 1.

So apparently it cannot find CUDA or the GPU, although I am sure both are there. See:

$ nvidia-smi
Tue Feb  9 15:59:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P5000        On   | 00000000:00:05.0 Off |                  Off |
| 26%   28C    P8     7W / 180W |      4MiB / 16278MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Am I doing something wrong?
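
The queue log above already reports `Using GPU devices 0`, which suggests the GPU was detected and the failure happened inside `job.sh` itself. To double-check GPU visibility from the same Python environment, the sketch below runs `nvidia-smi -L`, which prints one `GPU <index>: <name> (UUID: ...)` line per device, and parses it. `list_gpus` and `parse_gpu_list` are hypothetical helpers, not the `jobqueues` implementation.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Matches lines such as "GPU 0: Quadro P5000 (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID:")


def parse_gpu_list(text: str) -> list[tuple[int, str]]:
    """Parse `nvidia-smi -L` output into (index, name) pairs."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def list_gpus() -> list[tuple[int, str]]:
    """Return visible GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    print(list_gpus())  # e.g. [(0, 'Quadro P5000')] on the machine in this thread
```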

stefdoerr commented 3 years ago

You can ignore or disable the warning. Can you post the contents of the log file from the simulation directory here?

phisanti commented 3 years ago

Here it is:

#
# ACEMD version 3.3.0
#
# Copyright (C) 2017-2019 Acellera (www.acellera.com)
#
# When publishing, please cite:
#   ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
#   M. J. Harvey, G. Giupponi and G. De Fabritiis,
#   J Chem. Theory. Comput. 2009 5(6), pp1632-1639
#   DOI: 10.1021/ct9000685
#
# Arguments:
#   input: input
#   platform: 
#   device: 
#   ncpus: 
#   precision: mixed
#
# Looking for node-locked license in [/opt/acellera/license.dat] 
# Looking for node-locked license in [/opt/acellera/.acellera/license.dat] 
# Looking for node-locked license in [/opt/acellera/.htmd/license.dat] 
# Looking for node-locked license in [/root/license.dat] 
# Looking for node-locked license in [/root/.acellera/license.dat] 
# Looking for node-locked license in [/root/.htmd/license.dat] 
#
# ACEMD cannot run in a virtualized environment without a licence.
# Contact Acellera (info@acellera.com) for licencing.
#
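
The log lists every path ACEMD checked for a node-locked `license.dat` before stopping. The hypothetical helper below (not part of HTMD or ACEMD) extracts those paths and reports which of them exist on the current machine; its regex is written against the exact lines shown here and may need adjusting for other ACEMD versions.

```python
from __future__ import annotations

import os
import re

# Matches lines like: "# Looking for node-locked license in [/root/license.dat]"
LICENSE_LINE = re.compile(r"Looking for node-locked license in \[([^\]]+)\]")


def license_search_paths(log_text: str) -> list[str]:
    """Return the license paths the log reports searching, in log order."""
    return LICENSE_LINE.findall(log_text)


def existing_license_files(log_text: str) -> list[str]:
    """Return the subset of searched paths that exist as files on this machine."""
    return [p for p in license_search_paths(log_text) if os.path.isfile(p)]


sample_log = """\
# Looking for node-locked license in [/opt/acellera/license.dat]
# Looking for node-locked license in [/root/.htmd/license.dat]
#
# ACEMD cannot run in a virtualized environment without a licence.
"""
print(license_search_paths(sample_log))
# ['/opt/acellera/license.dat', '/root/.htmd/license.dat']
```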

stefdoerr commented 3 years ago

ACEMD cannot run in a virtualized environment without a licence.

Are you running in a container?

phisanti commented 3 years ago

I am running an EC2 machine with Ubuntu 20 and CUDA 10, and I built the system with Docker. Is that an issue?

stefdoerr commented 3 years ago

Yes, ACEMD will not run inside Docker without a license. So you either need to obtain a license or run it on a normal (non-containerized) machine.
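
If you are unsure whether a given shell is inside a container or VM, a few common Linux heuristics can tell you before launching a run. This is a rough sketch, not ACEMD's own detection logic (which is not described in this thread): Docker normally creates a `/.dockerenv` file in containers, `/proc/1/cgroup` often mentions `docker` (cgroup v1 hosts), and the `flags` line of `/proc/cpuinfo` usually includes `hypervisor` under a VM. A cloud VM such as EC2 may report the hypervisor flag even outside Docker, so a positive result alone does not tell you what ACEMD will decide. `detect_virtualization` is a hypothetical helper.

```python
from __future__ import annotations

import os


def _read(path: str) -> str:
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


def detect_virtualization(
    dockerenv: str = "/.dockerenv",
    cgroup: str = "/proc/1/cgroup",
    cpuinfo: str = "/proc/cpuinfo",
) -> dict[str, bool]:
    """Heuristically report Docker and hypervisor hints on Linux.

    These are common heuristics, not ACEMD's own check.
    """
    in_docker = os.path.exists(dockerenv) or "docker" in _read(cgroup)
    # The "flags" line of /proc/cpuinfo contains "hypervisor" under most VMs.
    under_hypervisor = any(
        line.startswith("flags") and "hypervisor" in line.split()
        for line in _read(cpuinfo).splitlines()
    )
    return {"docker": in_docker, "hypervisor": under_hypervisor}


if __name__ == "__main__":
    print(detect_virtualization())
```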

phisanti commented 3 years ago

Okay, thanks for the help!

stefdoerr commented 3 years ago

You are welcome :)