Failed to use cuda - Githubissues

M9JS commented 1 year ago

I tried running on GPU for faster structure optimization by setting use_device="cuda" in MolecularDynamics.

from chgnet.model.model import CHGNet
from chgnet.model.dynamics import MolecularDynamics
from pymatgen.core import Structure

structure = Structure.from_file("examples/o-LiMnO2_unit.cif")
chgnet = CHGNet.load()

md = MolecularDynamics(
    atoms=structure,
    model=chgnet,
    ensemble="nvt",
    compressibility_au=1.6,
    temperature=1000,  # in K
    timestep=2,  # in femto-seconds
    trajectory="md_out.traj",
    logfile="md_out.log",
    loginterval=100,
    use_device="cuda",  # use 'cuda' for faster MD
)
md.run(50)  # run a 0.1 ps MD simulation

But at the top of the output, it showed

CHGNet initialized with 400438 Parameters
CHGNet will run on cpu

Did I miss a step? Additionally, I wonder whether CUDA is available for structure optimization.

janosh commented 1 year ago

I can't repro this. Perhaps the line

CHGNet will run on cpu

comes from some other call to CHGNet not shown in your pasted code snippet? Can you post the complete script and run logs?

janosh commented 1 year ago

Also, if you want to optimize a structure, don't use MolecularDynamics. There's a dedicated StructOptimizer class for that:

https://github.com/CederGroupHub/chgnet/blob/73d3219fc77b581a57de86550bdb9e5ebdc8068f/chgnet/model/dynamics.py#L126

M9JS commented 1 year ago

Thanks for the reply. I know I should use StructOptimizer for structure optimization. I tried setting use_device="cuda" for both MD and structure optimization, but neither ran on the GPU. Here is the code for structure optimization.

from chgnet.model.model import CHGNet
from pymatgen.core import Structure
from chgnet.model import StructOptimizer
import sys

#polymorphs = ['alpha-V2O5', 'beta-V2O5', 'theta-V2O5', 'gamma-V2O5', 'R-V2O5', 'zeta-V2O5']
polymorphs = ['alpha-V2O5']

log_file = open("output.log", "w")
sys.stdout = log_file

for item in polymorphs:
    filepath = 'cif/' + item + '.cif'
    chgnet = CHGNet.load()
    structure = Structure.from_file(filepath)
    relaxer = StructOptimizer(use_device="cuda")
    result = relaxer.relax(structure)
    print(item + " relaxation done")
    print("CHGNet relaxed structure", result["final_structure"])

log_file.close()

And here is its run log:

CHGNet initialized with 400438 Parameters
CHGNet initialized with 400438 Parameters
CHGNet will run on cpu
      Step     Time          Energy         fmax
*Force-consistent energies used in optimization.
FIRE:    0 10:18:12     -116.918056*       8.2636
FIRE:    1 10:18:15     -113.792671*      36.8282
FIRE:    2 10:18:21     -116.908831*      14.3075
FIRE:    3 10:18:23     -117.034975*       7.3715
FIRE:    4 10:18:28     -117.091505*       1.1211
FIRE:    5 10:18:33     -117.091772*       1.0367
FIRE:    6 10:18:39     -117.092239*       0.8753
FIRE:    7 10:18:45     -117.092787*       0.6517
FIRE:    8 10:18:51     -117.093321*       0.3881
FIRE:    9 10:18:56     -117.093748*       0.2655
FIRE:   10 10:19:03     -117.094069*       0.2813
FIRE:   11 10:19:09     -117.094362*       0.2935
FIRE:   12 10:19:15     -117.094776*       0.3812
FIRE:   13 10:19:20     -117.095337*       0.3574
FIRE:   14 10:19:26     -117.096098*       0.2610
FIRE:   15 10:19:31     -117.096939*       0.2241
FIRE:   16 10:19:37     -117.097794*       0.3516
FIRE:   17 10:19:43     -117.098808*       0.5521
FIRE:   18 10:19:49     -117.100224*       0.5439
FIRE:   19 10:19:55     -117.102039*       0.2935
FIRE:   20 10:20:01     -117.103949*       0.2201
FIRE:   21 10:20:07     -117.106125*       0.2358
FIRE:   22 10:20:13     -117.108875*       0.2014
FIRE:   23 10:20:18     -117.111946*       0.4896
FIRE:   24 10:20:20     -117.115938*       0.3181
FIRE:   25 10:20:26     -117.120398*       0.2259
FIRE:   26 10:20:32     -117.125912*       0.1827
FIRE:   27 10:20:37     -117.132080*       0.4828
FIRE:   28 10:20:43     -117.138222*       0.5662
FIRE:   29 10:20:49     -117.143188*       0.9741
FIRE:   30 10:20:54     -117.144016*       1.7581
FIRE:   31 10:21:00     -117.147114*       0.1575
FIRE:   32 10:21:06     -117.145205*       1.4733
FIRE:   33 10:21:12     -117.146086*       1.1555
FIRE:   34 10:21:18     -117.147114*       0.5863
FIRE:   35 10:21:24     -117.147501*       0.1122
FIRE:   36 10:21:29     -117.147514*       0.1053
FIRE:   37 10:21:35     -117.147528*       0.1025
FIRE:   38 10:21:41     -117.147541*       0.1019
FIRE:   39 10:21:47     -117.147568*       0.1013
FIRE:   40 10:21:53     -117.147594*       0.1005
FIRE:   41 10:21:59     -117.147621*       0.0997
alpha-V2O5 relaxation done
CHGNet relaxed structure Full Formula (V4 O10)
Reduced Formula: V2O5
abc   :   3.676327   4.108809  11.817436
angles:  89.999991  90.000001  89.999996
pbc   :       True       True       True
Sites (14)
  #  SP       a         b         c      magmom
---  ----  ----  --------  --------  ----------
  0  V5+   -0    0.901661  0.149606  0.00280577
  1  V5+    0.5  0.098339  0.350394  0.00280493
  2  V5+    0.5  0.098339  0.649606  0.00280564
  3  V5+   -0    0.901661  0.850395  0.00280446
  4  O2-   -0    0.992946  0         0.00467795
  5  O2-   -0    0.506065  0.14633   0.00305995
  6  O2-    0.5  0.992457  0.182077  0.00366482
  7  O2-   -0    0.007543  0.317923  0.00366476
  8  O2-    0.5  0.493935  0.35367   0.00306009
  9  O2-    0.5  0.007054  0.5       0.00467798
 10  O2-    0.5  0.493935  0.64633   0.00305997
 11  O2-    0    0.007543  0.682077  0.0036648
 12  O2-    0.5  0.992457  0.817923  0.00366491
 13  O2-   -0    0.506065  0.85367   0.00305988

Here is the CUDA version on my Linux machine:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

BowenD-UCB commented 1 year ago

Hi, I tested your code, and it runs on CUDA with no issue on my platform, where CUDA is available. On a platform where CUDA is not available, I get RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Can you please run the following code and share the output?

import torch
from chgnet.model import CHGNet

# Load pretrained CHGNet
model = CHGNet.load()

# Check if CUDA is available
if torch.cuda.is_available():
    num_devices = torch.cuda.device_count()
    print(f'Number of CUDA devices: {num_devices}')

    for i in range(num_devices):
        device = torch.device(f'cuda:{i}')
        print(f'Device name: {torch.cuda.get_device_name(i)}')

        # Check whether we can move CHGNet to this device
        model.to(device)
        print(f"CHGNet is on device {i}")

        # Now your model is on the CUDA device with ID 'i'
else:
    print('CUDA is not available.')

try:
    model.to('cuda')
except Exception:
    raise Exception('can not move to cuda')
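
If it helps, a small guard like the one below makes a script stop with a clear message when a GPU was requested but is not usable, instead of quietly running on CPU. This is only a sketch: require_device is a hypothetical helper, not part of CHGNet or PyTorch, and the availability flag is passed in by hand here so the snippet runs without a GPU. In a real script the flag would come from torch.cuda.is_available().

```python
def require_device(requested: str, cuda_available: bool) -> str:
    """Return `requested` unchanged, or raise if CUDA was asked for but is missing.

    Hypothetical helper for illustration; not a CHGNet or PyTorch function.
    """
    if requested.startswith("cuda") and not cuda_available:
        raise RuntimeError(
            f"Requested device {requested!r}, but this PyTorch install "
            "reports no usable CUDA."
        )
    return requested


# In a real script: require_device("cuda", torch.cuda.is_available())
print(require_device("cpu", cuda_available=False))  # prints: cpu
```

The returned string could then be passed as use_device, so a CPU-only torch install shows up as an error at startup rather than as a slow run.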

M9JS commented 1 year ago

Here is the output. Does it mean that my torch was not compiled with CUDA? Should I install a torch build consistent with my CUDA version?

CHGNet initialized with 400438 Parameters
CUDA is not available.
Traceback (most recent call last):
  File "/home/jinshi/CHGNet/cuda_test.py", line 25, in <module>
    model.to('cuda')
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/jinshi/anaconda3/envs/net/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jinshi/CHGNet/cuda_test.py", line 27, in <module>
    raise Exception('can not move to cuda')
Exception: can not move to cuda

BowenD-UCB commented 1 year ago

Yes, you need a torch build that's compatible with your CUDA version. Please refer to the PyTorch installation resources, such as https://pytorch.org/get-started/locally/
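
After reinstalling, a quick check along these lines should show whether the new torch is a CUDA build. It is a sketch rather than an official diagnostic: describe_torch_build is a hypothetical name, and the snippet is written so that it also runs when torch is not installed at all. It relies on torch.version.cuda, which is None on CPU-only builds, and on torch.cuda.is_available().

```python
import importlib.util


def describe_torch_build() -> dict:
    """Summarize whether torch is installed and whether it was built with CUDA.

    Hypothetical helper for illustration; not part of CHGNet or PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        # torch is not importable in this environment
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version torch was built against; None on CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build also finds a working driver and GPU
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)
```

If cuda_build is None, the install is CPU-only and use_device="cuda" cannot work, whatever version nvcc reports for the system toolkit.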

M9JS commented 1 year ago

Thanks!

CederGroupHub/chgnet, issue #38: Failed to use cuda