DeepGraphLearning / torchdrug

A powerful and flexible machine learning platform for drug discovery
https://torchdrug.ai/
Apache License 2.0
1.43k stars 199 forks

AssertionError: Torch not compiled with CUDA enabled #147

Open yaba-hun opened 1 year ago

yaba-hun commented 1 year ago

Hi, I'm trying to run DeepGraphLearning/torchdrug/test/test.py in an Ubuntu virtual machine on Windows 10. Because the code runs in a virtual machine, I installed the CPU-only build of PyTorch. I keep getting errors like the one below; this is only an excerpt.

python test.py
Traceback (most recent call last):
  File "/home/zyf/Code/torchdrug/test/utils/test_torch.py", line 12, in test_transfer
    result = utils.cuda(data)
  File "/home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torchdrug-0.1.3-py3.7.egg/torchdrug/utils/torch.py", line 89, in cuda
    return type(obj)({k: cuda(v, *args, **kwargs) for k, v in obj.items()})
  File "/home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torchdrug-0.1.3-py3.7.egg/torchdrug/utils/torch.py", line 89, in <dictcomp>
    return type(obj)({k: cuda(v, *args, **kwargs) for k, v in obj.items()})
  File "/home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torchdrug-0.1.3-py3.7.egg/torchdrug/utils/torch.py", line 87, in cuda
    return obj.cuda(*args, **kwargs)
  File "/home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torch/cuda/__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
The full error is here: error.txt

My environment: VMware + Linux + conda + Python 3.7 + pytorch-cpu-1.12.1.

I tried changing the file "/home/zyf/Code/torchdrug/test/utils/test_torch.py" and the other files that reported the same error, replacing "cuda" with "cpu". For example:

graph1 = graph.cuda()
self.assertEqual(graph1.adjacency.device.type, "cuda", "Incorrect device")
graph2 = graph1.cuda()
self.assertEqual(graph2.adjacency.device.type, "cuda", "Incorrect device")
self.assert_equal(graph, graph2, "device")

to

graph1 = graph.cpu()
self.assertEqual(graph1.adjacency.device.type, "cpu", "Incorrect device")
graph2 = graph1.cpu()
self.assertEqual(graph2.adjacency.device.type, "cpu", "Incorrect device")
self.assert_equal(graph, graph2, "device")

The following files are modified:

torchdrug/test/layers/test_conv.py
torchdrug/test/layers/test_pool.py
torchdrug/test/layers/test_readout.py
torchdrug/test/layers/test_sampler.py
torchdrug/test/data/test_graph.py
torchdrug/test/utils/test_comm.py
torchdrug/test/layers/test_spmm.py
torchdrug/test/utils/test_torch.py

After making these changes I ran "python test.py" again, but the tests still don't seem to work. The terminal keeps printing the following:

.....F../home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torchdrug-0.1.3-py3.7.egg/torchdrug/data/graph.py:542: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  scale = scale[-1] // scale
./home/zyf/anaconda3/envs/graphaf/lib/python3.7/site-packages/torchdrug-0.1.3-py3.7.egg/torchdrug/data/graph.py:728: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  edge_in_index = local_index // local_inner_size + edge_in_offset
..../home/zyf/Code/torchdrug/test/data/test_graph.py:337: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  node_in = (node_out - 1) // 2
/home/zyf/Code/torchdrug/test/data/test_graph.py:341: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  tree.dad = (torch.arange(self.num_node) - 1) // 2
.........

Could you please help me to fix it? Thanks!

KiddoZhu commented 1 year ago

Hi! The test suite is expected to run on GPU-enabled machines, but your modification is also right.
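
An alternative to editing each test file by hand is to skip the GPU-only checks when CUDA is unavailable. The following is a minimal sketch using the standard unittest skip decorators, not the way the TorchDrug test suite is actually organized. The class name TransferTest and the flags HAS_TORCH and HAS_CUDA are hypothetical names chosen for illustration, and the guarded import only keeps the sketch importable on machines without PyTorch.

```python
import unittest

# Guarded import: HAS_TORCH / HAS_CUDA are illustrative flags, not TorchDrug names.
try:
    import torch
    HAS_TORCH = True
    HAS_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    HAS_TORCH = False
    HAS_CUDA = False


class TransferTest(unittest.TestCase):
    """Hypothetical test case showing how to gate device-specific checks."""

    @unittest.skipUnless(HAS_CUDA, "requires a CUDA-enabled PyTorch build")
    def test_cuda_transfer(self):
        # Only runs when torch.cuda.is_available() is True.
        x = torch.zeros(3).cuda()
        self.assertEqual(x.device.type, "cuda", "Incorrect device")

    @unittest.skipUnless(HAS_TORCH, "requires PyTorch")
    def test_cpu_transfer(self):
        # Runs on any PyTorch build, including CPU-only ones.
        x = torch.zeros(3).cpu()
        self.assertEqual(x.device.type, "cpu", "Incorrect device")


if __name__ == "__main__":
    unittest.main()
```

On a CPU-only build the CUDA test is reported as skipped rather than failing with the AssertionError above.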

The user warnings come from newer PyTorch versions, which warn that some code in TorchDrug relies on a deprecated operation. TorchDrug only uses floordiv on positive values, where rounding toward zero and rounding down give the same result, so it won't cause any bugs. We will update the code to be more compatible with newer PyTorch operations.
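
The difference between the two rounding modes named in the warning can be checked in plain Python, without PyTorch. This is a small sketch: trunc_div and floor_div are hypothetical helper names that mimic what rounding_mode='trunc' and rounding_mode='floor' mean, and the float division in trunc_div is only adequate for small integers.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """Divide and round toward zero (the 'trunc' behaviour)."""
    return math.trunc(a / b)


def floor_div(a: int, b: int) -> int:
    """Divide and round toward negative infinity (the 'floor' behaviour)."""
    return a // b


# The two modes agree when both operands are non-negative and differ
# only when the exact quotient is negative and not an integer.
for a, b in [(7, 2), (0, 3), (-7, 2)]:
    print(f"{a} / {b}: trunc={trunc_div(a, b)}, floor={floor_div(a, b)}")
```

For the pair (-7, 2) the truncated result is -3 and the floored result is -4, which is the case the warning is about; for positive operands the results are identical.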

gdeol4 commented 1 year ago

@KiddoZhu Hi there! I'm trying to run the example code from the documentation and I'm encountering the same error. I'm on Windows, in a conda environment with python=3.8.15 and the CPU version of torch 1.12.1. I installed the CPU versions of the dependencies as well.

The code I'm trying to run is the following:

P.S. Thank you in advance!

import torch
from torchdrug import datasets
from torchdrug import core, models, tasks
from torchdrug.layers import distribution
from torch import nn, optim

dataset = datasets.ZINC250k("~/molecule-datasets/", kekulize=True,
                            atom_feature="symbol")

model = models.RGCN(input_dim=dataset.num_atom_type,
                    num_relation=dataset.num_bond_type,
                    hidden_dims=[256, 256, 256], batch_norm=True)

num_atom_type = dataset.num_atom_type
# add one class for non-edge
num_bond_type = dataset.num_bond_type + 1

node_prior = distribution.IndependentGaussian(torch.zeros(num_atom_type),
                                              torch.ones(num_atom_type))
edge_prior = distribution.IndependentGaussian(torch.zeros(num_bond_type),
                                              torch.ones(num_bond_type))
node_flow = models.GraphAF(model, node_prior, num_layer=12)
edge_flow = models.GraphAF(model, edge_prior, use_edge=True, num_layer=12)

task = tasks.AutoregressiveGeneration(node_flow, edge_flow,
                                      max_node=38, max_edge_unroll=12,
                                      criterion="nll")

optimizer = optim.Adam(task.parameters(), lr = 1e-3)

solver = core.Engine(task, dataset, None, None, optimizer, gpus=None, batch_size=10)

solver.train(num_epoch=10)
solver.save("graphaf_WB3.pkl")

The error I get is on line 31: solver = core.Engine(task, dataset, None, None, optimizer, gpus=None, batch_size=10)

(graphdf) C:\Users\gurka\Documents\graphdf\torchdrug>python test.py
Loading C:\Users\gurka/molecule-datasets/250k_rndm_zinc_drugs_clean_3.csv: 50%|██████████████████████████████████████████▌ | 249456/498911 [00:04<00:04, 54268.24it/s]
Constructing molecules from SMILES: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 249455/249455 [06:55<00:00, 600.81it/s]
14:12:02 Preprocess training set
Traceback (most recent call last):
  File "test.py", line 31, in <module>
    solver = core.Engine(task, dataset, None, None, optimizer,
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\gurka\Documents\graphdf\torchdrug\torchdrug\core\core.py", line 288, in wrapper
    return init(self, *args, **kwargs)
  File "C:\Users\gurka\Documents\graphdf\torchdrug\torchdrug\core\engine.py", line 101, in __init__
    task = task.cuda(self.device)
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 689, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\nn\modules\module.py", line 689, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\gurka\anaconda3\envs\graphdf\lib\site-packages\torch\cuda\__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
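
The traceback shows the engine calling task.cuda(...), which can only succeed on a CUDA-enabled PyTorch build. One way to keep a script portable is to derive the gpus argument from what the installed build reports. This is a minimal sketch under two assumptions: that gpus=None makes core.Engine stay on the CPU, as the snippet above intends, and that a list of device ids selects GPUs. choose_gpus is a hypothetical helper, not part of TorchDrug.

```python
def choose_gpus(preferred=(0,)):
    """Return a list of usable GPU ids, or None to request CPU execution.

    Hypothetical helper for illustration; it is not a TorchDrug function.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch at all: nothing can run on a GPU.
        return None
    if not torch.cuda.is_available():
        # CPU-only build, or no visible CUDA device.
        return None
    # Keep only the preferred ids that actually exist on this machine.
    count = torch.cuda.device_count()
    usable = [i for i in preferred if 0 <= i < count]
    return usable or None


gpus = choose_gpus()
print("gpus =", gpus)

# Assumed usage with the script above (names taken from that script):
# solver = core.Engine(task, dataset, None, None, optimizer,
#                      gpus=gpus, batch_size=10)
```

It may also be worth confirming that the script being executed really passes gpus=None, since the line echoed in the traceback ends after "optimizer," and does not show the remaining arguments.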

I have also tried installing the CUDA version in a separate environment, and there I get a different error on line 35: solver.train(num_epoch=10)

This is the error:

16:51:11 Epoch 0 begin
C:\Users\gurka\Documents\graph\torchdrug\torchdrug\data\molecule.py:124: UserWarning: Try to apply masks on molecules with stereo bonds. This may produce invalid molecules. To discard stereo information, call mol.bond_stereo[:] = 0 before applying masks.
  warnings.warn("Try to apply masks on molecules with stereo bonds. This may produce invalid molecules. "
Traceback (most recent call last):
  File "C:\Users\gurka\anaconda3\envs\graph\lib\site-packages\torch\utils\cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "C:\Users\gurka\anaconda3\envs\graph\lib\subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 35, in <module>
    solver.train(num_epoch=10)

I've also attached the full error text as files:

error_cpu_env.txt
error_cuda_enabled.txt