DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System
https://graphvite.io
Apache License 2.0
1.22k stars, 151 forks

CUDA Version Error - "Torch not compiled with CUDA" #33

Closed · williamcaruso closed this issue 4 years ago

williamcaruso commented 4 years ago

I have successfully installed GraphVite with conda. However, when I run the baseline quick start, training completes successfully, but the program then crashes during link prediction.

Does anyone know what the issue could be?

When I installed GraphVite, cudatoolkit was downgraded from 10.1 to 10.0.103. I have made sure the driver is compatible (see the nvidia-smi output below).

Here is the output:

~$ graphvite baseline quick start
running baseline: demo/quick_start.yaml
loading graph from /home/williamcaruso/.graphvite/dataset/blogcatalog/blogcatalog_train.txt
0.00018755%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Graph<uint32>
------------------ Graph -------------------
#vertex: 10308, #edge: 327429
as undirected: yes, normalization: no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[time] GraphApplication.load: 0.218091 s
#CPU threads is beyond the hardware concurrency
[time] GraphApplication.build: 1.95901 s
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
GraphSolver<128, float32, uint32>
----------------- Resource -----------------
#worker: 1, #sampler: 7, #partition: 1
tied weights: no, episode size: 500
gpu memory limit: 15.3 GiB
gpu memory cost: 51.5 MiB
----------------- Sampling -----------------
augmentation step: 2, shuffle base: 2
random walk length: 40
random walk batch size: 100
#negative: 1, negative sample exponent: 0.75
----------------- Training -----------------
model: LINE
optimizer: SGD
learning rate: 0.025, lr schedule: linear
weight decay: 0.005
#epoch: 2000, batch size: 100000
resume: no
positive reuse: 1, negative weight: 5
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Batch id: 0 / 6548
loss = 0
Batch id: 1000 / 6548
loss = 0.388431
Batch id: 2000 / 6548
loss = 0.383468
Batch id: 3000 / 6548
loss = 0.379309
Batch id: 4000 / 6548
loss = 0.375848
Batch id: 5000 / 6548
loss = 0.37298
Batch id: 6000 / 6548
loss = 0.371091
[time] GraphApplication.train: 22.2748 s
------------- link prediction --------------
effective edges: 6644 / 6650
effective filter edges: 327429 / 327429
remaining edges: 6644 / 6644
Traceback (most recent call last):
  File "/opt/anaconda3/bin/graphvite", line 11, in <module>
    load_entry_point('graphvite==0.2.1', 'console_scripts', 'graphvite')()
  File "/opt/anaconda3/lib/python3.7/site-packages/graphvite/cmd.py", line 272, in main
    command[args.command](args)
  File "/opt/anaconda3/lib/python3.7/site-packages/graphvite/cmd.py", line 234, in baseline_main
    app.evaluate(**evaluation)
  File "/opt/anaconda3/lib/python3.7/site-packages/graphvite/util.py", line 155, in wrapper
    result = function(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/graphvite/application/application.py", line 124, in evaluate
    result = getattr(self, func_name)(**kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/graphvite/application/application.py", line 436, in link_prediction
    model = model.cuda()
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 305, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 224, in _apply
    param_applied = fn(param)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 305, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
    _check_driver()
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 95, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
~$ nvidia-smi
Tue Dec  3 13:13:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

~$ cat /usr/local/cuda/version.txt
CUDA Version 10.0.130

>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.backends.cudnn.enabled
True

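torch.cuda.is_available() returning False does not by itself tell you whether PyTorch is a CPU-only build or a CUDA build that cannot reach the driver. Below is a minimal sketch that separates the two cases. It assumes PyTorch is importable and relies on torch.version.cuda being None on CPU-only builds. The helper names classify_torch_build and inspect_installed_torch are hypothetical.

```python
from typing import Optional


def classify_torch_build(compiled_cuda: Optional[str], cuda_available: bool) -> str:
    """Classify a PyTorch install from two values PyTorch exposes.

    compiled_cuda  -- value of torch.version.cuda (None for CPU-only builds)
    cuda_available -- value of torch.cuda.is_available()
    """
    if compiled_cuda is None:
        # The case behind "Torch not compiled with CUDA enabled".
        return "cpu-only build"
    if not cuda_available:
        # Built against CUDA, but no usable GPU or driver was found at runtime.
        return f"CUDA {compiled_cuda} build, but no usable GPU/driver"
    return f"CUDA {compiled_cuda} build, GPU available"


def inspect_installed_torch() -> str:
    """Apply the classification to the locally installed PyTorch, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    return classify_torch_build(torch.version.cuda, torch.cuda.is_available())


if __name__ == "__main__":
    print(inspect_installed_torch())
```

The AssertionError in the traceback corresponds to the first branch, which points at the installed PyTorch package rather than at the driver or the GPU.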
KiddoZhu commented 4 years ago

It looks like the PyTorch package that conda installed automatically is the CPU-only version.
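One way to confirm this is to look at the build string that conda list pytorch reports for the installed package. Below is a minimal sketch. It assumes build strings named the way the pytorch channel named them at the time, for example py3.7_cpu_0 for CPU builds and py3.7_cuda10.1.243_cudnn7.6.3_0 for CUDA builds. Other channels such as conda-forge name their builds differently, so treat the pattern as an assumption. cuda_from_build_string is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pytorch-channel naming: "cuda<version>" marks a GPU build; CPU builds lack it.
_CUDA_TAG = re.compile(r"cuda(\d+(?:\.\d+)*)")


def cuda_from_build_string(build: str) -> Optional[str]:
    """Return the CUDA version encoded in a conda build string, or None otherwise."""
    match = _CUDA_TAG.search(build)
    if match:
        return match.group(1)
    # No CUDA tag: either an explicit "cpu" build or an unrecognized naming scheme.
    return None


if __name__ == "__main__":
    for build in ["py3.7_cpu_0", "py3.7_cuda10.1.243_cudnn7.6.3_0"]:
        print(build, "->", cuda_from_build_string(build) or "CPU-only")
```

Under this assumption, a build string without a CUDA tag is consistent with the CPU-only package suspected here.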

I just checked PyTorch's website, and it only provides installation commands for CUDA 9.2 and 10.1. That may be why, with CUDA 10.0, conda fell back to the CPU version. Could you try the following line?

conda install graphvite cudatoolkit=10.1

This forces conda to install the GraphVite build for CUDA 10.1.
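Before switching to cudatoolkit=10.1, it is worth checking the driver. The nvidia-smi output above reports driver 410.104. NVIDIA's CUDA Toolkit release notes list 418.39 as the minimum Linux driver for the CUDA 10.1 GA release and 410.48 for CUDA 10.0. Below is a minimal sketch of that comparison. The table is a small excerpt of NVIDIA's published minimums and should be checked against the current release notes. MIN_LINUX_DRIVER and driver_supports are hypothetical names.

```python
# Minimum Linux driver versions for CUDA GA releases, taken from NVIDIA's
# CUDA Toolkit release notes (excerpt; verify against the current table).
MIN_LINUX_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_driver(version: str) -> tuple:
    """Turn a driver string such as '410.104' into a comparable tuple (410, 104)."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """True if the driver meets the listed minimum for the given CUDA release."""
    if cuda not in MIN_LINUX_DRIVER:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda}")
    return parse_driver(driver) >= MIN_LINUX_DRIVER[cuda]


if __name__ == "__main__":
    for cuda in ("10.0", "10.1"):
        print(f"driver 410.104 supports CUDA {cuda}:", driver_supports("410.104", cuda))
```

Tuples compare element by element, so (410, 104) passes the CUDA 10.0 minimum but not the CUDA 10.1 one. If the check fails, either upgrade the driver or keep cudatoolkit=10.0 and make sure a CUDA-enabled PyTorch build is installed.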

KiddoZhu commented 4 years ago

Also, the Colab version, which uses Python 3.6 and CUDA 10.0, works fine. It is installed with:

conda install -c milagraph -c conda-forge graphvite python=3.6 cudatoolkit=10.0

So there is at least one CUDA-enabled PyTorch build available on conda-forge.
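After reinstalling, a short smoke test can confirm that PyTorch reaches the GPU before you rerun the full baseline. Moving a module with .cuda() is the step that failed in the traceback above. Below is a minimal sketch, assuming PyTorch is installed. gpu_smoke_test is a hypothetical helper and not part of GraphVite.

```python
def gpu_smoke_test() -> str:
    """Mimic the failing step (module.cuda()) and run one tiny forward pass."""
    try:
        import torch
    except ImportError:
        return "skipped: PyTorch is not installed"
    if not torch.cuda.is_available():
        # Same condition that made model.cuda() fail in the issue.
        return f"failed: CUDA unavailable (torch.version.cuda={torch.version.cuda})"
    model = torch.nn.Linear(4, 2).cuda()  # the call that raised AssertionError
    x = torch.randn(8, 4, device="cuda")
    y = model(x)
    torch.cuda.synchronize()  # surface any asynchronous CUDA error here
    return f"ok: output on {y.device}, shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A result starting with "ok" means the model transfer that link prediction performs should no longer hit the "Torch not compiled with CUDA enabled" assertion.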