Cartus / AGGCN

Attention Guided Graph Convolutional Networks for Relation Extraction (authors' PyTorch implementation for the ACL19 paper)
MIT License
432 stars, 88 forks

RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /tmp/pip-req-build-ufslq_a9/aten/src/THC/THCGeneral.cpp:50 #30

Status: Closed (ZohaibRamzan closed this issue 3 years ago)

ZohaibRamzan commented 3 years ago

(GPU_pytorch) hz071@hamilton:~/AGGCNN/AGGCN$ bash train_aggcn.sh 1
Vocab size 53953 loaded from file
Loading data from dataset/tacred with batch size 50...
1363 batches created for dataset/tacred/train.json
453 batches created for dataset/tacred/dev.json
Config saved to file ./saved_models/01/config.json
Overwriting old vocab file at ./saved_models/01/vocab.pkl

Running with the following configs:
data_dir : dataset/tacred
vocab_dir : dataset/vocab
emb_dim : 300
ner_dim : 30
pos_dim : 30
hidden_dim : 300
num_layers : 2
input_dropout : 0.5
gcn_dropout : 0.5
word_dropout : 0.04
topn : 10000000000.0
lower : False
heads : 3
sublayer_first : 2
sublayer_second : 4
pooling : max
pooling_l2 : 0.002
mlp_layers : 1
no_adj : False
rnn : True
rnn_hidden : 300
rnn_layers : 1
rnn_dropout : 0.5
lr : 0.7
lr_decay : 0.9
decay_epoch : 5
optim : sgd
num_epoch : 100
batch_size : 50
max_grad_norm : 5.0
log_step : 20
log : logs.txt
save_epoch : 100
save_dir : ./saved_models
id : 1
info :
seed : 0
cuda : False
cpu : False
load : False
model_file : None
num_class : 42
vocab_size : 53953
model_save_dir : ./saved_models/01

Finetune all embeddings.
/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py:50: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
THCudaCheck FAIL file=/tmp/pip-req-build-ufslq_a9/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "train.py", line 119, in <module>
    trainer = GCNTrainer(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/trainer.py", line 67, in __init__
    self.model = GCNClassifier(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 22, in __init__
    self.gcn_model = GCNRelationModel(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 47, in __init__
    self.gcn = AGGCN(opt, embeddings)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 129, in __init__
    self.layers.append(GraphConvLayer(opt, self.mem_dim, self.sublayer_first))
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 205, in __init__
    self.weight_list = self.weight_list.cuda()
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /tmp/pip-req-build-ufslq_a9/aten/src/THC/THCGeneral.cpp:50
(GPU_pytorch) hz071@hamilton:~/AGGCNN/AGGCN$ python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.cuda.is_available()
True

I am using pytorch=1.4.0 and python=3.7, and as you can see above, CUDA is available too. What should I do?
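One thing that may help narrow this down: the interactive `python` session and the `train.py` process started by `train_aggcn.sh` are separate processes, and they do not necessarily see the same environment (for example, `CUDA_VISIBLE_DEVICES` could differ between them). The config dump above also prints `cuda : False`, which may be worth checking, although how the script derives that flag is not shown in this thread. The following is a minimal sketch, not part of the AGGCN code, that prints what a given process can see; the function name `describe_cuda_environment` and the report keys are made up for illustration. Running it both from the interactive shell and from inside the training script would show whether the two environments agree.

```python
import os


def describe_cuda_environment():
    """Collect what this process can see about CUDA (hypothetical helper)."""
    report = {
        # Unset (None) means all GPUs are visible; an empty or invalid
        # value can hide every GPU from this process.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not importable in this interpreter
        return report

    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["is_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key:>22}: {value}")
```

If the two runs report different values, the difference itself (rather than the model code) is the first thing to investigate.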

Cartus commented 3 years ago

It seems to me that the main error is "no CUDA-capable device is detected", which points to your running environment rather than to the model code. It is likely that you installed CUDA through the package manager and something went wrong, leaving the driver installation incomplete. I do not know how to fix this issue, as it is not related to our code.
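To check the driver side of this independently of PyTorch, one option is to ask the NVIDIA driver's own tool which GPUs it sees. The sketch below is only an illustration under the assumption that `nvidia-smi` is the relevant tool on this machine; the helper name `list_gpus_via_nvidia_smi` is hypothetical. It runs `nvidia-smi -L`, which lists the GPUs visible to the driver, and reports a readable message if the tool is missing or fails.

```python
import shutil
import subprocess


def list_gpus_via_nvidia_smi(timeout=10):
    """Return (ok, message) from `nvidia-smi -L` (hypothetical helper)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool ships with the NVIDIA driver, so its absence is a hint
        # that the driver is missing or not on PATH.
        return False, "nvidia-smi not found on PATH"
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run nvidia-smi: {exc}"
    # On failure nvidia-smi may print its explanation to either stream.
    output = (result.stdout or result.stderr).strip()
    return result.returncode == 0, output


if __name__ == "__main__":
    ok, message = list_gpus_via_nvidia_smi()
    print("driver sees GPUs:" if ok else "driver check failed:", message)
```

If this check fails while PyTorch itself imports fine, that would be consistent with an incomplete driver installation; if it succeeds, the problem is more likely in how the training process is launched.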