ZohaibRamzan closed this issue 3 years ago
It seems to me that the main error is that no CUDA-capable device is detected, which is a problem with your runtime environment rather than with the model. It's likely that you installed CUDA through the package manager and something went wrong, leaving the driver installation incomplete. I do not know how to fix this issue, as it is not related to our code.
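For anyone hitting the same error, it may help to check what the environment actually exposes before touching the model code. The following is only a minimal sketch and is not part of this repository: the function name `cuda_environment_report` is made up for illustration, and it assumes that `nvidia-smi` is on the `PATH` whenever the NVIDIA driver is installed properly.

```python
import os
import shutil
import subprocess


def cuda_environment_report():
    """Collect basic facts about GPU visibility (hypothetical helper, not AGGCN code)."""
    report = {}

    # If this variable is set to an empty value, CUDA programs see no GPUs at all.
    report["CUDA_VISIBLE_DEVICES"] = os.environ.get("CUDA_VISIBLE_DEVICES")

    # nvidia-smi is installed with the NVIDIA driver; if it is missing or fails,
    # the driver installation is a likely suspect.
    smi = shutil.which("nvidia-smi")
    report["nvidia_smi_path"] = smi
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "-L"],  # -L lists the GPUs the driver can see
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=30,
            )
            report["nvidia_smi_ok"] = result.returncode == 0
            report["nvidia_smi_output"] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report["nvidia_smi_ok"] = False
            report["nvidia_smi_output"] = repr(exc)

    # PyTorch is optional here so the script also runs in a bare environment.
    try:
        import torch
    except ImportError:
        report["torch_version"] = None
    else:
        report["torch_version"] = torch.__version__
        report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_device_count"] = torch.cuda.device_count()

    return report


if __name__ == "__main__":
    for key, value in cuda_environment_report().items():
        print("{:>22} : {}".format(key, value))
```

If `nvidia-smi` is missing or exits with an error, the problem is most likely at the driver level. If it lists a GPU but `torch_cuda_available` is `False`, the PyTorch build or `CUDA_VISIBLE_DEVICES` would be the next things to look at.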
(GPU_pytorch) hz071@hamilton:~/AGGCNN/AGGCN$ bash train_aggcn.sh 1
Vocab size 53953 loaded from file
Loading data from dataset/tacred with batch size 50...
1363 batches created for dataset/tacred/train.json
453 batches created for dataset/tacred/dev.json
Config saved to file ./saved_models/01/config.json
Overwriting old vocab file at ./saved_models/01/vocab.pkl
Running with the following configs:
data_dir : dataset/tacred
vocab_dir : dataset/vocab
emb_dim : 300
ner_dim : 30
pos_dim : 30
hidden_dim : 300
num_layers : 2
input_dropout : 0.5
gcn_dropout : 0.5
word_dropout : 0.04
topn : 10000000000.0
lower : False
heads : 3
sublayer_first : 2
sublayer_second : 4
pooling : max
pooling_l2 : 0.002
mlp_layers : 1
no_adj : False
rnn : True
rnn_hidden : 300
rnn_layers : 1
rnn_dropout : 0.5
lr : 0.7
lr_decay : 0.9
decay_epoch : 5
optim : sgd
num_epoch : 100
batch_size : 50
max_grad_norm : 5.0
log_step : 20
log : logs.txt
save_epoch : 100
save_dir : ./saved_models
id : 1
info :
seed : 0
cuda : False
cpu : False
load : False
model_file : None
num_class : 42
vocab_size : 53953
model_save_dir : ./saved_models/01
Finetune all embeddings.
/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py:50: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
THCudaCheck FAIL file=/tmp/pip-req-build-ufslq_a9/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "train.py", line 119, in <module>
    trainer = GCNTrainer(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/trainer.py", line 67, in __init__
    self.model = GCNClassifier(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 22, in __init__
    self.gcn_model = GCNRelationModel(opt, emb_matrix=emb_matrix)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 47, in __init__
    self.gcn = AGGCN(opt, embeddings)
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 129, in __init__
    self.layers.append(GraphConvLayer(opt, self.mem_dim, self.sublayer_first))
  File "/home/hz071/AGGCNN/AGGCN/model/aggcn.py", line 205, in __init__
    self.weight_list = self.weight_list.cuda()
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/hz071/.conda/envs/GPU_pytorch/lib/python3.7/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /tmp/pip-req-build-ufslq_a9/aten/src/THC/THCGeneral.cpp:50
(GPU_pytorch) hz071@hamilton:~/AGGCNN/AGGCN$ python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
I am using pytorch=1.4.0 and python=3.7, and as you can see above, CUDA is available too. What should I do?
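One detail in the log stands out: the config printout shows `cuda : False`, yet the traceback ends in an unconditional `self.weight_list.cuda()` call. A common way to avoid this kind of crash is to resolve the device once and move modules with `.to(device)`. The sketch below illustrates that pattern under stated assumptions; it is not the repository's code, and `resolve_device` is a hypothetical helper name.

```python
def resolve_device(requested_cuda, cuda_available):
    """Return "cuda" only if it was requested and a device is usable (hypothetical helper)."""
    return "cuda" if (requested_cuda and cuda_available) else "cpu"


if __name__ == "__main__":
    # PyTorch is imported lazily so the helper above stays testable without it.
    try:
        import torch
    except ImportError:
        torch = None

    available = bool(torch is not None and torch.cuda.is_available())
    device = resolve_device(requested_cuda=True, cuda_available=available)
    print("using device:", device)

    if torch is not None:
        # .to(device) works for both "cpu" and "cuda", unlike a bare .cuda() call.
        layer = torch.nn.Linear(4, 2).to(device)
        batch = torch.zeros(1, 4, device=device)
        print("output shape:", tuple(layer(batch).shape))
```

This only keeps the script from crashing when no GPU is visible; training would then run on the CPU, so the driver problem still needs to be fixed to use the GPU.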