cuda gpu device Error - Githubissues

parkjh688 commented 5 years ago

Hi.

I have one GPU in my computer, but I got this error. I'm new to PyTorch, so I don't know what this error means.

Traceback (most recent call last):
  File "main.py", line 177, in <module>
    train_logger, train_batch_logger)
  File "/home/eden/Real-time-GesRec/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

parkjh688 commented 5 years ago

I found the cause of the error.

When I print t.device and self.src_device_obj in torch's data_parallel.py file, I get cpu for t.device and cuda:0 for self.src_device_obj.

I guess the model was built for the CPU. Can you tell me how to switch it from CPU to GPU?
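The same check can be done from the training script, without editing data_parallel.py. The sketch below counts a module's parameters and buffers per device. count_devices is a hypothetical helper name, not part of PyTorch or this repo, and it only assumes the object exposes parameters() and buffers() the way torch.nn.Module does.

```python
from collections import Counter


def count_devices(module):
    """Count parameters and buffers per device (e.g. 'cpu', 'cuda:0').

    Hypothetical helper: works on any object that exposes parameters()
    and buffers() the way torch.nn.Module does.
    """
    counts = Counter()
    for tensor in list(module.parameters()) + list(module.buffers()):
        counts[str(tensor.device)] += 1
    return dict(counts)
```

Calling print(count_devices(model)) just before the model is wrapped in nn.DataParallel shows where the weights live. If everything is reported under cpu, the model was never moved to the GPU, which matches the RuntimeError in the traceback.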

ahmetgunduz commented 5 years ago

The models are made for GPU actually. Which version of torch are you using? Are you sure that you are using GPU? You can check it as in https://stackoverflow.com/a/48152675/6400484
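The linked answer is not reproduced here. As a rough sketch, a check of this kind using standard torch.cuda calls might look like the following. cuda_report is a hypothetical name, and the ImportError fallback is only there so the script still reports something when torch is missing from the active environment.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup as a dict."""
    try:
        import torch
    except ImportError:
        # torch is not installed in the active environment
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU (cuda:0)
        report["device_0_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

Note that cuda_available being True only says that PyTorch can see a GPU. It does not say that the model's weights have been moved there.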

parkjh688 commented 5 years ago

Yes, I am, and I checked it again using that link.

ahmetgunduz commented 5 years ago

It seems to be a PyTorch bug. Please check this solution: https://discuss.pytorch.org/t/bug-in-dataparallel-only-works-if-the-dataset-device-is-cuda-0/28634/18

Karthik-Bhaskar commented 5 years ago

Hi,

@ahmetgunduz I am facing the same issue too. Is there any solution for this?

I checked whether torch can detect the CUDA device (1 GPU in my case), and it looks fine. I am using torch version 1.2.

[Screenshot from 2019-08-22 13-22-14]

I am using the following config just to try out the offline test on Jester.

#!/bin/bash
python offline_test.py \
--root_path ~/ \
--video_path /home/karthik/Desktop/Data/Jester/20bn-jester-v1 \
--annotation_path Desktop/Project/Real-time-GesRec/annotation_Jester/jester.json \
--result_path Desktop/Project/Real-time-GesRec/results \
--resume_path Desktop/Project/Real-time-GesRec/pre-trained-models/jester_resnext_101_RGB_32.pth \
--dataset jester \
--sample_duration 32 \
--learning_rate 0.01 \
--model resnext \
--model_depth 101 \
--batch_size 1 \
--n_classes 27 \
--n_finetune_classes 27 \
--modality RGB \
--n_threads 8 \
--checkpoint 1 \
--train_crop random \
--n_val_samples 1 \
--test_subset val \
--n_epochs 100

@parkjh688 were you able to solve the issue?

Thanks in advance.

ahmetgunduz commented 5 years ago

@Karthik-Bhaskar Just to check, can you please add the --no_cuda parameter as well, to see whether it works on the CPU?

Karthik-Bhaskar commented 5 years ago

Do I need to give the --no_cuda parameter a value, like True or False?

Or should I just include it without any value, like this:

#!/bin/bash
python offline_test.py \
--root_path ~/ \
--video_path /home/karthik/Desktop/Data/Jester/20bn-jester-v1 \
--annotation_path Desktop/Project/Real-time-GesRec/annotation_Jester/jester.json \
--result_path Desktop/Project/Real-time-GesRec/results \
--resume_path Desktop/Project/Real-time-GesRec/pre-trained-models/jester_resnext_101_RGB_32.pth \
--dataset jester \
--sample_duration 32 \
--learning_rate 0.01 \
--model resnext \
--model_depth 101 \
--batch_size 1 \
--n_classes 27 \
--n_finetune_classes 27 \
--modality RGB \
--n_threads 8 \
--checkpoint 1 \
--train_crop random \
--n_val_samples 1 \
--test_subset val \
--n_epochs 100 \
--no_cuda

I tried executing with the above parameters and ran into RuntimeError: Error(s) in loading state_dict for ResNeXt
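The error message quoted above is cut off, so the actual cause cannot be read from this thread. One common cause of this kind of failure, offered here only as an assumption, is a checkpoint saved from an nn.DataParallel-wrapped model (its keys start with module.) being loaded into an unwrapped model, which could happen if --no_cuda skips the wrapping. A minimal sketch of stripping that prefix, with strip_module_prefix as a hypothetical helper that is not part of this repo:

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from keys.

    Hypothetical helper. Assumes the checkpoint was saved from a model
    wrapped in nn.DataParallel, which prepends 'module.' to every key.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy example with placeholder values instead of real tensors
saved = OrderedDict([("module.conv1.weight", 1), ("module.fc.bias", 2)])
print(list(strip_module_prefix(saved)))  # ['conv1.weight', 'fc.bias']
```

The cleaned dict would then be passed to model.load_state_dict(...). The full error text lists the missing and unexpected keys, which would confirm or rule out this guess.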

Please tell me if this is the wrong way to add that parameter.

Thanks.

ahmetgunduz commented 5 years ago

Everything looks fine actually. The way you passed the no_cuda parameter is right. Honestly, I have no clue about the error. It may be because of the torch version: the repo was last updated for PyTorch 1.0.1.post2, so maybe you can downgrade your PyTorch version and try again.

Karthik-Bhaskar commented 5 years ago

I downgraded PyTorch to 1.0.1.post2, but the issue remains the same. Can you please let me know if I need to use a particular version of any package or library? Currently, I am using Python 3.6 and CUDA 10.

ahmetgunduz commented 5 years ago

Python 3.7.3 and CUDA 10 are the versions I am currently using. See below:

ahmetgunduz commented 5 years ago

Dear @parkjh688 and @Karthik-Bhaskar, did you find any solution for this?

parkjh688 commented 5 years ago

@ahmetgunduz Unfortunately, not yet. Next week I will try to run this code on another machine with different CUDA and cuDNN versions, to check whether this is a CUDA problem or not. My guess is that it is a CUDA version problem.

ahmetgunduz commented 5 years ago

@parkjh688 That is great! Looking forward to seeing the outcome...

xiaomingnio commented 4 years ago

model, parameters = generate_model(opt)
model = model.cuda()

Add the line above (model = model.cuda()) after the generate_model call.

MrXuf commented 4 years ago

@Karthik-Bhaskar were you able to solve the issue?

RuntimeError: Error(s) in loading state_dict for ResNeXt

Thanks.

ahmetgunduz commented 4 years ago

The codebase has been updated. Could you please pull the repo and recheck?

Karthik-Bhaskar commented 4 years ago

@MrXuf No, I could not resolve it. Please recheck with the updated codebase, as @ahmetgunduz suggested above.

MrXuf commented 4 years ago

Oh! Thank you for your email. I had the same problem, and it bothered me for a few days. I will recheck with the latest code.


ahmetgunduz / Real-time-GesRec

cuda gpu device Error #33