About CUDA and PyTorch version issues

lllianghe commented 1 year ago

When I run this command:

python train.py --cfg configs/monohuman/zju_mocap/xxx/xxx.yaml resume False

I get the following error:

** Init Trainer ***
/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Save checkpoint to experiments/monohuman/zju_mocap/p386/suject_386/init.tar ...
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/zzq/lhe/monohuman/MonoHuman/third_parties/lpips/weights/v0.1/vgg.pth
Load Progress Dataset ...
[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
 test--movement set--
 -- Total Frames: 14

[Dataset Path] /home/zzq/lhe/monohuman/MonoHuman/dataset/zju_mocap/386
 test--movement set--
 -- Total Frames: 432
Traceback (most recent call last):
  File "train.py", line 37, in <module>
    main()
  File "train.py", line 31, in main
    train_dataloader=train_loader)
  File "core/train/trainers/monohuman/trainer.py", line 177, in train
    net_output = self.network(data)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/network.py", line 556, in forward
    featmaps, _ = self.feature_extractor(src_imgs.permute(0, 3, 1, 2))
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/feature_extract/feature_extractor.py", line 247, in forward
    x = self.relu(self.bn1(self.conv1(x)))
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/instancenorm.py", line 57, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/zzq/miniconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/functional.py", line 2080, in instance_norm
    use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I suspect this is a CUDA/PyTorch version mismatch. My environment has CUDA 11.7 and PyTorch 1.7.1. Which CUDA and PyTorch versions did you use?
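You can get exact answers to "which versions" from PyTorch itself. The sketch below collects that information. It assumes PyTorch is importable, and the function name `report_torch_env` is hypothetical.

One point matters here. `torch.version.cuda` reports the CUDA version the PyTorch binary was built against. That can differ from the system CUDA toolkit (11.7 above), because pip/conda builds ship their own CUDA runtime.

```python
import importlib.util


def report_torch_env():
    """Collect version info relevant to CUDA/PyTorch compatibility (hypothetical helper).

    Returns a dict; if PyTorch is not installed, returns {"torch": None}.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    info = {
        "torch": torch.__version__,
        # CUDA version this PyTorch binary was compiled with, not the system toolkit
        "built_cuda": torch.version.cuda,
        # cuDNN version bundled with the binary (None if cuDNN is unavailable)
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # e.g. (8, 6) for an RTX 3090 / 3090 Ti
        info["capability"] = torch.cuda.get_device_capability(0)
        # GPU architectures compiled into this binary, e.g. ['sm_37', ..., 'compute_37']
        info["arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in report_torch_env().items():
        print(f"{key}: {value}")
```

On the machine from the log above, `arch_list` would presumably match the list printed in the warning, with no `sm_8x` entry.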

Yzmblog commented 1 year ago

Hi, we use PyTorch 1.7.1 + CUDA 10.1. PyTorch 1.7.1 probably does not support CUDA 11.7; reinstalling the environment should fix it.
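Whether a particular CUDA build works also depends on the GPU. A CUDA toolkit can only emit native code for GPU architectures that existed when that toolkit was released.

The sketch below encodes the first CUDA release that supports a few compute capabilities. Some assumptions:
- The table is illustrative and deliberately incomplete.
- `MIN_CUDA_FOR_SM`, `parse_cuda` and `cuda_build_can_target` are hypothetical names.
- PTX JIT compilation is ignored.

```python
# First CUDA toolkit release able to compile native code for each compute capability
# (illustrative subset; keys and values are (major, minor) tuples).
MIN_CUDA_FOR_SM = {
    (6, 0): (8, 0),   # Pascal, e.g. P100
    (6, 1): (8, 0),   # Pascal, e.g. GTX 10xx
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere GA10x, e.g. RTX 3090 / 3090 Ti
}


def parse_cuda(version):
    """Turn a version string such as '10.1' into a comparable tuple (10, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_build_can_target(build_cuda, capability):
    """Return True if a toolkit of version build_cuda can emit native code for capability."""
    required = MIN_CUDA_FOR_SM.get(tuple(capability))
    if required is None:
        raise KeyError(f"capability {capability} is not in this sketch's table")
    return parse_cuda(build_cuda) >= required


if __name__ == "__main__":
    print(cuda_build_can_target("10.1", (8, 6)))
    print(cuda_build_can_target("10.1", (7, 5)))
```

Under this table, a CUDA 10.1 build covers GPUs up to `sm_75`. It does not cover the `sm_86` cards mentioned in this thread, so `cuda_build_can_target("10.1", (8, 6))` returns False.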

staymylove commented 1 year ago

experiments/monohuman/zju_mocap/p377/suject_377/latest.tar
/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
load network from experiments/monohuman/zju_mocap/p377/suject_377/latest.tar
[Dataset Path] dataset/zju_mocap/377
 test--movement set--
 -- Total Frames: 456
The rendering is saved in experiments/monohuman/zju_mocap/p377/suject_377/latest/movement
  0%|          | 0/456 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 259, in <module>
    globals()[f'run_{args.type}']()
  File "run.py", line 153, in run_movement
    net_output = model(data, iter_val=cfg.eval_iter)
  File "/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/network.py", line 556, in forward
    featmaps, _ = self.feature_extractor(src_imgs.permute(0, 3, 1, 2))
  File "/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "core/nets/monohuman/feature_extract/feature_extractor.py", line 247, in forward
    x = self.relu(self.bn1(self.conv1(x)))
  File "/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/modules/instancenorm.py", line 57, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/data/zeju/anaconda3/envs/Monohuman/lib/python3.7/site-packages/torch/nn/functional.py", line 2080, in instance_norm
    use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

How should I handle this? My environment is PyTorch 1.7.1 + CUDA 10.1, and the GPU is an RTX 3090.
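Both logs print the same list of supported architectures, and neither list has an `sm_8x` entry. That is consistent with the same cuDNN failure appearing on the RTX 3090.

Below is a simplified sketch of the kind of check behind that warning. It makes these assumptions:
- It uses NVIDIA's cubin rule: code built for `sm_XY` runs on devices with the same major version X and a minor version of at least Y.
- It only looks at `sm_` entries and skips PTX entries such as `compute_37`.
- `parse_sm` and `has_compatible_cubin` are hypothetical helpers, not PyTorch API.

```python
def parse_sm(arch):
    """'sm_86' -> (8, 6); returns None for PTX entries such as 'compute_37'."""
    if not arch.startswith("sm_"):
        return None
    digits = arch[len("sm_"):]
    # last digit is the minor version, everything before it is the major version
    return int(digits[:-1]), int(digits[-1])


def has_compatible_cubin(arch_list, capability):
    """True if some sm_XY in arch_list has the same major version and minor <= device minor."""
    major, minor = capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm[0] == major and sm[1] <= minor:
            return True
    return False


# Architecture list copied from the warning in both logs above
ARCH_LIST_FROM_LOG = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]

if __name__ == "__main__":
    print(has_compatible_cubin(ARCH_LIST_FROM_LOG, (8, 6)))  # RTX 3090 / 3090 Ti
    print(has_compatible_cubin(ARCH_LIST_FROM_LOG, (7, 5)))  # a Turing GPU
```

For capability (8, 6) and the list above, the check returns False, which agrees with the warning. A build whose list included `sm_80` or `sm_86` would pass.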

Yzmblog / MonoHuman

About CUDA and PyTorch version issues #10