Closed zxy630 closed 10 months ago
My environment: PyTorch=1.3.1, torchvision=0.4.2, CUDA=10.1.243, tested on a machine with four 3090 GPUs.
Hi, this looks like a CUDA error, possibly caused by a corrupted PyTorch environment. You can try running a single conv module to check whether the environment is in good condition. If that turns out to be the problem, reinstalling PyTorch may solve it.
I have tried torch=1.3.1, 1.5.0, 1.7.1, and 1.8.0, and the problem still occurs with every one of them. Could you tell me which versions you tested with, including torch, CUDA, and the GPU, if convenient? Thanks.
The traceback you provided shows that torch can't run a conv module successfully. So try running a single conv module to see if torch works, like this:

    import torch
    conv = torch.nn.Conv2d(4, 16, 3).cuda()
    x = torch.rand(2, 4, 128, 128).cuda()  # B, C, H, W
    y = conv(x)
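The single-conv check above can be extended into a small script that also collects the version details asked about in this thread (torch, CUDA, cuDNN, GPU). This is only a sketch: the function name `environment_report` and the layout of the returned dictionary are made up for illustration, and it assumes nothing beyond the standard `torch` APIs.

```python
import importlib


def environment_report():
    """Collect torch/CUDA/cuDNN/GPU details and run one small conv on the GPU."""
    report = {
        "torch": None,       # torch.__version__
        "cuda_build": None,  # CUDA version torch was built against
        "cudnn": None,       # cuDNN version reported by torch
        "gpus": [],          # (name, compute capability) per visible GPU
        "conv_ok": None,     # True/False once the conv test has run
        "error": None,       # first problem encountered, if any
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # A broken install often fails right here.
        report["error"] = f"cannot import torch: {exc}"
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    if not torch.cuda.is_available():
        report["error"] = "CUDA is not available to this PyTorch build"
        return report

    report["cudnn"] = torch.backends.cudnn.version()
    for i in range(torch.cuda.device_count()):
        report["gpus"].append(
            (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
        )

    try:
        conv = torch.nn.Conv2d(4, 16, 3).cuda()
        x = torch.rand(2, 4, 128, 128).cuda()  # B, C, H, W
        y = conv(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        report["conv_ok"] = tuple(y.shape) == (2, 16, 126, 126)
    except RuntimeError as exc:
        report["conv_ok"] = False
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue gives both sides the same version information to compare. If `conv_ok` is `False` here, the problem is likely in the environment rather than in the AlignShift code.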
Excuse me.
Evaluation now works fine, but an error occurs when I train.
'''
Traceback (most recent call last):
File "./deeplesion/train_dist.py", line 121, in
I have tried many OpenCV versions, but none of them work. Can you give me some tips?
Check your albumentations version and use an OpenCV version that is compatible with it.
I tried to run './deeplesion/eval.sh ./deeplesion/mconfigs/densenet_a3d.py ./deeplesion/model_weights/adap_7slice_weigts.pth' but I get the error below. It's been bothering me for days.
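To see which versions are actually installed before changing anything, a minimal sketch like the following may help. The helper name `installed_versions` is hypothetical, and the distribution names are the usual PyPI ones (`albumentations`, `opencv-python`, `opencv-python-headless`). It assumes Python 3.8+ for `importlib.metadata`; on Python 3.7 the `importlib_metadata` backport offers the same API.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["albumentations", "opencv-python", "opencv-python-headless"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

If both `opencv-python` and `opencv-python-headless` show up as installed, that is worth noting when reporting the problem, since both provide the same `cv2` module.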
Here is the info:
'''
./deeplesion/mconfigs/densenet_a3d.py a3d 7 slice
[ ] 0/160, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "./deeplesion/eval.py", line 210, in <module>
    main(checkpoint, cfg_path)
  File "./deeplesion/eval.py", line 196, in main
    outputs = single_gpu_test(model, dl)
  File "./deeplesion/eval.py", line 101, in single_gpu_test
    r = model(return_loss=False, rescale=False, **data)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 122, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 105, in forward_test
    return self.simple_test(imgs, img_metas, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 268, in simple_test
    x = self.extract_feat(img)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 92, in extract_feat
    x = self.backbone(img)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/models/truncated_densenet3d_a3d.py", line 168, in forward
    x = self.conv0(x)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/operators/a3dconv.py", line 59, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
'''
I hope you can offer some suggestions. Thanks so much.