XinyiYing / D3Dnet

Repository for "Deformable 3D Convolution for Video Super-Resolution", SPL, 2020
Apache License 2.0

Distributed training #33

Open weiMytian opened 1 year ago

weiMytian commented 1 year ago

Thank you very much for sharing the code. Have you ever trained this program on multiple GPUs? I am using the deformable 3D convolution in my own task. Training on a single GPU runs normally, but when I train on multiple GPUs the program reports the following error, which I have not been able to resolve:

error in deformable_col2im_cuda: an illegal memory access was encountered
error in deformable_im2col_cuda: an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 598, in <module>
    main()
  File "train.py", line 396, in main
    loss.backward() # cal grad
  File "/home/jp/anaconda3/envs/torch/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/jp/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/home/jp/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/jp/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/function.py", line 340, in wrapper
    outputs = fn(ctx, *args)
  File "/MIP/zhang/3d_code/dcn/functions/deform_conv_func.py", line 46, in backward
    D3D.deform_conv_backward(input, weight,
RuntimeError: CUDA error: an illegal memory access was encountered

Have you run into the same problem? I look forward to your reply!

fzs347 commented 1 year ago

Did you manage to solve this problem? I have run into the same issue.