feinanshan / M2M_VFI

Many-to-many Splatting for Efficient Video Frame Interpolation

[Training Error] illegal memory access in DDP training. #12

Closed NK-CS-ZZL closed 1 year ago

NK-CS-ZZL commented 1 year ago

Congratulations on your awesome work. The 'softsplat' func works well in a single-GPU setting, but we encounter an error in DDP training with more GPUs. Did you try to train the network with more than one GPU, or does the 'softsplat' func not support this? Here is the error info. Our environment is RTX 3090 / cupy 11.6 / PyTorch 1.11 / cudatoolkit 11.3 (though CUDA 10.0 / PyTorch 1.8.0 are recommended in the README, they are incompatible with the RTX 3090).

File "/home/lele/code/zzl/VFI-Exp_new/networks/blocks/softsplat.py", line 251, in softsplat
    tenOut = softsplat_func.apply(tenIn, tenFlow)
File "/home/lele/anaconda3/envs/vfi-conda/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 118, in decorate_fwd
    return fwd(*args, **kwargs)
File "/home/lele/code/zzl/VFI-Exp_new/networks/blocks/softsplat.py", line 284, in forward
    cuda_launch(cuda_kernel('softsplat_out', '''
File "cupy/_util.pyx", line 67, in cupy._util.memoize.decorator.ret
File "/home/lele/code/zzl/VFI-Exp_new/networks/blocks/softsplat.py", line 223, in cuda_launch
    return cupy.cuda.compile_with_cache(objCudacache[strKey]['strKernel'], 
File "/home/lele/anaconda3/envs/vfi-conda/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 468, in compile_with_cache
    return _compile_module_with_cache(*args, **kwargs)
File "/home/lele/anaconda3/envs/vfi-conda/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 496, in _compile_module_with_cache
    return _compile_with_cache_cuda(    
File "/home/lele/anaconda3/envs/vfi-conda/lib/python3.10/site-packages/cupy/cuda/compiler.py", line 565, in _compile_with_cache_cuda
    mod.load(cubin)
File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
sniklaus commented 1 year ago

Haven't noticed any issues myself. How many GPUs are you using, and what is nproc_per_node for the DDP training?

NK-CS-ZZL commented 1 year ago

Thanks for your reply. My PC has two RTX 3090s and nproc_per_node is 2.

NK-CS-ZZL commented 1 year ago

BTW, we call this func as softsplat(tenIn, tenFlow, None, 'avg'), the same as in softmax-splatting. I think I am at least using it the right way, because the same code gives good inference results for both M2M-VFI and SoftSplat. However, it doesn't work on multiple GPUs. Do we need to guarantee that the tensors are contiguous before we conduct forward warping? (Our tenIn is not contiguous; maybe this causes the problem?)
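
For reference, a minimal sketch of the scenario described above, assuming the call signature quoted here and a hypothetical feature tensor feats; a permute is one common way tenIn ends up non-contiguous:

    tenIn = feats.permute(0, 3, 1, 2)                # a permuted view is not contiguous
    print(tenIn.is_contiguous())                     # False
    tenOut = softsplat(tenIn, tenFlow, None, 'avg')  # forward warping as in softmax-splatting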

sniklaus commented 1 year ago

I am afraid I am not sure why it would fail. If you are worried about the tensors being non-contiguous, you can always pass them as tensor.contiguous() to the softmax splatting function. Maybe the input tensor and the flow tensor are on different CUDA devices? As a hack, you could run torchrun twice, once with CUDA_VISIBLE_DEVICES set to 0 and once with it set to 1, so that you effectively have two training runs on the same machine in parallel.
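
A minimal sketch of these two checks, assuming the softsplat call from the earlier comment (not verified against the repository):

    assert tenIn.device == tenFlow.device, 'input and flow must be on the same CUDA device'
    tenOut = softsplat(tenIn.contiguous(), tenFlow.contiguous(), None, 'avg')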

NK-CS-ZZL commented 1 year ago

Thanks for your patience and I'll try these suggestions. BTW, forward warping is an interesting idea. XD

NK-CS-ZZL commented 1 year ago

I solved this bug by calling torch.cuda.set_device manually. Without it, torch.cuda.current_stream() returns a stream on the wrong device (it is always 'cuda:0').
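
For anyone hitting the same error, a minimal sketch of this fix, assuming a standard torchrun launch (LOCAL_RANK is set by torchrun; the rest is illustrative):

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)         # the crucial call: pin this process to its GPU
    dist.init_process_group(backend='nccl')
    device = torch.device('cuda', local_rank)
    # build the model, wrap it in DistributedDataParallel, and move tensors to `device`;
    # torch.cuda.current_stream() now refers to the correct device instead of 'cuda:0',
    # so the cupy kernel in softsplat.py is launched on the right stream.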

sniklaus commented 1 year ago

Thanks for sharing your finding! Good to know about that behavior of current_stream(); I wasn't aware of it. And good luck with your work/research!