CUDA error: an illegal memory access was encountered on RTX 3090 (using multiple GPUs)

The error occurs when running swap.py on multiple RTX 3090 GPUs, or on GPU 1 alone.

To Reproduce

Steps to reproduce the behavior:

Run either of the commands below:

python swap.py ../docs/examples/shinzo_abe.mp4 -t ../docs/examples/conan_obrien.mp4 -o . --finetune --finetune_save --seg_remove_mouth

python swap.py ../docs/examples/shinzo_abe.mp4 -t ../docs/examples/conan_obrien.mp4 -o . --finetune --finetune_save --seg_remove_mouth --gpu 1

Expected behavior

Error Message:

C:\Users\kao-prototype\Documents\FSGAN\data\dev\projects\fsgan\inference> python swap.py ../docs/examples/shinzo_abe.mp4 -t ../docs/examples/conan_obrien.mp4 -o . --finetune --finetune_save --seg_remove_mouth
=> using GPU devices: 0, 1
=> Loading face pose model: "hopenet_robust_alpha1.pth"...
=> Loading face landmarks model: "hr18_wflw_landmarks.pth"...
=> Loading face segmentation model: "celeba_unet_256_1_2_segmentation_v2.pth"...
=> Loading face reenactment model: "nfv_msrunet_256_1_2_reenactment_v2.1.pth"...
=> Loading face completion model: "ijbc_msrunet_256_1_2_inpainting_v2.pth"...
=> Loading face blending model: "ijbc_msrunet_256_1_2_blending_v2.pth"...
=> Detecting faces in video: "shinzo_abe.mp4..."
  1%|▉                                                                             | 7/600 [00:02<02:58,  3.32frames/s]
Traceback (most recent call last):
  File "swap.py", line 504, in <module>
    main(**vars(parser.parse_args()))
  File "swap.py", line 498, in main
    face_swapping(source[0], target[0], output, select_source, select_target)
  File "swap.py", line 239, in __call__
    source_cache_dir, source_seq_file_path, _ = self.cache(source_path)
  File "C:\Users\kao-prototype\Documents\FSGAN\data\dev\projects\fsgan\preprocess\preprocess_video.py", line 446, in cache
    self.face_detector(input_path, det_file_path)
  File "C:\Users\kao-prototype\Documents\FSGAN\data\dev\projects\face_detection_dsfd\face_detector.py", line 92, in __call__
    detections_batch = self.net(frame_tensor_batch)
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 160, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in replicate
    return replicate(module, device_ids, not torch.is_grad_enabled())
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\parallel\replicate.py", line 88, in replicate
    param_copies = _broadcast_coalesced_reshape(params, devices, detach)
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\parallel\replicate.py", line 67, in _broadcast_coalesced_reshape
    return comm.broadcast_coalesced(tensors, devices)
  File "C:\Users\kao-prototype\anaconda3\lib\site-packages\torch\nn\parallel\comm.py", line 56, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: CUDA error: an illegal memory access was encountered
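
CUDA errors are reported asynchronously, so the frame named in a traceback like the one above is not necessarily the call that faulted. A minimal sketch of a diagnostic rerun follows. It relies on two standard CUDA environment variables, CUDA_LAUNCH_BLOCKING and CUDA_VISIBLE_DEVICES; the helper names `diagnostic_env` and `rerun_swap` are hypothetical and not part of FSGAN, and the swap.py arguments are copied from this report.

```python
import os
import subprocess
import sys

# swap.py arguments copied from the report; run this from the inference directory.
SWAP_ARGS = [
    "swap.py", "../docs/examples/shinzo_abe.mp4",
    "-t", "../docs/examples/conan_obrien.mp4",
    "-o", ".", "--finetune", "--finetune_save", "--seg_remove_mouth",
]


def diagnostic_env(base_env, visible_devices=None):
    """Return a copy of base_env set up for a diagnostic run (hypothetical helper)."""
    env = dict(base_env)
    # Make kernel launches synchronous so an error should surface at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    if visible_devices is not None:
        # Expose only these physical GPUs; the process renumbers them starting at 0.
        env["CUDA_VISIBLE_DEVICES"] = str(visible_devices)
    return env


def rerun_swap(visible_devices=None, extra_args=()):
    """Rerun swap.py in a child process with the diagnostic environment."""
    cmd = [sys.executable, *SWAP_ARGS, *extra_args]
    return subprocess.run(cmd, env=diagnostic_env(os.environ, visible_devices))
```

Calling `rerun_swap(visible_devices="1")` exposes only the second card, which the process then sees as device 0. If that run still fails, the fault would appear to follow the physical card rather than the device index.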

Environment

PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 3090
GPU 1: GeForce RTX 3090

Nvidia driver version: 461.40
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.2
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.0.221             h74a9793_0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py38hb782905_0
[conda] mkl_fft                   1.2.0            py38h45dec08_0
[conda] mkl_random                1.1.1            py38h47e9c7a_0
[conda] numpy                     1.19.2           py38hadc3359_0
[conda] numpy-base                1.19.2           py38ha3acd2a_0
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1
[conda] pytorch                   1.7.1           py3.8_cuda110_cudnn8_0    pytorch
[conda] torchaudio                0.7.2                      py38    pytorch
[conda] torchvision               0.8.2                py38_cu110    pytorch

Additional context

It works fine when running on GPU 0 alone:

python swap.py ../docs/examples/shinzo_abe.mp4 -t ../docs/examples/conan_obrien.mp4 -o . --finetune --finetune_save --seg_remove_mouth --gpu 0

In fact, the program worked fine on multiple GPUs until recently, in the following environment:

PyTorch version: uncertain (nightly build, installed on Nov 27th, 2020 with conda install pytorch torchvision cudatoolkit=11 -c pytorch-nightly)
CUDA Version: 11.1
cuDNN Version: 8.04
Nvidia driver version: 457.30

However, it suddenly stopped working. (Was the cause a reboot of the PC, or is it a problem with GPU memory?) I suspect the problem is similar to this post, but I don't know of a specific solution.

After that, I took the following actions, but none of them helped:

Updated to PyTorch 1.7.1 using conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
Reinstalled CUDA (11.0) and cuDNN (8.05 for CUDA 11.0)
Reinstalled CUDA (11.2), cuDNN (8.05 for CUDA 11.2), and the Nvidia driver (461.40)

After all of these actions were taken, I reinstalled Anaconda and rebooted the PC.

I also considered a hardware problem, but I think that is unlikely, since another program (e.g. DeepFaceLab) worked on multiple GPUs.
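
The traceback ends in broadcast_coalesced, which DataParallel uses to copy the model parameters from one GPU to the others. To separate a fault on one device from a fault in the device-to-device copy, a standalone check along these lines might help. This is only a sketch: `probe_gpus` is a hypothetical helper, not part of FSGAN or PyTorch, and it uses only basic PyTorch tensor calls.

```python
import importlib.util


def probe_gpus(size=1024):
    """Return (check, ok, detail) tuples for per-device and cross-device checks."""
    if importlib.util.find_spec("torch") is None:
        return [("import torch", False, "PyTorch is not installed")]
    import torch

    if not torch.cuda.is_available():
        return [("CUDA available", False, "no CUDA device is visible")]

    results = []
    count = torch.cuda.device_count()

    # 1) Allocate and compute on each GPU by itself.
    for i in range(count):
        name = f"compute on cuda:{i}"
        try:
            x = torch.randn(size, size, device=f"cuda:{i}")
            (x @ x).sum().item()  # .item() waits for the GPU work to finish
            results.append((name, True, ""))
        except RuntimeError as err:
            results.append((name, False, str(err)))

    # 2) Copy a tensor between every ordered pair of GPUs and compare on the CPU.
    for src in range(count):
        for dst in range(count):
            if src == dst:
                continue
            name = f"copy cuda:{src} -> cuda:{dst}"
            try:
                a = torch.arange(size, dtype=torch.float32, device=f"cuda:{src}")
                b = a.to(f"cuda:{dst}")
                same = torch.equal(a.cpu(), b.cpu())
                results.append((name, same, "" if same else "data mismatch after copy"))
            except RuntimeError as err:
                results.append((name, False, str(err)))
    return results


for check, ok, detail in probe_gpus():
    print(f"{'OK  ' if ok else 'FAIL'} {check} {detail}")
```

Once an illegal memory access has occurred, later CUDA calls in the same process usually fail as well, so the first FAIL line is the informative one. If the per-device checks pass and only the copies fail, that would point toward the transfer path between the cards rather than either card by itself.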

nvidia-smi -l output:

C:\Users\kao-prototype\Documents\FSGAN\data\dev\projects\fsgan\inference> nvidia-smi -l
Tue Feb  2 20:01:25 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.40       Driver Version: 461.40       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090   WDDM  | 00000000:18:00.0  On |                  N/A |
| 30%   28C    P8    19W / 350W |    847MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090   WDDM  | 00000000:3B:00.0  On |                  N/A |
| 30%   26C    P8    11W / 350W |    387MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1804    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      4456    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    0   N/A  N/A      4460    C+G   ...b3d8bbwe\WinStore.App.exe    N/A      |
|    0   N/A  N/A      4512    C+G   ...kyb3d8bbwe\Calculator.exe    N/A      |
|    0   N/A  N/A      5080    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      8036    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A      8464    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      9212    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A      9252    C+G   ...me\Application\chrome.exe    N/A      |
|    0   N/A  N/A      9912    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     10444    C+G   ...ekyb3d8bbwe\YourPhone.exe    N/A      |
|    0   N/A  N/A     10648    C+G   ...wekyb3d8bbwe\Video.UI.exe    N/A      |
|    0   N/A  N/A     10932    C+G   ...nputApp\TextInputHost.exe    N/A      |
|    0   N/A  N/A     11556    C+G   ...\app-3.3.9\SourceTree.exe    N/A      |
|    0   N/A  N/A     12152    C+G   ...8wekyb3d8bbwe\Cortana.exe    N/A      |
|    0   N/A  N/A     12560    C+G   ...ekyb3d8bbwe\HxOutlook.exe    N/A      |
|    0   N/A  N/A     13124    C+G   ...kyb3d8bbwe\HxAccounts.exe    N/A      |
|    0   N/A  N/A     14392    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    1   N/A  N/A      1804    C+G   Insufficient Permissions        N/A      |
|    1   N/A  N/A     11556    C+G   ...\app-3.3.9\SourceTree.exe    N/A      |
+-----------------------------------------------------------------------------+
