FlyEgle / MAE-pytorch

Masked Autoencoders Are Scalable Vision Learners

error occur #13

Closed leeisack closed 1 year ago

leeisack commented 1 year ago

Hi~ I'm really impressed with your code. I want to restore the hidden part of a face, so I want to train with your code instead of only running inference with the pretrained model. However, when I try to train, many problems arise. Advice please.

CUDA_VISIBLE_DEVICES=0,1 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train_mae.py

rank: 1 / 2
rank: 4 / 2
rank: 0 / 2
rank: 3 / 2
rank: 5 / 2
rank: 2 / 2
rank: 6 / 2
rank: 7 / 2
Traceback (most recent call last):
  File "train_mae.py", line 692, in <module>
    main_worker(args)
  File "train_mae.py", line 205, in main_worker
    torch.cuda.set_device(args.local_rank)
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

(the same traceback is printed by each of the other failing worker processes)

Killing subprocess 1520881
Killing subprocess 1520882
Killing subprocess 1520883
Killing subprocess 1520884
Killing subprocess 1520885
Killing subprocess 1520886
Killing subprocess 1520889
Killing subprocess 1520893
Traceback (most recent call last):
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/vimlab/anaconda3/envs/stylegan2_pytorch/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/vimlab/anaconda3/envs/stylegan2_pytorch/bin/python', '-u', 'train_mae.py', '--local_rank=7']' returned non-zero exit status 1.

FlyEgle commented 1 year ago

This is a DDP problem: you only make GPUs 0 and 1 visible (CUDA_VISIBLE_DEVICES=0,1), but --nproc_per_node is 8, so ranks 2-7 try to set device ordinals that do not exist.
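A minimal sketch of a fix, assuming the machine really has only GPUs 0 and 1 available (the script name and flags are taken from the launch command in the log above): either launch as many worker processes as there are visible GPUs, or make all eight GPUs visible to match --nproc_per_node 8.

# Option 1: match the process count to the two visible GPUs
CUDA_VISIBLE_DEVICES=0,1 python -W ignore -m torch.distributed.launch --nproc_per_node 2 train_mae.py

# Option 2: if the machine actually has eight GPUs, expose them all
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train_mae.py

Either way, the number of processes per node must not exceed the number of CUDA devices visible to the job, since each rank calls torch.cuda.set_device(local_rank).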