ethnhe / FFB6D

[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.
MIT License

Training not working on ycb dataset #59

Open SomalRudra opened 2 years ago

SomalRudra commented 2 years ago

When I run ./train_ycb.sh, training crashes after a few iterations.

Contents of train_ycb.sh:

#!/bin/bash

n_gpu=1  # number of gpu to use
python3 -m torch.distributed.launch --nproc_per_node=$n_gpu train_ycb.py --gpus=$n_gpu

I get the following error:

ERROR: Unexpected segmentation fault encountered in worker.
| 4/22265 [00:18<21:20:20, 3.45s/it, total_it=4]
epochs: 0%| | 0/25 [00:20<?, ?it/s]
Traceback (most recent call last):
  File "train_ycb.py", line 665, in <module>
    train()
  File "train_ycb.py", line 656, in train
    clr_div=clr_div
  File "train_ycb.py", line 465, in train
    scaled_loss.backward()
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 261624) is killed by signal: Segmentation fault.
Traceback (most recent call last):
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/system1-user1/anaconda3/envs/BF3D_env/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/system1-user1/anaconda3/envs/BF3D_env/bin/python3', '-u', 'train_ycb.py', '--local_rank=0', '--gpus=1']' returned non-zero exit status 1.
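The traceback only reports that a DataLoader worker was killed by a segmentation fault. The worker's own stack is lost, so the code that actually crashed is not visible. One common way to narrow this down, sketched below under assumptions, is to load samples one by one in the main process with Python's standard faulthandler module enabled, which mirrors running the DataLoader with num_workers=0. scan_dataset is a hypothetical helper, not part of FFB6D, and it assumes the training dataset is a map-style dataset exposing __len__ and __getitem__ (how train_ycb.py builds that dataset is not shown in this issue).

```python
import faulthandler
import sys


def scan_dataset(dataset, start=0, log=sys.stderr):
    """Hypothetical debugging helper (not part of FFB6D).

    Loads samples one at a time in the current process. `dataset` is
    assumed to be a map-style dataset with __len__ and __getitem__, such
    as a torch.utils.data.Dataset. Returns the number of samples loaded.
    """
    # On SIGSEGV (and SIGABRT, SIGBUS, ...) faulthandler writes the Python
    # stack of the crashing thread to stderr before the process dies.
    faulthandler.enable()
    loaded = 0
    for idx in range(start, len(dataset)):
        # Log and flush the index before loading, so the last line written
        # before a crash identifies the offending sample.
        print(f"loading sample {idx}", file=log, flush=True)
        dataset[idx]  # runs the same preprocessing a DataLoader worker would
        loaded += 1
    return loaded
```

For example, scan_dataset(train_ds), where train_ds stands for whatever dataset object train_ycb.py constructs. If the process dies, the last "loading sample N" line together with the faulthandler dump should point at the sample and the native call involved; start lets you resume or re-test from a given index. Without writing any code, exporting PYTHONFAULTHANDLER=1 in train_ycb.sh before the launch command should have a similar effect, since forked DataLoader workers inherit the signal handler.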

katrina992730 commented 3 months ago

I have the same issue. Were you able to solve it?