ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020
MIT License

Error about Dataloader #8

Closed EasonEdison closed 4 years ago

EasonEdison commented 4 years ago

Because it reported ValueError: current limit exceeds maximum limit, I changed the code as follows:

rlimit = resource.getrlimit(resource.RLIMIT_NOFILE) # (4096,4096)
# resource.setrlimit(resource.RLIMIT_NOFILE, (30000, rlimit[1]))
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
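For context, resource.setrlimit raises this ValueError when the requested soft limit is larger than the hard limit, which is the case here: 30000 against a hard limit of 4096. Below is a minimal sketch of a more defensive version that caps the request at the hard limit instead of hard-coding a number. The helper name raise_nofile_limit is hypothetical and not part of PVN3D.

```python
import resource


def raise_nofile_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    (Hypothetical helper for illustration; not part of the PVN3D code base.)
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    # Cap the request at the hard limit unless the hard limit is unlimited.
    desired = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if desired > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
        return desired
    return soft


if __name__ == "__main__":
    # With limits of (4096, 4096) the soft limit simply stays at 4096.
    print(raise_nofile_limit(30000))
```

This only avoids the ValueError. It does not address the segmentation fault, which turned out to have a different cause.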

Then, when execution reaches for ibs, batch in enumerate(train_loader):, it reports:

Traceback (most recent call last):
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 581) is killed by signal: Segmentation fault. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ouc/TXH/IDEA_code/PVN3D/pvn3d/train/train_linemod_pvn3d.py", line 366, in train
    for ibs, batch in enumerate(train_loader):
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/ouc/anaconda3/envs/Kiruto/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 737, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 581) exited unexpectedly

I tried reducing num_workers and min_batch_size, but that did not help. With num_workers=0 it also fails, showing the following:

epochs:   0%|          | 0/25 [00:00<?, ?it/s]
train:   0%|          | 0/5000 [00:00<?, ?it/s]{'bn_decay': 0.5,
 'bn_momentum': 0.9,
 'cal_metrics': False,
 'checkpoint': None,
 'cls': 'duck',
 'decay_step': 200000.0,
 'epochs': 1000,
 'eval_net': False,
 'lr': 0.01,
 'lr_decay': 0.5,
 'run_name': 'sem_seg_run_1',
 'test': False,
 'test_occ': False,
 'weight_decay': 0}

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Is this error caused by my device or by my version of torch?
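Since the run with num_workers=0 dies with SIGSEGV and no Python traceback, one option is the standard library's faulthandler module, which dumps the Python stack when the process receives a fatal signal. With num_workers=0 all data loading runs in the main process, so the dump should show which Python call was active at the time of the crash. This is a sketch only: the helper name enable_crash_traceback and the log file name are assumptions, not something from the training script.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write a Python traceback to `log_path` if the process gets a fatal signal.

    Hypothetical helper: call it once at the top of the training script.
    The returned file object must stay open for as long as the handler is active.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Assumed log file name; any writable path works.
    crash_log = enable_crash_traceback("segfault_traceback.log")
```

The same effect is available without code changes by running the script with python -X faulthandler, which writes the traceback to stderr.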

EasonEdison commented 4 years ago

My PCL installation had some errors, and I fixed them.
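For anyone trying to check whether a native dependency such as PCL is the culprit, here is a rough sketch that imports each candidate module in a separate interpreter, so that a crash in one of them does not take down the checking script. It only catches problems that show up at import time, which may not be how the PCL error appeared here. The module names in the list (torch, cv2, pcl) are assumptions about what the environment uses; adjust them as needed.

```python
import signal
import subprocess
import sys


def probe_import(module_name):
    """Import `module_name` in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Turn a child exit code into a short human-readable status."""
    if returncode == 0:
        return "imports cleanly"
    # On POSIX, a negative return code means the child was killed by that signal.
    if returncode == -signal.SIGSEGV:
        return "crashed with SIGSEGV"
    return "failed with exit code {}".format(returncode)


if __name__ == "__main__":
    # Assumed module names; "pcl" is the usual name of the python-pcl binding.
    for name in ["torch", "cv2", "pcl"]:
        print(name, "->", describe(probe_import(name)))
```

A module that is simply not installed shows up as a non-zero exit code rather than a SIGSEGV, so the two cases can be told apart.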

welen-zhou commented 3 years ago

How did you fix it? I have the same issue.