When I set --num_workers to 0, it reports the following:
./scripts/train_stanford.sh: line 34: 30654 Segmentation fault python3 -m main --dataset StanfordArea5Dataset --batch_size $BATCH_SIZE --scheduler PolyLR --model Res16UNet34 --conv1_kernel_size 5 --log_dir $LOG_DIR --lr 1e-1 --max_iter 60000 --data_aug_color_trans_ratio 0.05 --data_aug_color_jitter_std 0.005 $3 2>&1
30655 Done | tee -a "$LOG"
When I set --num_workers to 1, it reports the following:
yq01-qianmo-com-127-2-22 12/29 14:21:28 ===> Start testing
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/miniconda3/envs/py3-mink/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/miniconda3/envs/py3-mink/lib/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/miniconda3/envs/py3-mink/lib/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/miniconda3/envs/py3-mink/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/miniconda3/envs/py3-mink/lib/python3.7/multiprocessing/connection.py", line 620, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/miniconda3/envs/py3-mink/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/miniconda3/envs/py3-mink/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/spatiotemporalsegmentation/main.py", line 162, in <module>
    main()
  File "/spatiotemporalsegmentation/main.py", line 157, in main
    test(model, test_data_loader, config)
  File "/spatiotemporalsegmentation/lib/test.py", line 98, in test
    coords, input, target = data_iter.next()
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/miniconda3/envs/py3-mink/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 5202) exited unexpectedly
I tried it on two computers, with the following version combinations:
CUDA 11.1
MinkowskiEngine 0.5.4
PyTorch 1.9.0
or
CUDA 10.2
MinkowskiEngine 0.4.3
PyTorch 1.5.0, 1.7.1, 1.9.0, or 1.10.2
Could you please tell me which version of MinkowskiEngine I should use?
I also stepped through the code and found that the problem occurs at line 96 of lib/test.py:
coords, input, target = data_iter.next()
I have been stuck on this problem for several days. Could you please suggest how to solve it?
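One way to narrow this down further is to fetch samples one at a time in the main process, with Python's `faulthandler` enabled so that a segmentation fault prints the Python stack. The following is only a minimal sketch, not code from the repository: `probe_dataset` is a hypothetical helper, and it assumes the dataset supports `len()` and integer indexing.

```python
import faulthandler

# Ask the interpreter to dump the Python stack to stderr on SIGSEGV, so a
# crash in native code points at the Python line that triggered it.
try:
    faulthandler.enable(all_threads=True)
except (RuntimeError, ValueError, OSError, AttributeError):
    # stderr may not be backed by a real file descriptor (e.g. when captured).
    pass


def probe_dataset(dataset, limit=None):
    """Fetch samples one by one in the main process (no worker processes).

    Returns a list of (index, exception) pairs for samples that raised.
    A hard segfault cannot be caught here; in that case the last index
    printed, together with the faulthandler dump, shows where it happened.
    """
    failures = []
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for i in range(n):
        print(f"loading sample {i}", flush=True)
        try:
            dataset[i]
        except Exception as exc:  # ordinary Python errors only
            failures.append((i, exc))
    return failures
```

Assuming `test_data_loader` is a regular PyTorch `DataLoader`, this could be called as `probe_dataset(test_data_loader.dataset)` just before `test(...)` in main.py. If every sample loads without a crash, the fault is more likely in batch collation or in a later step than in reading an individual sample.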