Hi, thank you for open-sourcing this work. When I run training in Docker I run into two problems; the training output is as follows.

INTERFACE: dataset /bonnet/KITTI/ arch_cfg config/arch/squeezeseg.yaml data_cfg config/labels/semantic-kitti.yaml log /bonnet/lidar-bonnetal/logs/ pretrained None
Commit hash (training version): b'4233111'
Opening arch config file config/arch/squeezeseg.yaml
Opening data config file config/labels/semantic-kitti.yaml
No pretrained directory found.
Copying files to /bonnet/lidar-bonnetal/logs/ for further reference.
Sequences folder exists! Using sequences from /bonnet/KITTI/sequences
parsing seq 00
parsing seq 01
parsing seq 02
parsing seq 03
parsing seq 04
parsing seq 05
parsing seq 06
parsing seq 07
parsing seq 09
parsing seq 10
Using 2761 scans from sequences [0, 1, 2, 3, 4, 5, 6, 7, 9, 10]
Sequences folder exists! Using sequences from /bonnet/KITTI/sequences
parsing seq 05
Using 2761 scans from sequences [5]
Loss weights from content: tensor([ 0.0000, 22.9317, 857.5627, 715.1100, 315.9618, 356.2452, 747.6170,
887.2239, 963.8915, 5.0051, 63.6247, 6.9002, 203.8796, 7.4802,
13.6315, 3.7339, 142.1462, 12.6355, 259.3699, 618.9667])
Using SqueezeNet Backbone
Depth of backbone input = 5
Original OS: 16
New OS: 16
Strides: [2, 2, 2, 2]
Decoder original OS: 16
Decoder new OS: 16
Decoder strides: [2, 2, 2, 2]
Total number of parameters: 915540
Total number of parameters requires_grad: 915540
Param encoder 724032
Param decoder 179968
Param head 11540
No path to pretrained, using random init.
Training in device: cpu
Ignoring class 0 in IoU evaluation
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19])
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
File "./train.py", line 115, in
trainer.train()
File "../../tasks/semantic/modules/trainer.py", line 236, in train
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
show_scans=self.ARCH["train"]["show_scans"])
File "../../tasks/semantic/modules/trainer.py", line 307, in train_epoch
for i, (in_vol, proj_mask, proj_labels, _, path_seq, path_name, _, _, _, _, _, _, _, _, _) in enumerate(train_loader):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 576, in next
idx, batch = self._get_batch()
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 553, in _get_batch
success, data = self._try_get_batch()
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 511, in _try_get_batch
data = self.data_queue.get(timeout=timeout)
File "/usr/lib/python3.5/multiprocessing/queues.py", line 104, in get
if timeout < 0 or not self._poll(timeout):
File "/usr/lib/python3.5/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 911, in wait
ready = selector.select(timeout)
File "/usr/lib/python3.5/selectors.py", line 376, in select
fd_event_list = self._poll.poll(timeout)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 212) is killed by signal: Bus error.
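For what it is worth, my current guess is that the DataLoader workers pass batches through /dev/shm, which Docker caps at 64 MB by default unless the container is started with a larger --shm-size (or --ipc=host). The snippet below is only a rough sketch of what I plan to try next, not code from the repo: it checks the shared memory available inside the container and iterates a dummy dataset with num_workers=0, which keeps loading in the main process and should avoid the shared-memory queue entirely (the TensorDataset is just a stand-in for the real SemanticKitti parser).

# Sketch only (my assumption, not part of lidar-bonnetal): check /dev/shm
# inside the container and load data without worker processes.
import shutil

import torch
from torch.utils.data import DataLoader, TensorDataset

# Docker's default /dev/shm is 64 MB unless the container is started with a
# larger --shm-size (or --ipc=host); multi-worker loading of full scans can
# exceed that easily.
total, used, free = shutil.disk_usage("/dev/shm")
print("free /dev/shm: %.1f MB" % (free / 1024 ** 2))

# Dummy stand-in for the SemanticKitti parser: fake range images (B x 5 x H x W)
# and fake label maps (B x H x W).
dataset = TensorDataset(
    torch.randn(8, 5, 64, 2048),
    torch.zeros(8, 64, 2048, dtype=torch.long),
)

# num_workers=0 keeps loading in the main process, so no shared memory is
# needed for inter-process transfer (slower, but it sidesteps the bus error).
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

for in_vol, proj_labels in loader:
    print(in_vol.shape, proj_labels.shape)
    break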
Looking forward to your reply.