PaddlePaddle / PaddleX

Low-code development tool based on PaddlePaddle
Apache License 2.0
4.76k stars 936 forks

SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception. #1578

Open livingbody opened 2 years ago

livingbody commented 2 years ago
2022-08-21 21:38:18 [INFO]  There are 135/241 variables loaded into YOLOv3.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
2022-08-21 21:40:03 [INFO]  [TRAIN] Epoch 1 finished, loss_xy=7.1577754, loss_wh=8.758207, loss_obj=2316.232, loss_cls=13.233265, loss=2345.3813 .
2022-08-21 21:40:31 [INFO]  [TRAIN] Epoch=2/200, Step=2/8, loss_xy=7.030112, loss_wh=8.656177, loss_obj=2033.386108, loss_cls=12.993334, loss=2062.065918, lr=0.000001, time_each_step=13.16s, eta=5:49:30
2022-08-21 21:41:46 [INFO]  [TRAIN] Epoch 2 finished, loss_xy=7.3628964, loss_wh=8.907021, loss_obj=1748.391, loss_cls=13.350792, loss=1778.0117 .
2022-08-21 21:42:35 [WARNING]   fail to map batch transform [<paddlex.cv.transforms.batch_operators._Gt2YoloTarget object at 0x7f044dc054d0>] with error: index 28 is out of bounds for axis 3 with size 28 and stack:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/transforms/batch_operators.py", line 38, in __call__
    samples = op(samples)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/transforms/batch_operators.py", line 242, in __call__
    target[best_n, 0, gj, gi] = gx * grid_w - gi
IndexError: index 28 is out of bounds for axis 3 with size 28

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 534, in _thread_loop
    batch = self._get_data()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 671, in _get_data
    batch.reraise()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/worker.py", line 169, in reraise
    raise self.exc_type(msg)
IndexError: DataLoader worker(0) caught IndexError with message:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/worker.py", line 336, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 134, in fetch
    data = self.collate_fn(data)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/transforms/batch_operators.py", line 44, in __call__
    raise e
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/transforms/batch_operators.py", line 38, in __call__
    samples = op(samples)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/transforms/batch_operators.py", line 242, in __call__
    target[best_n, 0, gj, gi] = gx * grid_w - gi
IndexError: index 28 is out of bounds for axis 3 with size 28

---------------------------------------------------------------------------
SystemError                               Traceback (most recent call last)
/tmp/ipykernel_12480/1127306615.py in <module>
     12     lr_decay_epochs=[80, 120],
     13     save_dir='output/mobilenetv1',
---> 14     use_vdl=True)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/detector.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, optimizer, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs, lr_decay_gamma, metric, use_ema, early_stop, early_stop_patience, use_vdl, resume_checkpoint)
    284             early_stop=early_stop,
    285             early_stop_patience=early_stop_patience,
--> 286             use_vdl=use_vdl)
    287 
    288     def quant_aware_train(self,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, ema, early_stop, early_stop_patience, use_vdl)
    329             step_time_tic = time.time()
    330 
--> 331             for step, data in enumerate(self.train_data_loader()):
    332                 if nranks > 1:
    333                     outputs = self.run(ddp_net, data, mode='train')
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py in __next__(self)
    744             else:
    745                 if _in_legacy_dygraph():
--> 746                     data = self._reader.read_next_var_list()
    747                     data = _restore_batch(data, self._structure_infos.pop(0))
    748                 else:
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)
lailuboy commented 2 years ago

It looks like there is a problem with the data or the annotations. Please post your training code so we can take a look.
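
In the meantime, a quick sanity check on the annotations may help narrow this down. The `IndexError` above (index 28 on an axis of size 28) is what you would see if a box center, normalized to [0, 1], lands exactly on 1.0 or beyond, which can happen when a labeled box reaches past the image border. The snippet below is only a sketch under assumptions: `find_suspect_boxes` is a hypothetical helper (not part of PaddleX), boxes are assumed to be `(xmin, ymin, xmax, ymax)` in pixels, and the grid size of 28 is taken from the error message rather than from the model config.

```python
def find_suspect_boxes(boxes, img_w, img_h, grid=28):
    """Return (index, reason) pairs for boxes that look unsafe.

    Hypothetical helper, not a PaddleX API. `boxes` is assumed to be a
    list of (xmin, ymin, xmax, ymax) tuples in pixel coordinates.
    """
    suspects = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if xmax <= xmin or ymax <= ymin:
            suspects.append((i, "empty or inverted box"))
            continue
        # Normalized box center, then the grid cell it would fall into.
        gx = (xmin + xmax) / 2.0 / img_w
        gy = (ymin + ymax) / 2.0 / img_h
        gi, gj = int(gx * grid), int(gy * grid)
        if not (0 <= gi < grid and 0 <= gj < grid):
            suspects.append(
                (i, f"center maps to cell ({gj}, {gi}) outside {grid}x{grid} grid")
            )
        elif xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
            suspects.append((i, "box extends past the image"))
    return suspects


# Example with a 100x100 image: the second box is centered on the right
# edge, and the third one sticks out past it.
boxes = [(10, 10, 50, 50), (90, 20, 110, 60), (60, 60, 104, 90)]
for index, reason in find_suspect_boxes(boxes, img_w=100, img_h=100):
    print(index, reason)
```

Running something like this over every image in the training set (with the real image sizes and boxes read from your annotation files) should show whether any labels need to be clipped or removed. If nothing is flagged, the cause is probably elsewhere, for example in the augmentation pipeline.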