Hi everybody,
I launched a training run on 512x512 8-bit .png images. After more than 12 epochs of training I got the error below. I really don't understand why, since everything was fine before this:
[06/05 04:18:15] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 209, in run_step
    data = next(self._data_loader_iter)
  File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 140, in __iter__
    for d in self.dataset:
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 41, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/appuser/detectron2_repo/detectron2/utils/serialize.py", line 23, in __call__
    return self._obj(*args, **kwargs)
  File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 77, in __call__
    image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
  File "/home/appuser/detectron2_repo/detectron2/data/detection_utils.py", line 49, in read_image
    image = Image.open(f)
  File "/home/appuser/.local/lib/python3.6/site-packages/PIL/Image.py", line 2818, in open
    prefix = fp.read(16)
OSError: [Errno 121] Remote I/O error
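The traceback ends in a plain `fp.read(16)`, so the error is raised by the operating system while reading the file, not by PIL's decoding. One way to check whether the failure is transient would be to retry the raw read a few times and log the errno. The sketch below is only an illustration under that assumption: `read_bytes_with_retry`, its `retries` and `delay` parameters, and the backoff scheme are hypothetical and not part of detectron2 or PyTorch.

```python
import errno
import time


def read_bytes_with_retry(path, retries=3, delay=1.0):
    """Read a whole file, retrying on OSError.

    Hypothetical helper for debugging; not a detectron2 or PyTorch API.
    """
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as err:
            last_err = err
            # Map the numeric errno to its symbolic name (e.g. 121 -> EREMOTEIO on Linux).
            name = errno.errorcode.get(err.errno, "unknown")
            print(f"attempt {attempt}/{retries} failed for {path}: "
                  f"errno={err.errno} ({name})")
            if attempt < retries:
                # Linear backoff before the next attempt.
                time.sleep(delay * attempt)
    # All attempts failed: re-raise the last error so the caller still sees it.
    raise last_err
```

The returned bytes could then be handed to PIL with `Image.open(io.BytesIO(data))`. If a second attempt succeeds, that would suggest an intermittent storage problem rather than a corrupt image.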
Here is the whole log.txt file: log.txt
Here is my config.yaml:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST: