Megvii-BaseDetection / YOLOX

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Apache License 2.0

Error when training on a COCO-format dataset #1179

Open lixiangMindSpore opened 2 years ago

lixiangMindSpore commented 2 years ago

2022-03-15 10:16:36.888 | ERROR | yolox.core.launch:_distributed_worker:147 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (6027), thread 'MainThread' (140438034846656):
Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
               │     └ 8
               └ <function _main at 0x7fba45e4c400>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/multiprocessing/spawn.py", line 118, in _main
    return self._bootstrap()
           │    └ <function BaseProcess._bootstrap at 0x7fba45f218c8>
           └ <SpawnProcess(SpawnProcess-1, started)>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7fba45f210d0>
    └ <SpawnProcess(SpawnProcess-1, started)>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <SpawnProcess(SpawnProcess-1, started)>
    │    │        │    └ (<function _distributed_worker at 0x7fb8c9de4ae8>, 0, (<function main at 0x7fb9e1587400>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:4...
    │    │        └ <SpawnProcess(SpawnProcess-1, started)>
    │    └ <function _wrap at 0x7fb8d9462ae8>
    └ <SpawnProcess(SpawnProcess-1, started)>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
    │  │   └ (<function main at 0x7fb9e1587400>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:42353', (╒═══════════════════╤═════════════════════════...
    │  └ 0
    └ <function _distributed_worker at 0x7fb8c9de4ae8>

  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/core/launch.py", line 147, in _distributed_worker
    main_func(*args)
    │          └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
    └ <function main at 0x7fb9e1587400>

  File "/data_hdd/lixiang/person_detection/YOLOX/tools/train.py", line 118, in main
    trainer.train()
    │       └ <function Trainer.train at 0x7fb8d9924510>
    └ <yolox.core.trainer.Trainer object at 0x7fb9e1528fd0>

  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/core/trainer.py", line 73, in train
    self.before_train()
    │    └ <function Trainer.before_train at 0x7fb9e15842f0>
    └ <yolox.core.trainer.Trainer object at 0x7fb9e1528fd0>

  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/core/trainer.py", line 155, in before_train
    self.prefetcher = DataPrefetcher(self.train_loader)
    │                 │              │    └ <yolox.data.dataloading.DataLoader object at 0x7fb8d8f43908>
    │                 │              └ <yolox.core.trainer.Trainer object at 0x7fb9e1528fd0>
    │                 └ <class 'yolox.data.data_prefetcher.DataPrefetcher'>
    └ <yolox.core.trainer.Trainer object at 0x7fb9e1528fd0>

  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/data_prefetcher.py", line 21, in __init__
    self.preload()
    │    └ <function DataPrefetcher.preload at 0x7fb8d99058c8>
    └ <yolox.data.data_prefetcher.DataPrefetcher object at 0x7fb8d8e960b8>

  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/data_prefetcher.py", line 25, in preload
    self.next_input, self.next_target, _, _ = next(self.loader)
    │                │                             │    └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fb8d8e29ef0>
    │                │                             └ <yolox.data.data_prefetcher.DataPrefetcher object at 0x7fb8d8e960b8>
    │                └ <yolox.data.data_prefetcher.DataPrefetcher object at 0x7fb8d8e960b8>
    └ <yolox.data.data_prefetcher.DataPrefetcher object at 0x7fb8d8e960b8>

  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
           │    └ <function _MultiProcessingDataLoaderIter._next_data at 0x7fb8d8f102f0>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fb8d8e29ef0>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
           │    │             └ <torch._utils.ExceptionWrapper object at 0x7fb8445e9080>
           │    └ <function _MultiProcessingDataLoaderIter._process_data at 0x7fb8d8f10400>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fb8d8e29ef0>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
    │    └ <function ExceptionWrapper.reraise at 0x7fba45d41b70>
    └ <torch._utils.ExceptionWrapper object at 0x7fb8445e9080>
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
          └ AssertionError('Caught AssertionError in DataLoader worker process 0.\nOriginal Traceback (most recent call last):\n File "/...

AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhst/.conda/envs/yolox/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/datasets/datasets_wrapper.py", line 110, in wrapper
    ret_val = getitem_fn(self, index)
  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/datasets/mosaicdetection.py", line 93, in __getitem__
    img, _labels, _, img_id = self._dataset.pull_item(index)
  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/datasets/coco.py", line 206, in pull_item
    img = self.load_resized_img(index)
  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/datasets/coco.py", line 179, in load_resized_img
    img = self.load_image(index)
  File "/data_hdd/lixiang/person_detection/YOLOX/yolox/data/datasets/coco.py", line 194, in load_image
    assert img is not None
AssertionError
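The last frame fails on `assert img is not None` in `load_image`. Assuming that function reads the file with OpenCV (whose `imread` returns `None` instead of raising when a file cannot be read), the likely cause is an image listed in the annotation JSON that is missing or unreadable on disk. A minimal sketch for listing such files before training follows; `find_missing_images` is a hypothetical helper, not part of YOLOX, and it only checks that each file exists and is non-empty.

```python
import json
import os


def find_missing_images(annotation_file, image_dir):
    """Return file_name entries from a COCO-style JSON that are missing or empty on disk."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        coco = json.load(f)

    missing = []
    for image in coco.get("images", []):
        path = os.path.join(image_dir, image["file_name"])
        # A missing or zero-byte file cannot be decoded by the data loader.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(image["file_name"])
    return missing
```

Run it once per split, passing that split's annotation file and image folder. An empty list means every referenced file is present, although a file can still be corrupt in ways this check does not detect.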

Superyanzhuang commented 2 years ago

I ran into the same problem. Has it been resolved?

Edward-lyz commented 2 years ago

> I ran into the same problem. Has it been resolved?

You just need to keep the dataset in the same format as coco2017.
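As a rough illustration of what matching the coco2017 format involves, the sketch below checks a loaded annotation dictionary for the top-level `images`, `annotations`, and `categories` lists and for `[x, y, width, height]` boxes, which is how COCO detection annotations are laid out. `check_coco_structure` is a hypothetical helper, not a YOLOX function, and it covers only a few of the fields.

```python
def check_coco_structure(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append(f"missing or non-list top-level key: {key}")
    if problems:
        # Without the three lists, the per-entry checks below cannot run.
        return problems

    image_ids = {img.get("id") for img in coco["images"]}
    for img in coco["images"]:
        for field in ("id", "file_name", "width", "height"):
            if field not in img:
                problems.append(f"image entry lacks '{field}': {img}")

    for ann in coco["annotations"]:
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann.get('id')} points to unknown image_id")
        bbox = ann.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4):
            problems.append(f"annotation {ann.get('id')} needs bbox [x, y, width, height]")
    return problems
```

Load the annotation file with `json.load` and pass the resulting dictionary in. An empty result only says these basic fields are present, not that the dataset will train correctly.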

tan90du-sx commented 2 years ago

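Could I ask whether the coco2017 dataset has some problems that need to be fixed before it can be used for training? When I trained on it directly I hit a label overflow, and most of the fixes I found involve converting it to VOC format. Looking forward to your replies, thanks.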

buzhiqimeiliuqiangdong commented 1 year ago

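> Could I ask whether the coco2017 dataset has some problems that need to be fixed before it can be used for training? When I trained on it directly I hit a label overflow, and most of the fixes I found involve converting it to VOC format. Looking forward to your replies, thanks.

Did you manage to solve this? I am also getting an index out of range error.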
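Neither the label overflow nor the index out of range error is diagnosed in this thread. One cause worth ruling out is a mismatch between the class count configured for training (`num_classes` in the YOLOX experiment file) and the categories in the annotation JSON. The sketch below is a hypothetical check under that assumption; `check_categories` is not a YOLOX function.

```python
def check_categories(coco, num_classes):
    """Compare a COCO-style dict against the class count used for training."""
    declared = {cat["id"] for cat in coco.get("categories", [])}
    problems = []
    if len(declared) != num_classes:
        problems.append(
            f"{len(declared)} categories declared, but num_classes is {num_classes}"
        )
    # Every category_id used by an annotation should appear in "categories".
    used = {ann["category_id"] for ann in coco.get("annotations", [])}
    undeclared = sorted(used - declared)
    if undeclared:
        problems.append(f"annotations use undeclared category ids: {undeclared}")
    return problems
```

If this returns an empty list, the category setup is probably not the source of the error and the cause lies elsewhere.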