yolox does not support COCO128

zhanglirong1999 commented 2 years ago

Training YOLOX on the COCO128 dataset produces errors, such as incorrect output from the evaluate function, so we need to train on the full COCO dataset. The full COCO dataset has three folders: annotations, train2017, and val2017. There are 118287 images in train2017 and 5000 images in val2017. We need to use the full COCO dataset instead of the mini dataset.
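To check which variant of the dataset is on disk, the folder layout described above can be inspected with a small script. This is only a sketch: the helper names `count_images` and `describe_coco_root` are made up for illustration and are not part of YOLOX, and the set of image suffixes is an assumption.

```python
from pathlib import Path

# Assumption: images are stored as .jpg/.jpeg/.png files directly in each folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files directly inside `folder` (non-recursive)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def describe_coco_root(root):
    """Report whether the three expected folders exist and how many images they hold."""
    root = Path(root)
    return {
        "has_annotations": (root / "annotations").is_dir(),
        "train2017": count_images(root / "train2017"),
        "val2017": count_images(root / "val2017"),
    }
```

Calling `describe_coco_root` on the dataset root should report 118287 and 5000 images for the full COCO dataset, going by the numbers quoted above.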

hkvision commented 2 years ago

Paste the error of COCO128 here and describe this dataset as well :)

zhanglirong1999 commented 2 years ago

If you use the official trained model to evaluate COCO128, the error is as follows:

2022-03-30 15:56:05 | ERROR | yolox.core.launch:100 - An error has been caught in function 'launch', process 'MainProcess' (26317), thread 'MainThread' (140210491134144):
Traceback (most recent call last):

  File "/home/lirong/anaconda3/envs/yolo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
                └ ModuleSpec(name='yolox.tools.train', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f850d1d0a50>, origin='/...
  File "/home/lirong/anaconda3/envs/yolo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'yolox.tools', '__loader__': <_frozen_importlib_external.SourceFileL...
         └ <code object <module> at 0x7f854b387c00, file "/home/lirong/YOLO_RUN/YOLOX/tools/train.py", line 5>

  File "/home/lirong/YOLO_RUN/YOLOX/tools/train.py", line 138, in <module>
    args=(exp, args),
          │    └ Namespace(batch_size=64, cache=False, ckpt=None, cluster_mode='local', devices=None, dist_backend='spark', dist_url=None, dri...
          └ ╒═══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/core/launch.py", line 100, in launch
    main_func(*args)
    │          └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
    └ <function main at 0x7f850d1b9f80>

  File "/home/lirong/YOLO_RUN/YOLOX/tools/train.py", line 106, in main
    trainer.train_in_orca()
    │       └ <function Trainer.train_in_orca at 0x7f850c4cb680>
    └ <yolox.core.trainer.Trainer object at 0x7f852f735690>

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/core/trainer.py", line 233, in train_in_orca
    self.evaluate_and_save_model_orca(model)
    │    │                            └ YOLOX(
    │    │                                (backbone): YOLOPAFPN(
    │    │                                  (backbone): CSPDarknet(
    │    │                                    (stem): Focus(
    │    │                                      (conv): BaseConv(
    │    │                                        (conv): ...
    │    └ <function Trainer.evaluate_and_save_model_orca at 0x7f850c4cbdd0>
    └ <yolox.core.trainer.Trainer object at 0x7f852f735690>

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/core/trainer.py", line 484, in evaluate_and_save_model_orca
    self.is_distributed
    │    └ False
    └ <yolox.core.trainer.Trainer object at 0x7f852f735690>

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/exp/yolox_base.py", line 323, in eval
    return evaluator.evaluate(model, is_distributed, half)
           │         │        │      │               └ False
           │         │        │      └ False
           │         │        └ YOLOX(
           │         │            (backbone): YOLOPAFPN(
           │         │              (backbone): CSPDarknet(
           │         │                (stem): Focus(
           │         │                  (conv): BaseConv(
           │         │                    (conv): ...
           │         └ <function COCOEvaluator.evaluate at 0x7f852e1dfc20>
           └ <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7f8505017cd0>

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/evaluators/coco_evaluator.py", line 189, in evaluate
    data_list.extend(self.convert_to_coco_format(outputs, info_imgs, ids))
    │         │      │    │                      │        │          └ tensor([[1],
    │         │      │    │                      │        │                    [2],
    │         │      │    │                      │        │                    [3],
    │         │      │    │                      │        │                    [4],
    │         │      │    │                      │        │                    [5]])
    │         │      │    │                      │        └ [tensor([480, 426, 428, 425, 640]), tensor([640, 640, 640, 640, 481])]
    │         │      │    │                      └ [tensor([[ 3.0659e+02, -1.2641e+00, 3.2204e+02, 2.4427e+02, 9.7571e-01,
    │         │      │    │                                   9.5652e-01, 4.5000e+01],
    │         │      │    │                                 [ 2.07...
    │         │      │    └ <function COCOEvaluator.convert_to_coco_format at 0x7f852e1dfb90>
    │         │      └ <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7f8505017cd0>
    │         └ <method 'extend' of 'list' objects>
    └ []

  File "/home/lirong/YOLO_RUN/YOLOX/yolox/evaluators/coco_evaluator.py", line 224, in convert_to_coco_format
    label = self.dataloader.dataset.class_ids[int(cls[ind])]
            │    │          │       │             │   └ 0
            │    │          │       │             └ tensor([75., 58., 60., 47., 32., 13., 49., 32.])
            │    │          │       └ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 3...
            │    │          └ <yolox.data.datasets.coco.COCODataset object at 0x7f8505017d90>
            │    └ <torch.utils.data.dataloader.DataLoader object at 0x7f8505017e50>
            └ <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7f8505017cd0>

If you train your own model on COCO128 and evaluate it, you will always get an AP of 0.00, because the data_list obtained from COCO128 is None.
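The traceback above stops at the `class_ids[int(cls[ind])]` lookup without showing the final exception. One possible cause, which is an assumption and not confirmed in this thread, is that a predicted class index falls outside the `class_ids` list built from the annotation file. The sketch below shows a way to test that hypothesis; `map_class_indices` is a made-up helper, not a YOLOX function, and the example `class_ids` values are invented.

```python
def map_class_indices(pred_classes, class_ids):
    """Map predicted class indices to dataset category ids.

    Returns (labels, out_of_range): the category ids for indices that exist
    in class_ids, and the indices that have no entry in class_ids.
    """
    labels, out_of_range = [], []
    for c in pred_classes:
        idx = int(c)  # predictions arrive as floats, e.g. 75.0
        if 0 <= idx < len(class_ids):
            labels.append(class_ids[idx])
        else:
            out_of_range.append(idx)
    return labels, out_of_range


# Hypothetical values: an annotation file that declares only 71 categories,
# and two predicted class indices taken from the traceback (75 and 58).
example_class_ids = list(range(1, 72))
labels, out_of_range = map_class_indices([75.0, 58.0], example_class_ids)
print("mapped:", labels, "out of range:", out_of_range)
```

If `out_of_range` is non-empty for real predictions, the model predicts more classes than the annotation file declares, which would be worth reporting together with the full error message.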

zhanglirong1999 commented 2 years ago

The difference between the full COCO and COCO128 is that COCO128 has only 128 images in the train2017 folder and 5 images in the val2017 folder. When YOLOX loads the data from these folders and reads the JSON files from annotations, the output has problems that we cannot fix. So use the full COCO instead of the mini COCO (COCO128).
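Before ruling out the mini dataset, it may help to look at what the annotation JSON actually contains. The sketch below summarizes a COCO-format annotation file using only the standard `images`, `annotations`, and `categories` keys. The function name `summarize_coco_annotations` is made up, and the file name used in the note after the code is an assumption about how the COCO128 annotations are stored.

```python
import json
from collections import Counter


def summarize_coco_annotations(path):
    """Summarize a COCO-format annotation file (images, annotations, categories)."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    images = data.get("images", [])
    annotations = data.get("annotations", [])
    categories = data.get("categories", [])

    # Number of annotations attached to each image id.
    per_image = Counter(a["image_id"] for a in annotations)

    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "num_categories": len(categories),
        "images_without_annotations": sum(
            1 for img in images if per_image[img["id"]] == 0
        ),
        "category_ids": sorted(c["id"] for c in categories),
    }
```

Running it on, for example, `annotations/instances_val2017.json` (assumed file name) for both COCO128 and the full COCO would show whether the two files differ in the number of categories or contain images with no annotations.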

zhanglirong1999 commented 2 years ago

Links to some similar issues:

yolox-s with default config cannnot converge in mini-coco128: https://github.com/Megvii-BaseDetection/YOLOX/issues/995

The best AP is 0.00: https://github.com/Megvii-BaseDetection/YOLOX/issues/344 https://github.com/Megvii-BaseDetection/YOLOX/issues/1199
