SwinTransformer / Swin-Transformer-Object-Detection

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
https://arxiv.org/abs/2103.14030
Apache License 2.0

data['category_id'] = self.cat_ids[label] IndexError: list index out of range #144

Closed 1zhou-Wang closed 2 years ago

1zhou-Wang commented 2 years ago

Greatly appreciated if anyone could help !!!

Checklist

  1. I have searched related issues but cannot get the expected help. yes
  2. I have read the FAQ documentation but cannot get the expected help. yes
  3. The bug has not been fixed in the latest version. maybe

Describe the bug

I was training Swin-Transformer-Object-Detection on a custom COCO-format dataset (CocoDataset). An error occurred right after the first epoch, when the evaluation hook ran:

` 2022-02-13 23:46:06,781 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-02-13 23:46:06,781 - mmdet - INFO - Checkpoints will be saved to C:\tf\Swin-Transformer-Object-Detection-master\work_dirs\mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco by HardDiskBackend.
C:\tftools\Conda\envs\cc\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 20/20, 1.7 task/s, elapsed: 12s, ETA: 0s
Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 176, in main
    train_detector(
  File "c:\tf\swin-transformer-object-detection-master\mmdet\apis\train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "c:\tf\mmcv\mmcv\runner\base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 147, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 176, in evaluate
    eval_res = self.dataloader.dataset.evaluate(
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 419, in evaluate
    result_files, tmp_dir = self.format_results(results, jsonfile_prefix)
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 364, in format_results
    result_files = self.results2json(results, jsonfile_prefix)
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 301, in results2json
    json_results = self._segm2json(results)
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 252, in _segm2json
    data['category_id'] = self.cat_ids[label]
IndexError: list index out of range
`
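For readers who hit the same message: the failing line indexes a list of category ids with a label index. The sketch below is a simplified stand-in for that lookup, not the mmdet code, and the function name `results_to_category_ids` is hypothetical. It only illustrates one general way a `list index out of range` can arise at such a line, namely when the number of per-class result slots exceeds the number of category ids. It does not claim that this was the cause in this issue.

```python
# Simplified stand-in for the failing lookup; NOT the actual mmdet code.
# `results_to_category_ids` is a hypothetical helper used for illustration.
def results_to_category_ids(per_class_results, cat_ids):
    """Map each per-class result slot (by position) to a dataset category id."""
    mapped = []
    for label in range(len(per_class_results)):
        if label >= len(cat_ids):
            # A plain cat_ids[label] would raise the bare
            # "list index out of range" seen in the traceback.
            raise IndexError(
                f"label {label} has no category: {len(per_class_results)} "
                f"result slots but only {len(cat_ids)} category ids"
            )
        mapped.append(cat_ids[label])
    return mapped


cat_ids = [0, 1, 2, 3, 4, 5, 6]  # the 7 category ids listed in this issue

# Seven result slots map cleanly onto seven category ids.
ok = results_to_category_ids([[]] * 7, cat_ids)
print(ok)

# More result slots than category ids reproduces the same kind of failure.
try:
    results_to_category_ids([[]] * 80, cat_ids)
except IndexError as err:
    print("IndexError:", err)
```

Printing `len(self.cat_ids)` and the number of result slots just before the failing line is a quick way to see whether the two counts disagree.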

I closely followed the COCO dataset format. The instances_train2017.json looks like this:

` "images": [
    {"height": 1024, "width": 2048, "id": 0, "file_name": "1.jpg"},
    {"height": 1024, "width": 2048, "id": 1, "file_name": "10.jpg"},
    {"height": 1024, "width": 2048, "id": 2, "file_name": "100.jpg"},
    {"height": 1024, "width": 2048, "id": 3, "file_name": "101.jpg"},
    {"height": 1024, "width": 2048, "id": 4, "file_name": "102.jpg"},
    {"height": 1024, "width": 2048, "id": 5, "file_name": "103.jpg"},
    {"height": 1024, "width": 2048, "id": 6, "file_name": "104.jpg"},
    {"height": 1024, "width": 2048, "id": 7, "file_name": "105.jpg"},
    {"height": 1024, "width": 2048, "id": 8, "file_name": "106.jpg"},
` (part of the file), and the dataset categories are:

` "categories": [
    {"supercategory": "buckle", "id": 0, "name": "buckle"},
    {"supercategory": "cable", "id": 1, "name": "cable"},
    {"supercategory": "crack", "id": 2, "name": "crack"},
    {"supercategory": "pit", "id": 3, "name": "pit"},
    {"supercategory": "rail", "id": 4, "name": "rail"},
    {"supercategory": "screw", "id": 5, "name": "screw"},
    {"supercategory": "support", "id": 6, "name": "support"}
],
`

The classes in coco.py and class_names have been modified accordingly, and num_classes in mask_rcnn_swin_fpn.py is changed to 7. Earlier today I tried another dataset with 5 labels, which worked properly. The only difference between the two datasets is that the one that worked has all its .jpg and .json files in numeric order, while the one that failed begins with 1.json followed by 10.json, which I think is fine because it matches the image files.
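A quick consistency check on the annotation file can rule out some mismatches before training. The following is a minimal sketch using only the standard library. The helper name `check_coco_annotations` and the file path in the final comment are assumptions made for illustration, and the checks cover only the basic COCO fields shown above (`images`, `categories`, `annotations`).

```python
import json


def check_coco_annotations(coco, expected_num_classes):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    cat_ids = [cat["id"] for cat in coco.get("categories", [])]
    if len(cat_ids) != expected_num_classes:
        problems.append(
            f"{len(cat_ids)} categories in the file, expected {expected_num_classes}"
        )
    if len(set(cat_ids)) != len(cat_ids):
        problems.append("duplicate category ids")

    known_cats = set(cat_ids)
    known_images = {img["id"] for img in coco.get("images", [])}
    for ann in coco.get("annotations", []):
        if ann["category_id"] not in known_cats:
            problems.append(
                f"annotation {ann['id']}: unknown category_id {ann['category_id']}"
            )
        if ann["image_id"] not in known_images:
            problems.append(
                f"annotation {ann['id']}: unknown image_id {ann['image_id']}"
            )
    return problems


# Small in-memory example using the category names from this issue.
names = ["buckle", "cable", "crack", "pit", "rail", "screw", "support"]
example = {
    "images": [{"id": 0, "file_name": "1.jpg", "height": 1024, "width": 2048}],
    "categories": [
        {"id": i, "name": n, "supercategory": n} for i, n in enumerate(names)
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 4},
        {"id": 1, "image_id": 0, "category_id": 7},  # deliberately invalid
    ],
}
problems = check_coco_annotations(example, expected_num_classes=7)
print(problems)

# For a real file (the path below is a placeholder, adjust to your layout):
# with open("annotations/instances_train2017.json") as f:
#     print(check_coco_annotations(json.load(f), expected_num_classes=7))
```

An empty list means none of these particular checks failed. It does not guarantee that the training config matches the dataset.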

Reproduction

  1. What command or script did you run? `python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py`

  2. Did you make any modifications on the code or config? yes Did you understand what you have modified? yes

  3. What dataset did you use? Custom Coco Dataset

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

`fatal: not a git repository (or any of the parent directories): .git
sys.platform: win32
Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:19:05) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
NVCC: Not Available
GCC: n/a
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
    • C++ Version: 199711
    • MSVC 192829337
    • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
    • OpenMP 2019
    • CPU capability usage: AVX2
    • CUDA Runtime 11.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.4
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=C:/w/b/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/w/b/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON,

TorchVision: 0.10.0+cu111
OpenCV: 4.5.5
MMCV: 1.3.17
MMCV Compiler: MSVC 192930140
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0+`

  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]: pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.): no

Error traceback: identical to the traceback shown under "Describe the bug" above.

It would be great if anyone could help me solve this.

1zhou-Wang commented 2 years ago

OK, I found the problem: the names of the images in the dataset should not begin with numbers.
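For anyone who wants to try the same fix, here is a minimal sketch of how the `file_name` entries could be rewritten so that none begins with a digit. The helper name `prefix_numeric_file_names` and the `img_` prefix are hypothetical choices, and the sketch assumes `file_name` holds bare file names without directory parts. The thread does not establish whether this rename is needed in other setups.

```python
import copy


def prefix_numeric_file_names(coco, prefix="img_"):
    """Return (new_coco, renames); names starting with a digit get `prefix`.

    The input dict is left untouched. `renames` maps old name -> new name so
    the image files on disk can be renamed to match.
    """
    new_coco = copy.deepcopy(coco)
    renames = {}
    for img in new_coco.get("images", []):
        name = img["file_name"]
        if name[:1].isdigit():
            img["file_name"] = prefix + name
            renames[name] = img["file_name"]
    return new_coco, renames


# In-memory example with file names like the ones in this issue.
coco = {
    "images": [
        {"id": 0, "file_name": "1.jpg"},
        {"id": 1, "file_name": "10.jpg"},
        {"id": 2, "file_name": "rail_3.jpg"},  # already starts with a letter
    ]
}
new_coco, renames = prefix_numeric_file_names(coco)
print(renames)
print([img["file_name"] for img in new_coco["images"]])
```

The files on disk would then need the same renames, for example with `os.rename(old, new)` for each pair in `renames`, and the updated dict can be written back with `json.dump`.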