This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
Checklist
I have searched related issues but cannot get the expected help.
yes
I have read the FAQ documentation but cannot get the expected help.
yes
The bug has not been fixed in the latest version.
maybe
Describe the bug
I was using a custom COCO-format dataset to run Swin-Transformer-Object-Detection, and an error occurred right after the first training epoch, when the evaluation hook ran.
```
2022-02-13 23:46:06,781 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-02-13 23:46:06,781 - mmdet - INFO - Checkpoints will be saved to C:\tf\Swin-Transformer-Object-Detection-master\work_dirs\mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco by HardDiskBackend.
C:\tftools\Conda\envs\cc\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 20/20, 1.7 task/s, elapsed: 12s, ETA: 0s
Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
  File "tools/train.py", line 176, in main
  File "c:\tf\swin-transformer-object-detection-master\mmdet\apis\train.py", line 185, in train_detector
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 127, in run
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 54, in train
  File "c:\tf\mmcv\mmcv\runner\base_runner.py", line 307, in call_hook
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 147, in after_train_epoch
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 176, in evaluate
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 419, in evaluate
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 364, in format_results
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 301, in results2json
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 252, in _segm2json
IndexError: list index out of range
```
I closely followed the COCO dataset format. Part of instances_train2017.json looks like this:
```
"images": [
```
(part of the file) and the dataset categories are:
```
"categories": [
  { "supercategory": "buckle", "id": 0, "name": "buckle" },
  { "supercategory": "cable", "id": 1, "name": "cable" },
  { "supercategory": "crack", "id": 2, "name": "crack" },
  { "supercategory": "pit", "id": 3, "name": "pit" },
  { "supercategory": "rail", "id": 4, "name": "rail" },
  { "supercategory": "screw", "id": 5, "name": "screw" },
  { "supercategory": "support", "id": 6, "name": "support" }
],
```
The classes in coco.py and class_names have been modified accordingly, and num_classes in mask_rcnn_swin_fpn.py has been changed to 7. Earlier today I tried another dataset with 5 labels, which worked properly. The only difference between the two datasets is that the one that worked has all .jpg and .json files in order, while the one that failed begins with 1.json and continues with 10.json, which I think is fine because it matches the image files.
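To illustrate the file ordering mentioned above: names such as 1.json and 10.json end up adjacent under plain string sorting, whereas a numeric-aware sort restores counting order. This is only a minimal sketch. The file names and the `natural_key` helper are hypothetical and illustrate the ordering itself, not how MMDetection reads the dataset.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Hypothetical file names, used only to show the two orderings.
names = ["1.json", "2.json", "10.json", "11.json"]

lexicographic = sorted(names)             # plain string comparison
natural = sorted(names, key=natural_key)  # numeric-aware comparison

print(lexicographic)  # ['1.json', '10.json', '11.json', '2.json']
print(natural)        # ['1.json', '2.json', '10.json', '11.json']
```

Either ordering should be harmless as long as every annotation refers to its image by id rather than by position in the list.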
Reproduction
What command or script did you run?
python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
Did you make any modifications on the code or config?
yes
Did you understand what you have modified?
yes
What dataset did you use?
Custom COCO dataset
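Since the dataset is a custom one, a quick consistency check of the annotation file may help narrow the problem down. The following is a minimal sketch, assuming the standard COCO keys (`images`, `annotations`, `categories`). The `check_coco` helper and the sample data are hypothetical and not part of MMDetection.

```python
def check_coco(coco, expected_num_classes):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    cat_ids = {c["id"] for c in coco.get("categories", [])}
    img_ids = {im["id"] for im in coco.get("images", [])}

    if len(cat_ids) != expected_num_classes:
        problems.append(
            f"expected {expected_num_classes} categories, found {len(cat_ids)}")

    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        if ann.get("category_id") not in cat_ids:
            problems.append(
                f"annotation {ann_id}: unknown category_id {ann.get('category_id')}")
        if ann.get("image_id") not in img_ids:
            problems.append(
                f"annotation {ann_id}: unknown image_id {ann.get('image_id')}")
        if not ann.get("segmentation"):
            problems.append(
                f"annotation {ann_id}: empty or missing segmentation")
    return problems


# Tiny made-up example; a real file would be loaded with json.load().
sample = {
    "images": [{"id": 1, "file_name": "1.jpg"}],
    "categories": [{"id": 0, "name": "buckle"}, {"id": 1, "name": "cable"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "segmentation": [[0, 0, 4, 0, 4, 4]]},
    ],
}

print(check_coco(sample, expected_num_classes=2))  # []
```

An empty list only means these basic checks passed. It does not rule out other causes of the IndexError.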
Environment
Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
```
fatal: not a git repository (or any of the parent directories): .git
sys.platform: win32
Python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:19:05) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
NVCC: Not Available
GCC: n/a
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
  C++ Version: 199711
  MSVC 192829337
  Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
TorchVision: 0.10.0+cu111
OpenCV: 4.5.5
MMCV: 1.3.17
MMCV Compiler: MSVC 192930140
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0+
```
PyTorch installation command:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
Other environment variables (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.):
no
Error traceback
```
Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
  File "tools/train.py", line 176, in main
  File "c:\tf\swin-transformer-object-detection-master\mmdet\apis\train.py", line 185, in train_detector
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 127, in run
  File "c:\tf\mmcv\mmcv\runner\epoch_based_runner.py", line 54, in train
  File "c:\tf\mmcv\mmcv\runner\base_runner.py", line 307, in call_hook
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 147, in after_train_epoch
  File "c:\tf\swin-transformer-object-detection-master\mmdet\core\evaluation\eval_hooks.py", line 176, in evaluate
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 419, in evaluate
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 364, in format_results
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 301, in results2json
  File "c:\tf\swin-transformer-object-detection-master\mmdet\datasets\coco.py", line 252, in _segm2json
IndexError: list index out of range
```
It would be great if anyone could help me solve this.