SwinTransformer / Swin-Transformer-Object-Detection

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
https://arxiv.org/abs/2103.14030

What is the error? There is only one class in CLASSES. I don't know if that's the reason. What should I do about it? Thanks! #44

Open kekexixili opened 3 years ago

kekexixili commented 3 years ago

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/home/server/文档/DETR2/Swin/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 45, in train
    self.call_hook('before_train_epoch')
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 150, in before_train_epoch
    self._check_head(runner)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 137, in _check_head
    (f'The num_classes ({module.num_classes}) in '
AssertionError: The num_classes (1) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 5) in CocoDataset

bfialkoff commented 3 years ago

You should provide enough information so that if someone is inclined to help, they can. It's probably something wrong in your config.

haritha-j commented 3 years ago

Hard to say without seeing your config, but most probably you're importing a custom dataset config with five classes, even though you've set num_classes to one in your model config.
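
For reference, the two settings that must agree in a typical MMDetection config look roughly like this (a minimal sketch, not the poster's actual file; 'myClass' and the head layout are placeholders):

# model side: the head declares how many foreground classes it predicts
model = dict(
    roi_head=dict(
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            num_classes=1)))  # must equal len(classes) below

# dataset side: CLASSES is taken from this tuple
data = dict(
    train=dict(
        type='CocoDataset',
        classes=('myClass',)))  # length 1, matching num_classes above

The NumClassCheckHook in the traceback compares exactly these two values before each epoch, which is why a mismatch aborts training immediately.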

ffletcherr commented 3 years ago

You changed all num_classes=80 to num_classes=1, but likely have a syntax issue in your dataset config file. The correct way to modify the dataset config is as follows (the trailing comma after a single class name is important):

train=dict(
        type=dataset_type,
        # add this line:
        classes=('yourClass1', 'yourClass2'),  # or, for one class: ('yourClass1',); the trailing comma is necessary
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        seg_prefix=data_root + 'stuffthingmaps/train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        # and this line:
        classes=('yourClass1', 'yourClass2'),  # or, for one class: ('yourClass1',); the trailing comma is necessary
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
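
The trailing comma matters because in Python, parentheses alone do not create a tuple; a one-element tuple needs the comma. A quick standalone illustration:

# Without the comma, this is just a parenthesized string:
classes = ('yourClass1')
print(type(classes), len(classes))  # <class 'str'> 10 -- len counts characters, not classes

# With the comma, it is the intended one-element tuple:
classes = ('yourClass1',)
print(type(classes), len(classes))  # <class 'tuple'> 1
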
hoangtubk commented 3 years ago

I had the same error; you can try CLASSES = (['your_class']) to fix this.

CharlesNJ commented 3 years ago

@haritha-j @ffletcherr How do you use the pre-trained model but predict just two classes, say truck and car? Any ideas on how to do it? Any help would be greatly appreciated!
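
If the goal is to fine-tune the released model on just those two classes, one common recipe (a sketch with placeholder paths and class names, not something confirmed in this thread) is to set num_classes=2 in every head, declare the two classes in the dataset config, and point load_from at the pretrained checkpoint; MMDetection then loads all weights whose shapes still match and reinitializes the classification layers that do not:

# sketch for a Mask R-CNN-style config with a single bbox_head dict;
# cascade configs keep bbox_head as a list, so each stage needs the change
classes = ('car', 'truck')
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=2),
        mask_head=dict(num_classes=2)))
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
load_from = 'checkpoints/swin_pretrained.pth'  # placeholder path

Alternatively, if retraining is not wanted, you can keep the stock 80-class COCO model and simply filter its outputs to the COCO label indices for car and truck.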

nikhil031294 commented 2 years ago

I used classes = tuple(['hands']) and it worked.

Dream-3000 commented 2 years ago

2021-11-28 17:32:49,714 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.11.1+cu111
OpenCV: 4.1.2
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0+6a979e2

2021-11-28 17:32:54,537 - mmdet - INFO - Distributed training: False
2021-11-28 17:32:59,119 - mmdet - INFO - Config:
model = dict(
    type='CascadeRCNN',
    pretrained=None,
    backbone=dict(
        type='SwinTransformer', embed_dim=128, depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32], window_size=7, mlp_ratio=4.0,
        qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0,
        drop_path_rate=0.3, ape=False, patch_norm=True,
        out_indices=(0, 1, 2, 3), use_checkpoint=False),
    neck=dict(type='FPN', in_channels=[128, 256, 512, 1024], out_channels=256, num_outs=5),
    rpn_head=dict(
        type='RPNHead', in_channels=256, feat_channels=256,
        anchor_generator=dict(type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead', num_stages=3, stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict(type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict(type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict(type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0))],
        mask_roi_extractor=dict(type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(type='FCNMaskHead', num_convs=4, in_channels=256, conv_out_channels=256, num_classes=1, loss_mask=dict(type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict(type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False),
        rpn_proposal=dict(nms_across_levels=False, nms_pre=2000, nms_post=2000, max_per_img=2000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0),
        rcnn=[
            dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict(type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False),
            dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1), sampler=dict(type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False),
            dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1), sampler=dict(type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False)]),
    test_cfg=dict(
        rpn=dict(nms_across_levels=False, nms_pre=1000, nms_post=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0),
        rcnn=dict(score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/mycocodata/'
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='AutoAugment', policies=[
        [{'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True}],
        [{'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True},
         {'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True},
         {'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True}]]),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[
        dict(type='Resize', keep_ratio=True),
        dict(type='RandomFlip'),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img'])])]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        classes=('blob', ),
        ann_file='data/mycocodata/annotations/detections_train2017.json',
        img_prefix='data/mycocodata/train2017/',
        pipeline=[dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict(type='AutoAugment', policies=[[{'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True}], [{'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True}, {'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True}, {'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True}]]), dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])]),
    val=dict(
        type='CocoDataset',
        classes=('blob', ),
        ann_file='data/mycocodata/annotations/detections_val2017.json',
        img_prefix='data/mycocodata/val2017/',
        pipeline=[dict(type='LoadImageFromFile'), dict(type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img'])])]),
    test=dict(
        type='CocoDataset',
        classes=('blob', ),
        ann_file='data/mycocodata/annotations/detections_val2017.json',
        img_prefix='data/mycocodata/val2017/',
        pipeline=[dict(type='LoadImageFromFile'), dict(type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img'])])]))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, paramwise_cfg=dict(custom_keys=dict(absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0))))
optimizer_config = dict(grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True)
lr_config = dict(policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco'
gpu_ids = range(0, 1)

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2021-11-28 17:33:02,793 - mmdet - INFO - Start running, host: root@4f954eb6a772, work_dir: /content/drive/MyDrive/Swin-Transformer-Object-Detection/work_dirs/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco
2021-11-28 17:33:02,795 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) StepLrUpdaterHook
(ABOVE_NORMAL) DistOptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(VERY_LOW ) TextLoggerHook


before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) EvalHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook


after_train_iter: (ABOVE_NORMAL) DistOptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


after_train_epoch: (NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(VERY_LOW ) TextLoggerHook


before_val_epoch: (NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook


after_run: (VERY_LOW ) TextLoggerHook


2021-11-28 17:33:02,800 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
2021-11-28 17:33:02,806 - mmdet - INFO - Checkpoints will be saved to /content/drive/MyDrive/Swin-Transformer-Object-Detection/work_dirs/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco by HardDiskBackend.
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!

IndexError                                Traceback (most recent call last)
/content/drive/MyDrive/Swin-Transformer-Object-Detection/tools/train.py in <module>()
    185
    186 if __name__ == '__main__':
--> 187     main()

20 frames
/usr/local/lib/python3.7/dist-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/utils.py in cached_cast(cast_fn, x, cache)
     95     if x.requires_grad and cached_x.requires_grad:
     96         # Make sure x is actually cached_x's autograd parent.
---> 97         if cached_x.grad_fn.next_functions[1][0].variable is not x:
     98             raise RuntimeError("x and cache[x] both require grad, but x is not "
     99                                "cache[x]'s parent. This is likely an error.")

IndexError: tuple index out of range
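
Note that this second failure is inside NVIDIA Apex's AMP cast cache, not in the class-count check: the config above already has num_classes=1 matching classes=('blob', ). A workaround often reported for this IndexError in similar issue threads (an assumption based on those reports, not confirmed here) is either to disable Apex mixed precision in the config or to patch the offending index in apex itself:

# Option 1 (config): turn off Apex fp16 training; a sketch assuming the
# DistOptimizerHook shown in the config above (the runner may also need
# to be switched back from EpochBasedRunnerAmp to EpochBasedRunner).
optimizer_config = dict(
    type='DistOptimizerHook', update_interval=1, coalesce=True,
    bucket_size_mb=-1, use_fp16=False)  # was use_fp16=True

# Option 2 (commonly cited patch, apply at your own risk): in
# apex/amp/utils.py, line 97, change
#     if cached_x.grad_fn.next_functions[1][0].variable is not x:
# to
#     if cached_x.grad_fn.next_functions[0][0].variable is not x: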