SwinTransformer / Swin-Transformer-Object-Detection

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
https://arxiv.org/abs/2103.14030
Apache License 2.0

Run dist_train.sh with an error #167

Closed billfjj closed 2 years ago

billfjj commented 2 years ago

(swin_det) amax@admin:~/LJW/swin_transformer/Swin-Transformer-Object-Detection$ ./tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 2 --cfg-options model.pretrained=checkpoint/cascade_mask_rcnn_swin_base_patch4_window7.pth
*****
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


fused_weight_gradient_mlp_cuda module not found. gradient accumulation fusion with weight gradient computation disabled.
fused_weight_gradient_mlp_cuda module not found. gradient accumulation fusion with weight gradient computation disabled.
2022-05-05 10:26:22,567 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.8.2
PyTorch compiling details: PyTorch built with:

TorchVision: 0.9.2
OpenCV: 4.5.5
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0+461e003

2022-05-05 10:26:24,690 - mmdet - INFO - Distributed training: True 2022-05-05 10:26:26,847 - mmdet - INFO - Config: model = dict( type='CascadeRCNN', pretrained='checkpoint/cascade_mask_rcnn_swin_base_patch4_window7.pth', backbone=dict( type='SwinTransformer', embed_dim=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.3, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), use_checkpoint=False), neck=dict( type='FPN', in_channels=[128, 256, 512, 1024], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)), roi_head=dict( type='CascadeRoIHead', num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=[ dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=50, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=50, 
bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=50, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)) ], mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), mask_head=dict( type='FCNMaskHead', num_convs=4, in_channels=256, conv_out_channels=256, num_classes=50, loss_mask=dict( type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False), rpn_proposal=dict( nms_across_levels=False, nms_pre=2000, nms_post=2000, max_per_img=2000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( 
type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False) ]), test_cfg=dict( rpn=dict( nms_across_levels=False, nms_pre=1000, nms_post=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5))) dataset_type = 'CocoDataset' data_root = 'data/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', 
size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_train2017.json', img_prefix='data/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict( type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ]), val=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ 
dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_test2017.json', img_prefix='data/coco/test2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox', 'segm']) optimizer = dict( type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, paramwise_cfg=dict( custom_keys=dict( absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0)))) optimizer_config = dict( grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[27, 33]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=36) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] fp16 = None work_dir = './work_dirs/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco' gpu_ids = range(0, 2)

2022-05-05 10:26:27,938 - mmdet - INFO - load model from: checkpoint/cascade_mask_rcnn_swin_base_patch4_window7.pth
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/cascade_rcnn.py", line 18, in __init__
    super(CascadeRCNN, self).__init__(
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 48, in __init__
    self.init_weights(pretrained=pretrained)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 68, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 594, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmcv_custom/checkpoint.py", line 340, in load_checkpoint
    table_current = model.state_dict()[table_key]
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/train.py", line 164, in <module>
    main()
  File "./tools/train.py", line 132, in main
    model = build_detector(
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 77, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 34, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/cascade_rcnn.py", line 18, in __init__
    super(CascadeRCNN, self).__init__(
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 48, in __init__
    self.init_weights(pretrained=pretrained)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 68, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 594, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmcv_custom/checkpoint.py", line 340, in load_checkpoint
    table_current = model.state_dict()[table_key]
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/train.py", line 164, in <module>
    main()
  File "./tools/train.py", line 132, in main
    model = build_detector(
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 77, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/amax/LJW/swin_transformer/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 34, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"
Killing subprocess 1386963
Killing subprocess 1386964
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/amax/anaconda3/envs/swin_det/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/amax/anaconda3/envs/swin_det/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py', '--launcher', 'pytorch', '--cfg-options', 'model.pretrained=checkpoint/cascade_mask_rcnn_swin_base_patch4_window7.pth']' returned non-zero exit status 1.

Freelectry commented 2 years ago

same problem dude. Have you solved it?

billfjj commented 2 years ago

> same problem dude. Have you solved it?

No, I gave up on using its .sh script.

weiyx16 commented 2 years ago

The "--cfg-options model.pretrained" option is for loading classification pre-trained model weights, not a COCO fine-tuned checkpoint. Please refer to the previous discussion in Issue #4.
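This matches the traceback in the original post: load_checkpoint is called on the backbone alone, while the key it looks up carries a "backbone." prefix, which suggests the file holds a full detector. A minimal sketch of a pre-flight check follows. The helper names checkpoint_kind and cfg_option_for are hypothetical and not part of mmdet, and suggesting load_from for a full detector checkpoint is an assumption based on the load_from = None entry in the config dump, not something confirmed in this thread.

```python
def checkpoint_kind(keys):
    """Guess the checkpoint layout from its parameter names.

    Returns 'detector' if any key starts with 'backbone.' (the prefix seen
    in the KeyError above), otherwise 'backbone'.
    """
    return "detector" if any(k.startswith("backbone.") for k in keys) else "backbone"


def cfg_option_for(keys, path):
    """Pick a --cfg-options entry that matches the checkpoint layout.

    Assumption: a full detector checkpoint goes through the top-level
    load_from key, a backbone-only one through model.pretrained.
    """
    if checkpoint_kind(keys) == "detector":
        return f"load_from={path}"
    return f"model.pretrained={path}"


# Keys could be read with something like the following (assumes PyTorch and
# that weights sit under 'state_dict' or 'model', else at the top level):
#   ckpt = torch.load(path, map_location="cpu")
#   keys = ckpt.get("state_dict", ckpt.get("model", ckpt)).keys()
detector_keys = ["backbone.layers.0.blocks.0.attn.relative_position_bias_table"]
print(cfg_option_for(detector_keys, "checkpoint/cascade_mask_rcnn_swin_base_patch4_window7.pth"))
```

The printed string would replace the model.pretrained=... argument passed to ./tools/dist_train.sh after --cfg-options.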

xinlin-xiao commented 11 months ago

Same problem, dude. You said you gave up on using the .sh script. Can you share the command you're currently using?