Training using swin transformer on mmdetection but the accuracy results are extremely low.

Hello everyone. I have shared the training logs. I have trainined for 40 Epochs with retinanet as well but i have the same problem.

command to run: python /app/scripts/mmdet/train.py /app/configs/mmdet/swin/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py --work-dir /app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco

2023-05-24 16:19:31,026 - mmdet - INFO - 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.020
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.033

2023-05-24 16:19:31,090 - mmdet - INFO - The previous best checkpoint /app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco/best_AR@1000_epoch_1.pth was removed
2023-05-24 16:19:34,600 - mmdet - INFO - Now best checkpoint is saved as best_AR@1000_epoch_2.pth.
2023-05-24 16:19:34,600 - mmdet - INFO - Best AR@1000 is 0.0360 at 2 epoch.
2023-05-24 16:19:34,608 - mmdet - INFO - Exp name: mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py
2023-05-24 16:19:34,608 - mmdet - INFO - Epoch(val) [2][203]    AR@100: 0.0020, AR@300: 0.0120, AR@1000: 0.0360, AR_s@1000: 0.0000, AR_m@1000: 0.0210, AR_l@1000: 0.0600, bbox_AR@100: 0.0020, bbox_AR@300: 0.0120, bbox_AR@1000: 0.0360, bbox_AR_s@1000: 0.0000, bbox_AR_m@1000: 0.0210, bbox_AR_l@1000: 0.0600, bbox_mAP_copypaste: 0.001 0.003 0.000 0.000 0.000 0.001, segm_AR@100: 0.0010, segm_AR@300: 0.0070, segm_AR@1000: 0.0200, segm_AR_s@1000: 0.0000, segm_AR_m@1000: 0.0120, segm_AR_l@1000: 0.0330, segm_mAP_copypaste: 0.000 0.002 0.000 0.000 0.000 0.002
2023-05-24 16:20:44,948 - mmdet - INFO - Epoch [3][50/712]      lr: 1.000e-04, eta: 7:47:31, time: 1.406, data_time: 0.311, memory: 27397, loss_rpn_cls: 0.1374, loss_rpn_bbox: 0.0869, loss_cls: 0.1031, acc: 97.2070, loss_bbox: 0.0625, loss_mask: 0.4691, loss: 0.8590
2023-05-24 16:21:42,607 - mmdet - INFO - Epoch [3][100/712]     lr: 1.000e-04, eta: 7:46:27, time: 1.153, data_time: 0.078, memory: 27397, loss_rpn_cls: 0.1245, loss_rpn_bbox: 0.0829, loss_cls: 0.0926, acc: 97.5547, loss_bbox: 0.0498, loss_mask: 0.4291, loss: 0.7790
2023-05-24 16:22:40,384 - mmdet - INFO - Epoch [3][150/712]     lr: 1.000e-04, eta: 7:45:25, time: 1.155, data_time: 0.079, memory: 27397, loss_rpn_cls: 0.1198, loss_rpn_bbox: 0.0919, loss_cls: 0.1075, acc: 96.9375, loss_bbox: 0.0734, loss_mask: 0.4224, loss: 0.8149
2023-05-24 16:23:37,919 - mmdet - INFO - Epoch [3][200/712]     lr: 1.000e-04, eta: 7:44:19, time: 1.151, data_time: 0.095, memory: 27397, loss_rpn_cls: 0.1559, loss_rpn_bbox: 0.1105, loss_cls: 0.1161, acc: 96.7734, loss_bbox: 0.0744, loss_mask: 0.4585, loss: 0.9154
2023-05-24 16:24:35,152 - mmdet - INFO - Epoch [3][250/712]     lr: 1.000e-04, eta: 7:43:10, time: 1.145, data_time: 0.064, memory: 27397, loss_rpn_cls: 0.1771, loss_rpn_bbox: 0.1148, loss_cls: 0.0927, acc: 97.4102, loss_bbox: 0.0566, loss_mask: 0.4536, loss: 0.8947
2023-05-24 16:25:36,888 - mmdet - INFO - Epoch [3][300/712]     lr: 1.000e-04, eta: 7:43:04, time: 1.235, data_time: 0.110, memory: 27397, loss_rpn_cls: 0.1352, loss_rpn_bbox: 0.1072, loss_cls: 0.1111, acc: 96.9180, loss_bbox: 0.0745, loss_mask: 0.4308, loss: 0.8588
2023-05-24 16:26:34,013 - mmdet - INFO - Epoch [3][350/712]     lr: 1.000e-04, eta: 7:41:52, time: 1.142, data_time: 0.059, memory: 27397, loss_rpn_cls: 0.1216, loss_rpn_bbox: 0.0887, loss_cls: 0.1066, acc: 97.1016, loss_bbox: 0.0646, loss_mask: 0.4499, loss: 0.8315
2023-05-24 16:27:35,963 - mmdet - INFO - Epoch [3][400/712]     lr: 1.000e-04, eta: 7:41:45, time: 1.239, data_time: 0.100, memory: 27397, loss_rpn_cls: 0.1393, loss_rpn_bbox: 0.0865, loss_cls: 0.1084, acc: 97.2930, loss_bbox: 0.0671, loss_mask: 0.4271, loss: 0.8285
2023-05-24 16:28:35,646 - mmdet - INFO - Epoch [3][450/712]     lr: 1.000e-04, eta: 7:41:06, time: 1.194, data_time: 0.083, memory: 27397, loss_rpn_cls: 0.1213, loss_rpn_bbox: 0.0906, loss_cls: 0.1073, acc: 96.8477, loss_bbox: 0.0764, loss_mask: 0.4229, loss: 0.8184
2023-05-24 16:29:36,733 - mmdet - INFO - Epoch [3][500/712]     lr: 1.000e-04, eta: 7:40:43, time: 1.222, data_time: 0.100, memory: 27397, loss_rpn_cls: 0.1532, loss_rpn_bbox: 0.0964, loss_cls: 0.1389, acc: 96.0352, loss_bbox: 0.1032, loss_mask: 0.4069, loss: 0.8987

here is some more logs when hitting the train command

root@57d2116ae6d7:/app# df -fpython /app/scripts/mmdet/train.py /app/configs/mmdet/swin/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py --work-dir /app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco
df: invalid option -- 'f'
Try 'df --help' for more information.
root@57d2116ae6d7:/app# python /app/scripts/mmdet/train.py /app/configs/mmdet/swin/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py --work-dir /app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco
dict_keys(['model', 'custom_imports', 'dataset_type', 'data_root', 'classes', 'train_img_prefix', 'train_ann_file', 'test_img_prefix', 'test_ann_file', 'img_norm_cfg', 'train_pipeline', 'test_pipeline', 'data', 'evaluation', 'optimizer', 'optimizer_config', 'lr_config', 'runner', 'checkpoint_config', 'log_config', 'custom_hooks', 'dist_params', 'log_level', 'load_from', 'resume_from', 'workflow', 'mlflow_tracking_uri', 'mlflow_artifact_root'])
[{'type': 'TextLoggerHook'}, {'type': 'TensorboardLoggerHook'}]
/app/mmdetection/mmdet/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/app/mmdetection/mmdet/utils/setup_env.py:43: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
fatal: detected dubious ownership in repository at '/app'
To add an exception for this directory, call:

        git config --global --add safe.directory /app
2023-05-24 15:39:21,834 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.5 (default, Dec  9 2021, 17:04:37) [GCC 8.4.0]
CUDA available: True
GPU 0: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.0+cu111
OpenCV: 4.7.0
MMCV: 1.4.4
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.21.0+
------------------------------------------------------------

2023-05-24 15:39:23,947 - mmdet - INFO - Distributed training: False
2023-05-24 15:39:26,155 - mmdet - INFO - Config:
model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        with_cp=False,
        convert_weights=True,
        in_channels=5),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=1,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=1,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
custom_imports = dict(imports=['numpy_loader'], allow_failed_imports=False)
dataset_type = 'CocoDataset'
data_root = '/app/data/tbbr/'
classes = ('Thermal bridge', )
train_img_prefix = 'train2017/images/'
train_ann_file = 'annotations/instances_train2017.json'
test_img_prefix = 'val2017/images/'
test_ann_file = 'annotations/instances_val2017.json'
img_norm_cfg = dict(
    mean=[130.0, 135.0, 135.0, 118.0, 118.0],
    std=[44.0, 40.0, 40.0, 30.0, 21.0],
    to_rgb=False)
train_pipeline = [
    dict(type='LoadNumpyImageFromFile', drop_height=False),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[[{
            'type':
            'Resize',
            'img_scale':
            [(1608, 3370), (1715, 3370), (1822, 3370), (1929, 3370),
             (2036, 3370), (2144, 3370), (2251, 3370), (2358, 3370),
             (2465, 3370), (2572, 3370), (2680, 3370)],
            'multiscale_mode':
            'value',
            'keep_ratio':
            True
        }],
                  [{
                      'type': 'Resize',
                      'img_scale': [(1340, 3370), (1675, 3370), (2010, 3370)],
                      'multiscale_mode': 'value',
                      'keep_ratio': True
                  }, {
                      'type': 'RandomCrop',
                      'crop_type': 'absolute_range',
                      'crop_size': (1286, 2010),
                      'allow_negative_crop': True
                  }, {
                      'type':
                      'Resize',
                      'img_scale': [(1608, 3370), (1715, 3370), (1822, 3370),
                                    (1929, 3370), (2036, 3370), (2144, 3370),
                                    (2251, 3370), (2358, 3370), (2465, 3370),
                                    (2572, 3370), (2680, 3370)],
                      'multiscale_mode':
                      'value',
                      'override':
                      True,
                      'keep_ratio':
                      True
                  }]]),
    dict(
        type='Normalize',
        mean=[130.0, 135.0, 135.0, 118.0, 118.0],
        std=[44.0, 40.0, 40.0, 30.0, 21.0],
        to_rgb=False),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadNumpyImageFromFile', drop_height=False),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(3370, 2680),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[130.0, 135.0, 135.0, 118.0, 118.0],
                std=[44.0, 40.0, 40.0, 30.0, 21.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=32,
    train=dict(
        type='CocoDataset',
        img_prefix='/app/data/tbbr/train2017/images/',
        ann_file='/app/data/tbbr/annotations/instances_train2017.json',
        pipeline=[
            dict(type='LoadNumpyImageFromFile', drop_height=False),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type':
                    'Resize',
                    'img_scale': [(1608, 3370), (1715, 3370), (1822, 3370),
                                  (1929, 3370), (2036, 3370), (2144, 3370),
                                  (2251, 3370), (2358, 3370), (2465, 3370),
                                  (2572, 3370), (2680, 3370)],
                    'multiscale_mode':
                    'value',
                    'keep_ratio':
                    True
                }],
                          [{
                              'type':
                              'Resize',
                              'img_scale': [(1340, 3370), (1675, 3370),
                                            (2010, 3370)],
                              'multiscale_mode':
                              'value',
                              'keep_ratio':
                              True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (1286, 2010),
                              'allow_negative_crop': True
                          }, {
                              'type':
                              'Resize',
                              'img_scale': [(1608, 3370), (1715, 3370),
                                            (1822, 3370), (1929, 3370),
                                            (2036, 3370), (2144, 3370),
                                            (2251, 3370), (2358, 3370),
                                            (2465, 3370), (2572, 3370),
                                            (2680, 3370)],
                              'multiscale_mode':
                              'value',
                              'override':
                              True,
                              'keep_ratio':
                              True
                          }]]),
            dict(
                type='Normalize',
                mean=[130.0, 135.0, 135.0, 118.0, 118.0],
                std=[44.0, 40.0, 40.0, 30.0, 21.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ],
        classes=('Thermal bridge', )),
    val=dict(
        type='CocoDataset',
        img_prefix='/app/data/tbbr/val2017/images/',
        ann_file='/app/data/tbbr/annotations/instances_val2017.json',
        pipeline=[
            dict(type='LoadNumpyImageFromFile', drop_height=False),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(3370, 2680),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[130.0, 135.0, 135.0, 118.0, 118.0],
                        std=[44.0, 40.0, 40.0, 30.0, 21.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('Thermal bridge', )),
    test=dict(
        type='CocoDataset',
        img_prefix='/app/data/tbbr/val2017/images/',
        ann_file='/app/data/tbbr/annotations/instances_val2017.json',
        pipeline=[
            dict(type='LoadNumpyImageFromFile', drop_height=False),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(3370, 2680),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[130.0, 135.0, 135.0, 118.0, 118.0],
                        std=[44.0, 40.0, 40.0, 30.0, 21.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('Thermal bridge', )))
evaluation = dict(
    interval=1,
    metric=['proposal', 'bbox', 'segm'],
    proposal_nums=[1, 10, 100],
    save_best='AR@1000')
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
mlflow_tracking_uri = 'sqlite:////path/to/mlruns.db'
mlflow_artifact_root = '/path/to/mlartifacts/'
work_dir = '/app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco'
auto_resume = False
gpu_ids = [0]

2023-05-24 15:39:26,156 - mmdet - INFO - Set random seed to 841100096, deterministic: False
2023-05-24 15:39:26,447 - mmdet - WARNING - No pre-trained weights for SwinTransformer, training start from scratch
2023-05-24 15:39:26,586 - mmdet - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2023-05-24 15:39:26,603 - mmdet - INFO - initialize RPNHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01}
2023-05-24 15:39:26,607 - mmdet - INFO - initialize Shared2FCBBoxHead with init_cfg [{'type': 'Normal', 'std': 0.01, 'override': {'name': 'fc_cls'}}, {'type': 'Normal', 'std': 0.001, 'override': {'name': 'fc_reg'}}, {'type': 'Xavier', 'distribution': 'uniform', 'override': [{'name': 'shared_fcs'}, {'name': 'cls_fcs'}, {'name': 'reg_fcs'}]}]
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
[
CocoDataset Train dataset with number of images 712, and instance counts:
+--------------------+-------+----------+-------+----------+-------+----------+-------+----------+-------+
| category           | count | category | count | category | count | category | count | category | count |
+--------------------+-------+----------+-------+----------+-------+----------+-------+----------+-------+
|                    |       |          |       |          |       |          |       |          |       |
| 0 [Thermal bridge] | 5614  |          |       |          |       |          |       |          |       |
+--------------------+-------+----------+-------+----------+-------+----------+-------+----------+-------+]
fatal: detected dubious ownership in repository at '/app'
To add an exception for this directory, call:

        git config --global --add safe.directory /app
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2023-05-24 15:39:30,687 - mmdet - INFO - Start running, host: root@57d2116ae6d7, work_dir: /app/work_dirs/mask_rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco2023-05-24 15:39:30,688 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) NumClassCheckHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
 --------------------
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL      ) CheckpointHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_train_epoch:
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_epoch:
(NORMAL      ) NumClassCheckHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_epoch:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_run:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook

Helmholtz-AI-Energy / TBBRDet

Training using swin transformer on mmdetection but the accuracy results are extremely low. #2