NJU-LHRS / official-CMID

The official implementation of paper "Unified Self-Supervised Learning Framework for Remote Sensing Images".

Question About Discrepancy Between My Semantic Segmentation Results and Expected Performance #17

Closed sskim0126 closed 8 months ago

sskim0126 commented 9 months ago

Hello, I would like to attempt semantic segmentation on the Potsdam dataset using your approach. I followed all the instructions you provided, but in my tests the performance was slightly lower than reported.

The performance metrics I obtained are as follows:

Metrics from GitHub

Metrics from my own experiments

I would like to know if I made any mistakes that could have led to the lower performance.

For reference, here is the environment in which I conducted the experiments:

pUmpKin-Co commented 9 months ago

Hi~ Thanks for your interest in our work. Could you provide your log and config files so we can better analyze the problem?

sskim0126 commented 9 months ago

Thank you for your response!

First of all, the config files are identical to those provided in your GitHub repository, except for the changed paths.

Here are the config files we used.

* CMID-ResNet50
```python
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(type='Fp16OptimizerHook', distributed=False)
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=100,
    min_lr=0.000001,
    by_epoch=False)
runner = dict(type='EpochBasedRunner', max_epochs=50)
checkpoint_config = dict(by_epoch=True, interval=1)
evaluation = dict(
    interval=5, metric=['mIoU', 'mFscore'], pre_eval=True, by_epoch=True)

norm_cfg = dict(type='BN', requires_grad=True)
checkpoint_path = "./pretrained/CMID-ResNet50-millionAID.pth"
model = dict(
    type='EncoderDecoder',
    pretrained=checkpoint_path,
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 1, 1),
        strides=(1, 2, 2, 2),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='UPerHead',
        in_channels=[256, 512, 1024, 2048],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        ignore_index=255,
        dropout_ratio=0.1,
        num_classes=6,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        ignore_index=255,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=6,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

# dataset_type = 'PotsdamAllDataset'  # superseded by the assignment below
dataset_type = 'PotsdamDataset'
data_root = 'data/potsdam'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    # dict(type='LoadAnnotations', reduce_zero_label=True),  # replaced by the variant below
    dict(
        type='LoadAnnotationsReduceIgnoreIndex',
        reduce_zero_label=True,
        ignore_index=6),
    dict(type='Resize', img_scale=(512, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/train',
        ann_dir='ann_dir/train',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline))
```
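For intuition, the `lr_config` used here steps the learning rate per iteration (`by_epoch=False`): a linear warmup over the first 100 iterations, then cosine annealing down to `min_lr`. A minimal stand-alone sketch of that schedule (not mmcv's implementation; `total_iters` and the warmup ramp from zero are illustrative assumptions):

```python
import math

def lr_at(step, base_lr=0.01, warmup_iters=100, min_lr=1e-6, total_iters=10000):
    """Linear warmup followed by cosine annealing, per iteration."""
    if step < warmup_iters:
        # Illustrative ramp from 0 up to base_lr over the warmup window
        # (mmcv actually starts from warmup_ratio * base_lr).
        return base_lr * (step + 1) / warmup_iters
    # Cosine decay from base_lr down to min_lr over the remaining iterations.
    progress = (step - warmup_iters) / (total_iters - warmup_iters)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The warmup reaches `base_lr` at iteration 99, and the schedule decays toward `min_lr` at the final iteration.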


* CMID-Swin
```python
# model settings
# checkpoint_file = "./pretrained/CMID_Swin-B_bk_200ep.pth"
checkpoint_file = "./pretrained/CMID-Swin-B-millionAID.pth"
norm_cfg = dict(type='SyncBN', requires_grad=True)
backbone_norm_cfg = dict(type='LN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='SwinTransformer',
        pretrain_img_size=224,
        embed_dims=128,
        patch_size=4,
        window_size=7,
        mlp_ratio=4,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        strides=(4, 2, 2, 2),
        out_indices=(0, 1, 2, 3),
        qkv_bias=True,
        qk_scale=None,
        patch_norm=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.3,
        use_abs_pos_embed=False,
        act_cfg=dict(type='GELU'),
        init_cfg=dict(type='Pretrained', checkpoint=checkpoint_file),
        norm_cfg=backbone_norm_cfg),
    decode_head=dict(
        type='UPerHead',
        in_channels=[128, 256, 512, 1024],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        dropout_ratio=0.1,
        num_classes=6,
        ignore_index=255,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=512,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=6,
        ignore_index=255,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook')
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
dataset_type = 'PotsdamDataset'
data_root = 'data/potsdam'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='LoadAnnotationsReduceIgnoreIndex',
        reduce_zero_label=True,
        ignore_index=6),
    dict(type='Resize', img_scale=(512, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=512),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=None,
        img_ratios=[1.0],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/train',
        ann_dir='ann_dir/train',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline))

optimizer_config = None

optimizer = dict(
    type='AdamW',
    lr=0.00006,
    betas=(0.9, 0.999),
    weight_decay=0.01,
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))

lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=100,
    min_lr=0.000001,
    by_epoch=False)

checkpoint_config = dict(by_epoch=True, interval=1)
evaluation = dict(
    interval=5, metric=['mIoU', 'mFscore'], pre_eval=True, by_epoch=True)
runner = dict(type='EpochBasedRunner', max_epochs=50)
gpu_ids = [0]
auto_resume = False
```

And here are the log JSON files:

pUmpKin-Co commented 9 months ago

Hi~ @sskim0126 Try the following to reproduce similar results.

* ResNet50
```python
optimizer = dict(
    constructor='TimmConstructor', type='Adan', lr=0.0020, weight_decay=0.02)
optimizer_config = dict(
    type='Fp16OptimizerHook',
    distributed=False,
    grad_clip=dict(max_norm=5.0, norm_type=2))
```
* SwinB
```python
optimizer = dict(
    constructor='TimmConstructor',
    type='Adan',
    lr=0.0003,
    weight_decay=0.02,
    filter_bias_and_bn=False,
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))
optimizer_config = dict(type='Fp16OptimizerHook', distributed=False)
```
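For context, the `custom_keys` entries with `decay_mult=0.` exempt the position embeddings and normalization parameters from weight decay. A rough stand-alone sketch of how such parameter names could be split into decay/no-decay optimizer groups (an illustration only, not mmcv's optimizer constructor; `build_param_groups` is a hypothetical helper):

```python
# Substring keys whose matching parameters should receive no weight decay,
# mirroring the custom_keys in the paramwise_cfg above.
NO_DECAY_KEYS = ('absolute_pos_embed', 'relative_position_bias_table', 'norm')

def build_param_groups(param_names, lr=3e-4, weight_decay=0.02):
    """Split parameter names into a decayed and a non-decayed group."""
    decay, no_decay = [], []
    for name in param_names:
        # Match by substring of the dotted parameter name.
        if any(key in name for key in NO_DECAY_KEYS):
            no_decay.append(name)
        else:
            decay.append(name)
    return [
        dict(params=decay, lr=lr, weight_decay=weight_decay),
        dict(params=no_decay, lr=lr, weight_decay=0.0),
    ]

groups = build_param_groups([
    'backbone.absolute_pos_embed',
    'backbone.stages.0.blocks.0.attn.relative_position_bias_table',
    'backbone.stages.0.blocks.0.norm1.weight',
    'decode_head.conv_seg.weight',
])
```

The two resulting groups would then be handed to the optimizer, so only the second group trains without weight decay.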

For your reference, the following are my evaluation results after training for 50 epochs. (You can achieve the best results by adding --aug-test to the mmseg test.py script; the results below were obtained without --aug-test, since I lost those runs :( )

* CMID-Swin-B
```json
{"aAcc": 0.9299, "mIoU": 0.8736, "mAcc": 0.9295, "mFscore": 0.9312, "mPrecision": 0.9332, "mRecall": 0.9295, "mFWIoU": 0.8716, "IoU.impervious_surface": 0.8999, "IoU.building": 0.9483, "IoU.low_vegetation": 0.7963, "IoU.tree": 0.794, "IoU.car": 0.9294, "Acc.impervious_surface": 0.9377, "Acc.building": 0.975, "Acc.low_vegetation": 0.908, "Acc.tree": 0.8749, "Acc.car": 0.9521, "Freq.impervious_surface": 0.3279, "Freq.building": 0.2603, "Freq.low_vegetation": 0.2156, "Freq.tree": 0.1797, "Freq.car": 0.0166, "Fscore.impervious_surface": 0.9473, "Fscore.building": 0.9734, "Fscore.low_vegetation": 0.8866, "Fscore.tree": 0.8851, "Fscore.car": 0.9634, "Precision.impervious_surface": 0.9571, "Precision.building": 0.9719, "Precision.low_vegetation": 0.8662, "Precision.tree": 0.8956, "Precision.car": 0.975, "Recall.impervious_surface": 0.9377, "Recall.building": 0.975, "Recall.low_vegetation": 0.908, "Recall.tree": 0.8749, "Recall.car": 0.9521}
```
* CMID-ResNet50
```json
{"aAcc": 0.9277, "mIoU": 0.8704, "mAcc": 0.9282, "mFscore": 0.9293, "mPrecision": 0.9307, "mRecall": 0.9282, "mFWIoU": 0.8678, "IoU.impervious_surface": 0.8951, "IoU.building": 0.9476, "IoU.low_vegetation": 0.7916, "IoU.tree": 0.7879, "IoU.car": 0.9298, "Acc.impervious_surface": 0.9368, "Acc.building": 0.9745, "Acc.low_vegetation": 0.9073, "Acc.tree": 0.865, "Acc.car": 0.9575, "Freq.impervious_surface": 0.3279, "Freq.building": 0.2603, "Freq.low_vegetation": 0.2156, "Freq.tree": 0.1797, "Freq.car": 0.0166, "Fscore.impervious_surface": 0.9446, "Fscore.building": 0.9731, "Fscore.low_vegetation": 0.8837, "Fscore.tree": 0.8813, "Fscore.car": 0.9636, "Precision.impervious_surface": 0.9526, "Precision.building": 0.9717, "Precision.low_vegetation": 0.8612, "Precision.tree": 0.8983, "Precision.car": 0.9698, "Recall.impervious_surface": 0.9368, "Recall.building": 0.9745, "Recall.low_vegetation": 0.9073, "Recall.tree": 0.865, "Recall.car": 0.9575}
```
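As a sanity check when comparing numbers like these, mIoU and mFscore can be recomputed from a per-class confusion matrix. A small sketch (`per_class_metrics` is a hypothetical helper; it assumes every class has at least one true and one predicted pixel, so no division by zero occurs):

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j.

    Returns (mIoU, mFscore), each averaged uniformly over classes.
    """
    n = len(confusion)
    ious, fscores = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly assigned to c
        ious.append(tp / (tp + fp + fn))
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        fscores.append(2 * precision * recall / (precision + recall))
    return sum(ious) / n, sum(fscores) / n
```

Feeding it a toy 2-class matrix such as `[[8, 2], [1, 9]]` reproduces the familiar per-class IoU of tp / (tp + fp + fn) averaged into mIoU.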

Finally, I have updated the instructions for the semantic segmentation tasks.

Thanks!

pUmpKin-Co commented 8 months ago

Closing due to a long period of inactivity; feel free to reopen if any problem remains.