ViTAE-Transformer / ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

reproduce problem about swin-t in scene classification. #24

Open Leiyi-Hu opened 1 year ago

Leiyi-Hu commented 1 year ago

Hi, I am trying to reproduce your scene classification results in mmclassification using your hyperparameters. I trained on AID (2:8) with max_epochs=200 and base_lr=5e-4, but I could not reproduce the reported results. The other settings are as follows:

_base_ = [
    # '../_base_/models/swin_transformer/base_224.py',
    # "../_base_/datasets/ucmerced_landuse_bs64_swin_224.py",
    "../_base_/datasets/aid_bs64_autoaug.py",
    "../_base_/schedules/imagenet_bs64_adamw_swin.py",
    "../_base_/default_runtime.py",
]

# refer to SimMIM paper
ADJUST_FACTOR = 1.0
BATCH_SIZE = 64
BASE_LR = 5e-4 * ADJUST_FACTOR  # todo: adjust.
WARMUP_LR = 5e-7 * ADJUST_FACTOR
MIN_LR = 5e-6 * ADJUST_FACTOR
NUM_GPUS = 1
DROP_PATH_RATE = 0.2
SCALE_FACTOR = 512.0
MAX_EPOCHS = 200

# model settings
model = dict(
    type="ImageClassifier",
    backbone=dict(
        type="SwinTransformer",
        # arch="base",
        arch="tiny",
        img_size=224,
        # drop_path_rate=0.1,  # DROP_PATH_RATE
        drop_path_rate=DROP_PATH_RATE,
    ),
    neck=dict(type="GlobalAveragePooling"),
    head=dict(
        type="LinearClsHead",
        num_classes=21,
        # in_channels=1024,
        in_channels=768,
        init_cfg=None,  # suppress the default init_cfg of LinearClsHead.
        loss=dict(type="LabelSmoothLoss", label_smooth_val=0.1, mode="original"),
        cal_acc=False,
    ),
    init_cfg=[
        dict(type="TruncNormal", layer="Linear", std=0.02, bias=0.0),
        dict(type="Constant", layer="LayerNorm", val=1.0, bias=0.0),
    ],
    train_cfg=dict(
        augments=[
            dict(type="BatchMixup", alpha=0.8, num_classes=21, prob=0.5),
            dict(type="BatchCutMix", alpha=1.0, num_classes=21, prob=0.5),
        ]
    ),
)

# optimizer
paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    custom_keys={
        ".absolute_pos_embed": dict(decay_mult=0.0),
        ".relative_position_bias_table": dict(decay_mult=0.0),
    },
)

optimizer = dict(
    type="AdamW",
    # lr=1e-3 * 64 / 256,  # 5e-4 * 64 / 512,  # 1e-3 * 64 / 256,
    # lr=1.25e-3 * 96 * 1 / 512.0,
    # BASE_LR * BATCH_SIZE * NUM_GPUS / 512.0,  # 1e-3 * 64 / 256,
    lr=BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg,
)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy
lr_config = dict(
    policy="CosineAnnealing",
    # min_lr=2.5e-7,
    # by_epoch=False,  # todo: try
    by_epoch=False,
    # min_lr_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-2,
    min_lr_ratio=(MIN_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # min_lr=2.5e-7,  # MIN_LR,
    warmup="linear",
    # warmup_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-3,
    warmup_ratio=(WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # warmup_lr=2.5e-7,  # WARMUP_LR,
    warmup_iters=20,  # todo: 0
    warmup_by_epoch=True,
)

checkpoint_config = dict(interval=MAX_EPOCHS // 10)
evaluation = dict(
    interval=MAX_EPOCHS // 10, metric="accuracy", save_best="auto"
)  # save the checkpoint with highest accuracy
runner = dict(type="EpochBasedRunner", max_epochs=MAX_EPOCHS)

# data = dict(samples_per_gpu=96, workers_per_gpu=8,)
data = dict(samples_per_gpu=BATCH_SIZE, workers_per_gpu=8,)

# fp16 settings
fp16 = dict(loss_scale="dynamic")

Could you help me with this, or share your training log? Thanks!
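For reference, the learning-rate expressions in the config above resolve to concrete numbers that are easier to compare against another setup. The following is a minimal sketch, not part of the original post, that reads each constant as a product with ADJUST_FACTOR (the way MIN_LR is written) and evaluates the same formulas used for `lr`, `min_lr_ratio`, and `warmup_ratio`. The helper name `scaled` is hypothetical.

```python
# Constants copied from the posted config; each is read as a product
# with ADJUST_FACTOR (assumption for BASE_LR and WARMUP_LR).
ADJUST_FACTOR = 1.0
BATCH_SIZE = 64
BASE_LR = 5e-4 * ADJUST_FACTOR
WARMUP_LR = 5e-7 * ADJUST_FACTOR
MIN_LR = 5e-6 * ADJUST_FACTOR
NUM_GPUS = 1
SCALE_FACTOR = 512.0


def scaled(lr):
    """Apply the linear batch-size scaling used throughout the config."""
    return lr * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR


# Same expressions as optimizer["lr"], lr_config["min_lr_ratio"],
# and lr_config["warmup_ratio"] in the config.
peak_lr = scaled(BASE_LR)
min_lr_ratio = scaled(MIN_LR) / scaled(BASE_LR)
warmup_ratio = scaled(WARMUP_LR) / scaled(BASE_LR)

print(f"peak lr      = {peak_lr:.3e}")
print(f"min_lr_ratio = {min_lr_ratio:.3e}")
print(f"warmup_ratio = {warmup_ratio:.3e}")
```

With these constants the optimizer runs at a peak learning rate of 6.25e-5 rather than the nominal 5e-4, because of the 64 / 512 scaling. The batch-size factor cancels in both ratios, so they reduce to MIN_LR / BASE_LR = 1e-2 and WARMUP_LR / BASE_LR = 1e-3.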

DotWang commented 1 year ago

@Leiyi-Hu We use MAE, not SimMIM. In addition, our classification experiments do not use mmclassification. We have provided the corresponding code, so please reproduce the results with it.
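Independently of the framework difference, one detail in the posted config may be worth double-checking: it selects the AID dataset file but sets `num_classes=21` in the head and in both batch augmentations. UC Merced is published with 21 scene classes and AID with 30, so the value may be a leftover from the commented-out UC Merced setup. The sketch below is only an illustration under that assumption. The function `check_num_classes` and the `KNOWN_NUM_CLASSES` table are hypothetical, and the contents of the custom file `aid_bs64_autoaug.py` are not shown in the thread.

```python
# Class counts from the datasets' published descriptions. Assumption: the
# custom dataset files referenced in the issue use the standard class lists.
KNOWN_NUM_CLASSES = {"ucmerced_landuse": 21, "aid": 30}


def check_num_classes(dataset, model_cfg):
    """Return (location, value) pairs whose num_classes differs from the dataset's."""
    expected = KNOWN_NUM_CLASSES[dataset]
    found = [("head", model_cfg["head"]["num_classes"])]
    for aug in model_cfg.get("train_cfg", {}).get("augments", []):
        found.append((aug["type"], aug["num_classes"]))
    return [(where, n) for where, n in found if n != expected]


# Only the fields relevant to the check, as posted in the issue.
model = dict(
    head=dict(type="LinearClsHead", num_classes=21),
    train_cfg=dict(
        augments=[
            dict(type="BatchMixup", alpha=0.8, num_classes=21, prob=0.5),
            dict(type="BatchCutMix", alpha=1.0, num_classes=21, prob=0.5),
        ]
    ),
)

mismatches = check_num_classes("aid", model)
print(mismatches)
```

Under the stated assumption the check reports all three locations for AID and none for UC Merced, which would suggest updating the three `num_classes` values together when switching datasets.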