Severe overfitting occurred.

Dear author: I trained a lite-base version of Video Swin Transformer, but I noticed severe overfitting, as the training and validation logs for epoch 13 show:

, data_time: 0.001, memory: 20882, top1_acc: 0.7600, top5_acc: 0.9206, loss_cls: 0.9247, loss: 0.9247
2022-02-15 10:24:18,650 - mmaction - INFO - Epoch [13][5860/5929]   lr: 2.714e-05, eta: 2 days, 3:27:33, time: 0.669, data_time: 0.001, memory: 20882, top1_acc: 0.7569, top5_acc: 0.9269, loss_cls: 0.9281, loss: 0.9281
2022-02-15 10:24:31,952 - mmaction - INFO - Epoch [13][5880/5929]   lr: 2.714e-05, eta: 2 days, 3:27:20, time: 0.664, data_time: 0.000, memory: 20882, top1_acc: 0.7462, top5_acc: 0.9313, loss_cls: 0.9472, loss: 0.9472
2022-02-15 10:24:45,297 - mmaction - INFO - Epoch [13][5900/5929]   lr: 2.714e-05, eta: 2 days, 3:27:07, time: 0.668, data_time: 0.001, memory: 20882, top1_acc: 0.7556, top5_acc: 0.9250, loss_cls: 0.9117, loss: 0.9117
2022-02-15 10:24:58,546 - mmaction - INFO - Epoch [13][5920/5929]   lr: 2.714e-05, eta: 2 days, 3:26:53, time: 0.662, data_time: 0.001, memory: 20882, top1_acc: 0.7506, top5_acc: 0.9256, loss_cls: 0.9624, loss: 0.9624
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 33663/33663, 139.1 task/s, elapsed: 242s, ETA:     0s

2022-02-15 10:29:10,037 - mmaction - INFO - Evaluating top_k_accuracy ...
2022-02-15 10:29:12,502 - mmaction - INFO - 
top1_acc    0.5948
top5_acc    0.8161
2022-02-15 10:29:12,502 - mmaction - INFO - Evaluating mean_class_accuracy ...
2022-02-15 10:29:12,608 - mmaction - INFO - 
mean_acc    0.5943
2022-02-15 10:29:12,626 - mmaction - INFO - Epoch(val) [13][421]    top1_acc: 0.5948, top5_acc: 0.8161, mean_class_accuracy: 0.5943

After I trained for 30 epochs, the training top-1 accuracy reached 90+%, but the validation accuracy still stays at ~59%.
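To put a number on the gap at epoch 13, here is a minimal sketch that parses the `top1_acc` fields from the log excerpt above. The log lines are trimmed copies of the ones pasted in this issue (the first, truncated line is left out); the helper names `top1` and `train_val_gap` are made up for illustration and are not part of mmaction.

```python
import re

# Trimmed copies of the complete training log lines from the excerpt above.
TRAIN_LINES = [
    "Epoch [13][5860/5929]   lr: 2.714e-05, top1_acc: 0.7569, top5_acc: 0.9269, loss: 0.9281",
    "Epoch [13][5880/5929]   lr: 2.714e-05, top1_acc: 0.7462, top5_acc: 0.9313, loss: 0.9472",
    "Epoch [13][5900/5929]   lr: 2.714e-05, top1_acc: 0.7556, top5_acc: 0.9250, loss: 0.9117",
    "Epoch [13][5920/5929]   lr: 2.714e-05, top1_acc: 0.7506, top5_acc: 0.9256, loss: 0.9624",
]
VAL_LINE = "Epoch(val) [13][421]    top1_acc: 0.5948, top5_acc: 0.8161, mean_class_accuracy: 0.5943"

TOP1_PATTERN = re.compile(r"top1_acc: ([0-9.]+)")


def top1(line):
    """Return the top1_acc value in a log line, or None if the field is absent."""
    match = TOP1_PATTERN.search(line)
    return float(match.group(1)) if match else None


def train_val_gap(train_lines, val_line):
    """Mean training top-1 over the given lines minus the validation top-1."""
    train_values = [v for v in map(top1, train_lines) if v is not None]
    return sum(train_values) / len(train_values) - top1(val_line)


gap = train_val_gap(TRAIN_LINES, VAL_LINE)
print(f"train/val top-1 gap at epoch 13: {gap:.4f}")
```

On these four iterations the gap is roughly 16 points. The training figures are per-iteration values over a handful of batches, so this is only a rough indicator.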

I follow most of the settings of swin-base:

        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        patch_norm=True),

    cls_head=dict(
        type='I3DHead',
        in_channels=1024,
        num_classes=700,
        spatial_type='avg',
        dropout_ratio=0.5),

# optimizer
optimizer = dict(type='AdamW', lr=3e-4, betas=(0.9, 0.999), weight_decay=0.05,
                 paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                 'relative_position_bias_table': dict(decay_mult=0.),
                                                 'norm': dict(decay_mult=0.),
                                                 'backbone': dict(lr_mult=0.1)}))
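For anyone reading the `paramwise_cfg` above, the following is a simplified sketch of how such `custom_keys` might resolve to a per-parameter learning rate and weight decay. It assumes that keys are matched as substrings of the parameter name, longest key first, and that only the first matching key is applied; that is my understanding of the default optimizer constructor in mmcv, but it is not copied from the library and should be checked against the installed version. The function `resolve` and the parameter names in the example are hypothetical.

```python
BASE_LR = 3e-4   # lr from the optimizer config above
BASE_WD = 0.05   # weight_decay from the optimizer config above

CUSTOM_KEYS = {
    'absolute_pos_embed': dict(decay_mult=0.),
    'relative_position_bias_table': dict(decay_mult=0.),
    'norm': dict(decay_mult=0.),
    'backbone': dict(lr_mult=0.1),
}


def resolve(param_name, custom_keys=CUSTOM_KEYS, base_lr=BASE_LR, base_wd=BASE_WD):
    """Return (lr, weight_decay) for a parameter name.

    Assumption: longest key first, first substring match wins, and a
    multiplier that the matching key does not specify defaults to 1.0.
    """
    for key in sorted(sorted(custom_keys), key=len, reverse=True):
        if key in param_name:
            mults = custom_keys[key]
            lr = base_lr * mults.get('lr_mult', 1.0)
            wd = base_wd * mults.get('decay_mult', 1.0)
            return lr, wd
    return base_lr, base_wd


# Hypothetical parameter names, chosen only to exercise each rule.
for name in [
    'backbone.layers.0.blocks.0.norm1.weight',
    'backbone.layers.0.blocks.0.attn.relative_position_bias_table',
    'cls_head.fc_cls.weight',
]:
    lr, wd = resolve(name)
    print(f"{name}: lr={lr:.1e}, weight_decay={wd}")
```

Under these assumptions, a backbone norm weight would match `'backbone'` before `'norm'` and so keep the full weight decay, while the relative position bias table would keep the full learning rate. If the real constructor behaves this way, it may be worth printing the actual optimizer parameter groups to confirm what each parameter receives.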

Has anyone run into the same issue? Could anyone give some tips? Thank you.

SwinTransformer / Video-Swin-Transformer

Severe overfitting occurred. #51