facebookresearch / msn

Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141)

The loss has converged at an early stage? #22

Open sinsauzero opened 1 year ago

sinsauzero commented 1 year ago

I used the default ViT-S/16 config to train on ImageNet-1k end to end, but I found that the loss converged to 2.492 after one epoch. Is that normal? If so, how does performance keep improving when the loss doesn't seem to decrease any further over the next hundreds of epochs? If not, is there anything I did wrong? The config I used is as follows:

```yaml
criterion:
  ent_weight: 0.0
  final_sharpen: 0.25
  me_max: true
  memax_weight: 1.0
  num_proto: 1024
  start_sharpen: 0.25
  temperature: 0.1
  batch_size: 32
  use_ent: true
  use_sinkhorn: true
data:
  color_jitter_strength: 0.5
  pin_mem: true
  num_workers: 10
  image_folder: /gruntdata6/xinshulin/data/imagenet/new_train/1
  label_smoothing: 0.0
  patch_drop: 0.15
  rand_size: 224
  focal_size: 96
  rand_views: 1
  focal_views: 10
  root_path: /gruntdata6/xinshulin/data/imagenet/new_train
logging:
  folder: checkpoint/msn_os_logs4/
  write_tag: msn-experiment-1
meta:
  bottleneck: 1
  copy_data: false
  drop_path_rate: 0.0
  hidden_dim: 2048
  load_checkpoint: false
  model_name: deit_small
  output_dim: 256
  read_checkpoint: null
  use_bn: true
  use_fp16: false
  use_pred_head: false
optimization:
  clip_grad: 3.0
  epochs: 800
  final_lr: 1.0e-06
  final_weight_decay: 0.4
  lr: 0.001
  start_lr: 0.0002
  warmup: 15
  weight_decay: 0.04
```
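For context, here is a minimal sketch of the MSN-style objective implied by this config (a cross-entropy term on prototype assignments plus the mean-entropy-maximization regularizer controlled by `memax_weight`, as described in the paper). This is not the repository's actual code; variable names such as `anchor_logits` / `target_logits` are placeholders. Logging the two terms separately like this can show whether the apparent plateau sits in the cross-entropy term or in the regularizer.

```python
import torch
import torch.nn.functional as F

def msn_style_loss(anchor_logits, target_logits,
                   temperature=0.1, sharpen=0.25,
                   memax_weight=1.0, ent_weight=0.0):
    """Illustrative sketch of the MSN objective, not the repo implementation:
    cross-entropy between target and anchor prototype assignments, plus a
    me-max regularizer that encourages high entropy of the mean assignment."""
    # Soft prototype assignments for the masked/focal anchor views.
    anchor_probs = F.softmax(anchor_logits / temperature, dim=-1)

    # Target assignments from the unmasked view: sharpened, no gradient.
    with torch.no_grad():
        target_probs = F.softmax(target_logits / sharpen, dim=-1)

    # Cross-entropy term: pull anchor assignments toward target assignments.
    ce = -(target_probs * torch.log(anchor_probs + 1e-8)).sum(dim=-1).mean()

    # Me-max regularizer: negative entropy of the mean anchor assignment,
    # so minimizing it spreads usage across the 1024 prototypes.
    mean_probs = anchor_probs.mean(dim=0)
    memax = (mean_probs * torch.log(mean_probs + 1e-8)).sum()

    # Optional per-sample entropy term (ent_weight is 0.0 in this config).
    ent = (anchor_probs * torch.log(anchor_probs + 1e-8)).sum(dim=-1).mean()

    total = ce + memax_weight * memax + ent_weight * ent
    return total, ce.detach(), memax.detach()
```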