Closed Ale0311 closed 7 months ago
I don't know the reason yet, but can you compare your log with the one I have released (https://github.com/MzeroMiko/VMamba/releases/tag/%2320240223)?
Yeah, so there are a few minor differences: Left is yours and right is mine.
Do you think this could be the problem?
I'm not so sure that the right one is correct and the left one is wrong, though, as we've updated our code since then. Have you tested the performance of the released checkpoint?
Hello,
Thanks for your answer!
So I just did that now, and here are the results:

| Class | IoU | Acc |
| --- | --- | --- |
| wall | 78.18 | 88.97 |
| building | 82.47 | 91.88 |
| sky | 94.35 | 97.39 |
| floor | 82.25 | 90.88 |
| tree | 74.89 | 88.36 |
| ceiling | 85.0 | 93.3 |
| road | 84.22 | 91.78 |
| bed | 89.31 | 96.09 |
| windowpane | 62.48 | 78.01 |
| grass | 68.37 | 84.42 |
| cabinet | 60.76 | 74.21 |
| sidewalk | 64.38 | 78.05 |
| person | 81.64 | 92.79 |
| earth | 36.55 | 49.27 |
| door | 49.78 | 62.67 |
| table | 60.77 | 76.14 |
| mountain | 63.89 | 77.81 |
| plant | 52.45 | 62.94 |
| curtain | 75.61 | 85.89 |
| chair | 60.61 | 73.04 |
| car | 84.46 | 90.64 |
| water | 57.22 | 71.67 |
| painting | 74.24 | 88.28 |
| sofa | 69.67 | 85.0 |
| shelf | 41.6 | 59.66 |
| house | 32.11 | 47.86 |
| sea | 68.42 | 92.12 |
| mirror | 67.72 | 75.71 |
| rug | 66.08 | 76.03 |
| field | 27.01 | 41.64 |
| armchair | 47.34 | 64.52 |
| seat | 61.21 | 81.9 |
| fence | 45.84 | 60.93 |
| desk | 51.29 | 72.07 |
| rock | 45.34 | 68.97 |
| wardrobe | 43.15 | 62.45 |
| lamp | 63.79 | 75.53 |
| bathtub | 80.88 | 86.87 |
| railing | 35.08 | 50.69 |
| cushion | 59.28 | 74.9 |
| base | 32.23 | 44.27 |
| box | 26.34 | 31.51 |
| column | 46.59 | 54.36 |
| signboard | 39.19 | 52.45 |
| chest of drawers | 46.15 | 59.93 |
| counter | 25.4 | 41.64 |
| sand | 55.39 | 73.5 |
| sink | 74.6 | 81.83 |
| skyscraper | 45.52 | 59.61 |
| fireplace | 74.78 | 89.7 |
| refrigerator | 76.39 | 84.04 |
| grandstand | 41.19 | 80.62 |
| path | 16.57 | 26.32 |
| stairs | 27.76 | 33.09 |
| runway | 72.72 | 94.63 |
| case | 48.13 | 60.94 |
| pool table | 92.91 | 96.91 |
| pillow | 61.96 | 72.92 |
| screen door | 66.61 | 76.81 |
| stairway | 26.68 | 38.42 |
| river | 11.58 | 20.42 |
| bridge | 45.6 | 53.9 |
| bookcase | 38.41 | 63.58 |
| blind | 47.3 | 50.16 |
| coffee table | 55.04 | 83.73 |
| toilet | 85.19 | 90.99 |
| flower | 42.84 | 63.83 |
| book | 48.25 | 65.33 |
| hill | 14.43 | 21.64 |
| bench | 52.43 | 62.5 |
| countertop | 52.46 | 74.58 |
| stove | 76.12 | 82.07 |
| palm | 50.92 | 68.99 |
| kitchen island | 44.82 | 75.91 |
| computer | 66.65 | 76.73 |
| swivel chair | 40.55 | 59.38 |
| boat | 48.01 | 52.15 |
| bar | 28.13 | 34.14 |
| arcade machine | 68.81 | 74.48 |
| hovel | 19.41 | 26.33 |
| bus | 91.52 | 96.38 |
| towel | 68.16 | 78.22 |
| light | 56.98 | 66.03 |
| truck | 38.78 | 49.9 |
| tower | 29.48 | 46.79 |
| chandelier | 64.27 | 77.95 |
| awning | 32.15 | 38.27 |
| streetlight | 27.65 | 35.67 |
| booth | 52.98 | 57.78 |
| television receiver | 67.42 | 78.91 |
| airplane | 59.33 | 64.99 |
| dirt track | 29.69 | 53.44 |
| apparel | 37.69 | 49.58 |
| pole | 24.93 | 32.86 |
| land | 2.62 | 5.23 |
| bannister | 16.34 | 22.39 |
| escalator | 32.58 | 40.89 |
| ottoman | 52.72 | 66.55 |
| bottle | 36.58 | 63.77 |
| buffet | 35.09 | 40.28 |
| poster | 25.5 | 32.73 |
| stage | 17.78 | 26.75 |
| van | 45.16 | 62.09 |
| ship | 64.35 | 92.58 |
| fountain | 19.22 | 21.75 |
| conveyer belt | 86.75 | 91.41 |
| canopy | 47.17 | 56.92 |
| washer | 73.77 | 74.49 |
| plaything | 34.93 | 58.57 |
| swimming pool | 65.92 | 71.04 |
| stool | 43.31 | 63.89 |
| barrel | 58.63 | 74.1 |
| basket | 34.57 | 48.81 |
| waterfall | 59.6 | 79.69 |
| tent | 95.61 | 98.4 |
| bag | 15.19 | 17.68 |
| minibike | 71.97 | 86.34 |
| cradle | 76.13 | 96.12 |
| oven | 54.87 | 77.13 |
| ball | 36.98 | 66.04 |
| food | 47.73 | 58.25 |
| step | 13.93 | 16.32 |
| tank | 55.04 | 58.96 |
| trade name | 30.45 | 35.13 |
| microwave | 82.79 | 88.52 |
| pot | 47.39 | 56.73 |
| animal | 41.09 | 41.58 |
| bicycle | 56.37 | 79.31 |
| lake | 58.0 | 62.82 |
| dishwasher | 67.42 | 72.35 |
| screen | 55.17 | 73.12 |
| blanket | 8.93 | 10.96 |
| sculpture | 65.95 | 84.45 |
| hood | 65.65 | 74.49 |
| sconce | 46.67 | 56.45 |
| vase | 45.03 | 61.9 |
| traffic light | 33.43 | 59.94 |
| tray | 9.88 | 15.64 |
| ashcan | 37.75 | 50.26 |
| fan | 62.6 | 74.28 |
| pier | 47.56 | 62.0 |
| crt screen | 1.36 | 4.07 |
| plate | 55.3 | 71.56 |
| monitor | 7.48 | 10.96 |
| bulletin board | 46.42 | 54.25 |
| shower | 4.99 | 7.97 |
| radiator | 68.34 | 76.12 |
| glass | 15.07 | 16.49 |
| clock | 34.5 | 40.77 |
| flag | 51.23 | 55.7 |

03/06 22:31:44 - mmengine - INFO - Iter(test) [2000/2000] aAcc: 83.4700 mIoU: 50.7400 mAcc: 62.7300 data_time: 0.0009 time: 0.1040

But now I noticed this error in the console, which is not in the logs:
Failed loading checkpoint form ../../ckpts/classification/outs/vssm/vssmbasedp05/vssmbase_dp05_ckpt_epoch_260.pth: [Errno 2] No such file or directory: '../../ckpts/classification/outs/vssm/vssmbasedp05/vssmbase_dp05_ckpt_epoch_260.pth'
Does this mean that training is not using the pretrained classification backbone weights? Could this be the reason I get such low numbers?
And if this is the case, where can I download them from?
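The console message above says the file at the `pretrained` path could not be opened, so one way to catch this before a long training run is to scan the config for local checkpoint paths that do not exist. The sketch below is a minimal, standard-library illustration. `find_missing_checkpoints` is a hypothetical helper (not part of VMamba or mmsegmentation), and treating the keys `pretrained` and `checkpoint` as the ones that hold weights is an assumption based on the config posted in this thread.

```python
import os


def find_missing_checkpoints(cfg, base_dir="."):
    """Collect (key path, value) pairs for local checkpoint files that do not exist.

    Assumption: weights are referenced under keys named 'pretrained' or
    'checkpoint'. URLs are skipped because they are downloaded, not opened.
    """
    missing = []

    def visit(node, key_path):
        if isinstance(node, dict):
            for key, value in node.items():
                visit(value, key_path + [str(key)])
        elif isinstance(node, (list, tuple)):
            for index, value in enumerate(node):
                visit(value, key_path + [str(index)])
        elif isinstance(node, str) and key_path and key_path[-1] in ("pretrained", "checkpoint"):
            if node.startswith(("http://", "https://")):
                return
            # Relative paths are resolved against base_dir (the launch directory).
            if not os.path.isfile(os.path.join(base_dir, node)):
                missing.append((".".join(key_path), node))

    visit(cfg, [])
    return missing


# Excerpt of the backbone section from the config posted in this thread.
cfg = dict(
    model=dict(
        backbone=dict(
            pretrained="../../ckpts/classification/outs/vssm/vssmbasedp05/"
                       "vssmbase_dp05_ckpt_epoch_260.pth",
        ),
        pretrained=None,
    )
)

for key, path in find_missing_checkpoints(cfg):
    print(f"missing checkpoint at {key}: {path}")
```

Running a check like this from the same directory used to launch training matters, because the path in the config is relative.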
Yes. Check out this release (https://github.com/MzeroMiko/VMamba/releases/tag/%2320240218) and you will find the weights there.
By the way, the base model trained with droppath 0.6 performs better than the one trained with droppath 0.5.
Also, we are about to release new models with the latest code in the coming weeks; they are faster and reach higher performance.
Hello,
I tried to reproduce the results that you obtained on ADE20K for your base model. Unfortunately, my result was much lower. This is what I obtained after a full training run:

| Class | IoU | Acc |
| --- | --- | --- |
| wall | 66.35 | 79.36 |
| building | 75.3 | 89.03 |
| sky | 92.17 | 96.16 |
| floor | 67.39 | 80.91 |
| tree | 66.17 | 81.28 |
| ceiling | 75.51 | 87.39 |
| road | 75.14 | 83.8 |
| bed | 71.78 | 91.77 |
| windowpane | 47.48 | 67.64 |
| grass | 60.37 | 74.19 |
| cabinet | 45.03 | 61.32 |
| sidewalk | 50.39 | 70.52 |
| person | 56.72 | 81.07 |
| earth | 28.9 | 43.98 |
| door | 25.0 | 34.39 |
| table | 34.16 | 47.35 |
| mountain | 44.53 | 59.85 |
| plant | 42.24 | 58.11 |
| curtain | 51.35 | 66.29 |
| chair | 33.7 | 55.61 |
| car | 66.0 | 87.62 |
| water | 34.09 | 47.8 |
| painting | 49.62 | 73.7 |
| sofa | 41.11 | 70.42 |
| shelf | 25.59 | 44.74 |
| house | 33.73 | 67.84 |
| sea | 49.93 | 78.94 |
| mirror | 37.82 | 42.41 |
| rug | 34.03 | 44.97 |
| field | 23.6 | 53.13 |
| armchair | 14.56 | 26.03 |
| seat | 36.58 | 63.64 |
| fence | 16.5 | 24.76 |
| desk | 23.31 | 38.15 |
| rock | 22.14 | 46.74 |
| wardrobe | 38.45 | 59.86 |
| lamp | 34.59 | 45.78 |
| bathtub | 36.22 | 62.54 |
| railing | 15.7 | 19.51 |
| cushion | 18.88 | 23.38 |
| base | 5.64 | 9.17 |
| box | 1.39 | 1.52 |
| column | 20.08 | 24.44 |
| signboard | 14.85 | 18.0 |
| chest of drawers | 26.55 | 50.82 |
| counter | 17.66 | 23.19 |
| sand | 23.24 | 36.04 |
| sink | 30.56 | 52.17 |
| skyscraper | 52.38 | 78.83 |
| fireplace | 43.74 | 75.89 |
| refrigerator | 35.58 | 75.34 |
| grandstand | 29.48 | 49.91 |
| path | 11.34 | 15.89 |
| stairs | 19.69 | 24.25 |
| runway | 46.86 | 64.59 |
| case | 34.52 | 63.94 |
| pool table | 69.14 | 95.82 |
| pillow | 28.72 | 35.47 |
| screen door | 28.67 | 39.91 |
| stairway | 20.72 | 26.45 |
| river | 5.98 | 16.59 |
| bridge | 11.23 | 15.61 |
| bookcase | 21.8 | 44.74 |
| blind | 2.32 | 2.38 |
| coffee table | 30.51 | 68.45 |
| toilet | 55.51 | 80.55 |
| flower | 15.51 | 26.73 |
| book | 30.47 | 39.04 |
| hill | 5.75 | 7.07 |
| bench | 22.96 | 39.48 |
| countertop | 14.39 | 18.35 |
| stove | 36.09 | 57.81 |
| palm | 17.52 | 18.9 |
| kitchen island | 14.88 | 35.55 |
| computer | 26.18 | 49.09 |
| swivel chair | 15.98 | 26.23 |
| boat | 21.0 | 50.83 |
| bar | 12.02 | 12.62 |
| arcade machine | 8.28 | 18.43 |
| hovel | 4.48 | 5.38 |
| bus | 53.29 | 70.39 |
| towel | 12.17 | 16.79 |
| light | 34.44 | 39.51 |
| truck | 0.99 | 1.49 |
| tower | 4.98 | 5.72 |
| chandelier | 41.89 | 59.44 |
| awning | 6.53 | 6.97 |
| streetlight | 5.24 | 6.16 |
| booth | 0.0 | 0.0 |
| television receiver | 30.18 | 43.7 |
| airplane | 32.07 | 53.05 |
| dirt track | 0.58 | 0.67 |
| apparel | 14.07 | 25.11 |
| pole | 5.74 | 7.03 |
| land | 4.29 | 4.51 |
| bannister | 0.0 | 0.0 |
| escalator | 19.87 | 42.17 |
| ottoman | 0.92 | 0.93 |
| bottle | 0.0 | 0.0 |
| buffet | 0.0 | 0.0 |
| poster | 0.0 | 0.0 |
| stage | 0.86 | 0.87 |
| van | 2.11 | 2.22 |
| ship | 0.0 | 0.0 |
| fountain | 0.17 | 0.21 |
| conveyer belt | 3.16 | 3.63 |
| canopy | 0.0 | 0.0 |
| washer | 22.3 | 47.23 |
| plaything | 0.17 | 0.18 |
| swimming pool | 20.65 | 47.63 |
| stool | 0.0 | 0.0 |
| barrel | 0.0 | 0.0 |
| basket | 0.22 | 0.22 |
| waterfall | 41.74 | 59.31 |
| tent | 64.25 | 95.05 |
| bag | 0.0 | 0.0 |
| minibike | 22.6 | 30.34 |
| cradle | 28.02 | 85.0 |
| oven | 9.14 | 10.16 |
| ball | 13.25 | 24.11 |
| food | 21.4 | 29.97 |
| step | 0.0 | 0.0 |
| tank | 0.0 | 0.0 |
| trade name | 8.74 | 8.92 |
| microwave | 24.19 | 30.57 |
| pot | 10.22 | 11.53 |
| animal | 5.01 | 5.08 |
| bicycle | 13.19 | 14.38 |
| lake | 0.0 | 0.0 |
| dishwasher | 10.81 | 11.96 |
| screen | 62.43 | 81.01 |
| blanket | 0.0 | 0.0 |
| sculpture | 0.0 | 0.0 |
| hood | 13.21 | 14.82 |
| sconce | 4.29 | 4.4 |
| vase | 10.6 | 12.4 |
| traffic light | 1.78 | 1.79 |
| tray | 0.0 | 0.0 |
| ashcan | 6.57 | 7.72 |
| fan | 15.97 | 18.69 |
| pier | 0.1 | 0.11 |
| crt screen | 0.0 | 0.0 |
| plate | 0.73 | 0.73 |
| monitor | 0.43 | 0.5 |
| bulletin board | 0.0 | 0.0 |
| shower | 0.0 | 0.0 |
| radiator | 10.35 | 10.39 |
| glass | 0.0 | 0.0 |
| clock | 0.01 | 0.01 |
| flag | 0.0 | 0.0 |

2024/03/05 15:45:42 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 72.4200 mIoU: 22.7400 mAcc: 33.2100 data_time: 0.0009 time: 0.0706

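The overall mIoU in the log line is the mean of the per-class IoU column, so classes stuck at 0 pull it down quickly. The following sketch parses an mmseg-style per-class table pasted as text and reports the recomputed means together with the classes whose IoU is zero. `parse_class_table` and `summarize` are hypothetical helper names, and because the table values are rounded, the recomputed mean may differ slightly from the logged one.

```python
import re

# One table row: | class name | IoU | Acc |
# Header and border rows are skipped because their cells are not numeric.
ROW = re.compile(r"\|\s*([^|+\n]+?)\s*\|\s*(\d+(?:\.\d+)?)\s*\|\s*(\d+(?:\.\d+)?)\s*\|")


def parse_class_table(text):
    """Return {class_name: (iou, acc)} for every numeric row found in text."""
    return {name: (float(iou), float(acc)) for name, iou, acc in ROW.findall(text)}


def summarize(table):
    """Recompute mean IoU / Acc and list the classes with an IoU of exactly 0."""
    ious = [iou for iou, _ in table.values()]
    accs = [acc for _, acc in table.values()]
    return {
        "mIoU": sum(ious) / len(ious),
        "mAcc": sum(accs) / len(accs),
        "zero_iou": sorted(name for name, (iou, _) in table.items() if iou == 0.0),
    }


# Three rows copied from the table above, in the flattened form it was pasted in.
sample = "| Class | IoU | Acc | | wall | 66.35 | 79.36 | | booth | 0.0 | 0.0 | | bag | 0.0 | 0.0 |"
stats = summarize(parse_class_table(sample))
print(stats["zero_iou"], round(stats["mIoU"], 2))
```

Pasting both full tables through the same function makes it easy to diff the per-class numbers between the released checkpoint and the retrained model.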
This is my config file:
backbone_norm_cfg = dict(requires_grad=True, type='LN')
checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_base_patch4_window7_224_20220317-e9b98025.pth'
crop_size = (512, 512)
data_preprocessor = dict(
    bgr_to_rgb=True,
    mean=[123.675, 116.28, 103.53],
    pad_val=0,
    seg_pad_val=255,
    size=(512, 512),
    std=[58.395, 57.12, 57.375],
    type='SegDataPreProcessor')
data_root = '/home/alexandra/Documents/VMamba/data/ade/ADEChallengeData2016'
dataset_type = 'ADE20KDataset'
default_hooks = dict(
    checkpoint=dict(by_epoch=False, interval=16000, type='CheckpointHook'),
    logger=dict(interval=50, log_metric_by_epoch=False, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
    cudnn_benchmark=True,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
    auxiliary_head=dict(
        align_corners=False,
        channels=256,
        concat_input=False,
        dropout_ratio=0.1,
        in_channels=512,
        in_index=2,
        loss_decode=dict(loss_weight=0.4, type='CrossEntropyLoss', use_sigmoid=False),
        norm_cfg=dict(requires_grad=True, type='SyncBN'),
        num_classes=150,
        num_convs=1,
        type='FCNHead'),
    backbone=dict(
        act_cfg=dict(type='GELU'),
        attn_drop_rate=0.0,
        depths=(2, 2, 27, 2),
        dims=128,
        downsample_version='v1',
        drop_path_rate=0.3,
        drop_rate=0.0,
        embed_dims=128,
        init_cfg=dict(
            checkpoint='https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_base_patch4_window7_224_20220317-e9b98025.pth',
            type='Pretrained'),
        mlp_ratio=0.0,
        norm_cfg=dict(requires_grad=True, type='LN'),
        num_heads=[4, 8, 16, 32],
        out_indices=(0, 1, 2, 3),
        patch_norm=True,
        patch_size=4,
        patchembed_version='v1',
        pretrain_img_size=224,
        pretrained='../../ckpts/classification/outs/vssm/vssmbasedp05/vssmbase_dp05_ckpt_epoch_260.pth',
        qk_scale=None,
        qkv_bias=True,
        ssm_d_state=16,
        ssm_dt_rank='auto',
        ssm_ratio=2.0,
        strides=(4, 2, 2, 2),
        type='MM_VSSM',
        use_abs_pos_embed=False,
        window_size=7),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[123.675, 116.28, 103.53],
        pad_val=0,
        seg_pad_val=255,
        size=(512, 512),
        std=[58.395, 57.12, 57.375],
        type='SegDataPreProcessor'),
    decode_head=dict(
        align_corners=False,
        channels=512,
        dropout_ratio=0.1,
        in_channels=[128, 256, 512, 1024],
        in_index=[0, 1, 2, 3],
        loss_decode=dict(loss_weight=1.0, type='CrossEntropyLoss', use_sigmoid=False),
        norm_cfg=dict(requires_grad=True, type='SyncBN'),
        num_classes=150,
        pool_scales=(1, 2, 3, 6),
        type='UPerHead'),
    pretrained=None,
    test_cfg=dict(mode='whole'),
    train_cfg=dict(),
    type='EncoderDecoder')
norm_cfg = dict(requires_grad=True, type='SyncBN')
optim_wrapper = dict(
    optimizer=dict(betas=(0.9, 0.999), lr=6e-05, type='AdamW', weight_decay=0.01),
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0))),
    type='OptimWrapper')
optimizer = dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0005)
param_scheduler = [
    dict(begin=0, by_epoch=False, end=1500, start_factor=1e-06, type='LinearLR'),
    dict(begin=1500, by_epoch=False, end=160000, eta_min=0.0, power=1.0, type='PolyLR'),
]
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(img_path='images/validation', seg_map_path='annotations/validation'),
        data_root='/home/alexandra/Documents/VMamba/data/ade/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(2048, 512), type='Resize'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(iou_metrics=['mIoU'], type='IoUMetric')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(keep_ratio=True, scale=(2048, 512), type='Resize'),
    dict(reduce_zero_label=True, type='LoadAnnotations'),
    dict(type='PackSegInputs'),
]
train_cfg = dict(max_iters=160000, type='IterBasedTrainLoop', val_interval=16000)
train_dataloader = dict(
    batch_size=2,
    dataset=dict(
        data_prefix=dict(img_path='images/training', seg_map_path='annotations/training'),
        data_root='/home/alexandra/Documents/VMamba/data/ade/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(keep_ratio=True, ratio_range=(0.5, 2.0), scale=(2048, 512), type='RandomResize'),
            dict(cat_max_ratio=0.75, crop_size=(512, 512), type='RandomCrop'),
            dict(prob=0.5, type='RandomFlip'),
            dict(type='PhotoMetricDistortion'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(reduce_zero_label=True, type='LoadAnnotations'),
    dict(keep_ratio=True, ratio_range=(0.5, 2.0), scale=(2048, 512), type='RandomResize'),
    dict(cat_max_ratio=0.75, crop_size=(512, 512), type='RandomCrop'),
    dict(prob=0.5, type='RandomFlip'),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(
        transforms=[
            [
                dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
            ],
            [
                dict(direction='horizontal', prob=0.0, type='RandomFlip'),
                dict(direction='horizontal', prob=1.0, type='RandomFlip'),
            ],
            [dict(type='LoadAnnotations')],
            [dict(type='PackSegInputs')],
        ],
        type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(img_path='images/validation', seg_map_path='annotations/validation'),
        data_root='/home/alexandra/Documents/VMamba/data/ade/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(2048, 512), type='Resize'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(iou_metrics=['mIoU'], type='IoUMetric')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    name='visualizer',
    type='SegLocalVisualizer',
    vis_backends=[dict(type='LocalVisBackend')])
work_dir = './work_dirs/upernet_vssm_4xb4-160k_ade20k-512x512_base'
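One thing that can be checked directly from the config above is the effective batch size. The `work_dir` name contains `4xb4`, which under the OpenMMLab config naming convention means 4 GPUs with 4 samples each, while the posted config has `train_dataloader.batch_size=2` and `launcher='none'`. The sketch below compares the two. It assumes `launcher='none'` corresponds to a single training process, and `batch_from_run_name` is a hypothetical helper. Whether this gap contributes to the lower mIoU is not established in this thread.

```python
import re


def batch_from_run_name(name):
    """Extract (num_gpus, samples_per_gpu, total) from an 'NxbM' token, or None."""
    match = re.search(r"(\d+)xb(\d+)", name)
    if match is None:
        return None
    gpus, per_gpu = int(match.group(1)), int(match.group(2))
    return gpus, per_gpu, gpus * per_gpu


# Values copied from the config posted in this thread.
work_dir = "./work_dirs/upernet_vssm_4xb4-160k_ade20k-512x512_base"
train_batch_size = 2   # train_dataloader.batch_size
num_processes = 1      # assumption: launcher='none' means one process / one GPU

named = batch_from_run_name(work_dir)
actual = train_batch_size * num_processes

print(f"batch size implied by the run name: {named[2]}")
print(f"batch size implied by the config:   {actual}")
```

If the two numbers differ, the learning-rate schedule in the config was most likely tuned for the larger batch, so the run is not a like-for-like reproduction.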
Do you have any idea where this huge difference might come from?
Thanks, Alexandra