YudeWang / SEAM

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation, CVPR 2020 (Oral)
MIT License
539 stars 97 forks

segmentation training problems #20

Closed Eli-YiLi closed 3 years ago

Eli-YiLi commented 3 years ago
  1. It seems that you use the train set to train the segmentation model. Why not use trainaug?
  2. Following the setting in #11, my results are 61.5 when training with trainaug and 56.7 with train. Why does it differ so much from the results in the paper? (Note that the weights are from ilsvrc-cls_rna-a1_cls1000_ep-0001.params; the test resolution is (1024, 512) with scales [0.5, 0.75, 1.0, 1.25, 1.5, 1.75].)
  3. Why does the score drop after applying CRF in the RW step?
YudeWang commented 3 years ago
  1. https://github.com/YudeWang/SEAM/blob/2a06992d6515424c62f8a6cc0ca0e2e42aab5822/train_SEAM.py#L41
  2. The setting in #11 is for the retrain step, which is not included in this repository. There are three steps: SEAM, RW, and retrain. I also wonder why the test resolution is 1024x512; the images in PASCAL VOC are much smaller than that.
  3. CRF does not always improve performance, especially when the prediction is not good enough.
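For reference, the multi-scale + flip test-time augmentation discussed above can be sketched as follows. This is a minimal NumPy illustration, not the repository's inference code; `predict` is a hypothetical callable returning per-class score maps of shape (C, H, W), and nearest-neighbour index sampling stands in for bilinear resizing:

```python
import numpy as np

def multiscale_flip_inference(img, predict, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average class scores over rescaled and flipped copies of `img` (H, W, 3)."""
    h, w = img.shape[:2]
    acc = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        # nearest-neighbour resize via index sampling (stand-in for bilinear)
        ys = (np.arange(sh) * h / sh).astype(int)
        xs = (np.arange(sw) * w / sw).astype(int)
        scaled = img[ys][:, xs]
        for flip in (False, True):
            inp = scaled[:, ::-1] if flip else scaled
            score = predict(inp)                      # (C, sh, sw)
            if flip:
                score = score[:, :, ::-1]             # undo the horizontal flip
            # resize the score map back to the original resolution
            ys_b = (np.arange(h) * sh / h).astype(int)
            xs_b = (np.arange(w) * sw / w).astype(int)
            score = score[:, ys_b][:, :, xs_b]
            acc = score if acc is None else acc + score
    return acc.argmax(axis=0)                         # (H, W) label map
```

Averaging scores before the argmax is what makes the large test scales matter: each scale votes on every pixel at the original resolution.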
Eli-YiLi commented 3 years ago

Thanks for your reply.

  1. In the retrain phase, which dataset do you use, train or trainaug?
  2. I tried some stronger segmentation algorithms like PSPNet and DeepLabv3 on mmsegmentation, with ResNet-101 and WideResNet-38 backbones, but the highest mIoU is 62.5 (PSPNet, ResNet-101). Their img_scale is (2048, 512).
Eli-YiLi commented 3 years ago

Specifically, the config of the data part is as follows:

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

The mIoU on VOC12 is 79.95 (fully supervised), so it is supposed to be better than DeepLabv1.

Could you please give me some suggestions? Thanks.

YudeWang commented 3 years ago

@Eli-YiLi

  1. I use the trainaug set in the retrain step.
  2. A randomly cropped 448x448 or 513x513 patch is enough for VOC images. Because the pseudo labels are not good enough, advanced models like PSPNet/DeepLabv3/v3+ will overfit these low-quality pseudo labels, leading to performance degeneration. I use DeepLabv1 for retraining, and the setting is given in #11.
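The fixed-size random crop mentioned above can be sketched as follows. This is an illustrative NumPy version, not the repository's data loader; since many VOC images are smaller than 448x448, the label map is padded with 255 (the ignore index) so padded pixels do not contribute to the segmentation loss:

```python
import numpy as np

IGNORE = 255  # pixels with this label are excluded from the seg loss

def random_crop(img, label, size=448, rng=np.random):
    """Crop a (size, size) patch from img (H, W, 3) and label (H, W), padding as needed."""
    h, w = label.shape
    pad_h, pad_w = max(0, size - h), max(0, size - w)
    if pad_h or pad_w:
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))   # zero-pad the image
        label = np.pad(label, ((0, pad_h), (0, pad_w)),
                       constant_values=IGNORE)                # ignore padded labels
    y = rng.randint(0, label.shape[0] - size + 1)
    x = rng.randint(0, label.shape[1] - size + 1)
    return img[y:y + size, x:x + size], label[y:y + size, x:x + size]
```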
Eli-YiLi commented 3 years ago

Thanks a lot.