boostcampaitech3 / level2-semantic-segmentation-level2-cv-09

level2-semantic-segmentation-level2-cv-09 created by GitHub Classroom

[Model] MMSegmentation setup and model testing #46

Open yoonghee opened 2 years ago

yoonghee commented 2 years ago

What

Why?

Todo

km9mn commented 2 years ago

How about we post the models we're currently running (or planning to run) with mmseg as comments here? I'm running a modified deeplabv3plus_r50-d8_512x512_40k_voc12aug.py.
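
For anyone reproducing this, a rough sketch of what such a config modification can look like in mmseg 0.x; the class count and data paths below are illustrative assumptions, not the exact setup:

_base_ = './configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug.py'

# Hypothetical overrides: replace the VOC heads/data with the trash dataset.
model = dict(
    decode_head=dict(num_classes=11),      # assumed class count
    auxiliary_head=dict(num_classes=11),
)
data = dict(
    train=dict(
        type='CustomDataset',
        data_root='/opt/ml/input/data',    # hypothetical path
        img_dir='images/training',
        ann_dir='annotations/training',
    ))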

yoonghee commented 2 years ago

Nice, it's great being able to check everything on one page! I'm running the swin_large_patch4_window12_384_22k pretrained model.

jeongjae96 commented 2 years ago

After endless errors I've given up on BEiT and am training a SwinB-UperNet model instead..

jeongjae96 commented 2 years ago

How about we also fix a seed value when training our models? It looks like it can be passed as an argument.

jeongjae96 commented 2 years ago

@updaun #54 BEiT

km9mn commented 2 years ago

deeplabv3plus r101

yoonghee commented 2 years ago

> How about we also fix a seed value when training our models? It looks like it can be passed as an argument.

We never formally agreed on one, but I saw Jeongjae's run and I'm training with seed=42 haha

jeongjae96 commented 2 years ago

> > How about we also fix a seed value when training our models? It looks like it can be passed as an argument.
>
> We never formally agreed on one, but I saw Jeongjae's run and I'm training with seed=42 haha

Oh, we're going with 42? lol
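
For the record, mmseg 0.x does accept a seed; a minimal sketch, assuming the stock tools/train.py entry point (which exposes --seed and --deterministic flags):

# Programmatic equivalent of `python tools/train.py <config> --seed 42 --deterministic`
from mmseg.apis import set_random_seed

set_random_seed(42, deterministic=True)  # seeds random, numpy and torch; pins cuDNN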

yoonghee commented 2 years ago

Top: SwinL fold2 with multi-scale training and inference; bottom: SwinL fold2 training only.

image

jeongjae96 commented 2 years ago

The public test set seems to match fold1 well! Training SwinL for more epochs looks worth trying.

SwinL multi-scale (train ratio_range=(0.5, 2.0)), fold1, seed 42, epoch 80

image

Epoch 79, the best-val-mIoU checkpoint, scored 0.7864.
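
For context, the ratio_range mentioned above plugs into mmseg's Resize transform; a sketch of the relevant train_pipeline fragment, with the base img_scale and crop settings as assumptions:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    # Rescale the (512, 512) base scale by a random factor in [0.5, 2.0].
    dict(type='Resize', img_scale=(512, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
]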

yoonghee commented 2 years ago

Training results for the [model] mmsegmentation BEiT upernet setup (https://github.com/boostcampaitech3/level2-semantic-segmentation-level2-cv-09/issues/46). Top: BEiT latest (epoch 80), fold2; bottom: BEiT best-mIoU (epoch 39), fold2.

image

jeongjae96 commented 2 years ago

For SwinL, training for 100 epochs and taking the best-mIoU checkpoint seems like the way to go.

jeongjae96 commented 2 years ago

The project is winding down now, so I think we should share what we're experimenting with in detail.

Model: SwinL UperNet
Data: fold1_revised_ver1

dataset.py
train_pipeline: ratio_range=(0.5, 2.0)

epoch.py
max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5

That's my current setup.

Results: image

km9mn commented 2 years ago

Model: SwinL UperNet
Data: fold1_revised_ver3 + pseudo

dataset.py
train_pipeline: ratio_range=(0.5, 2.0)

epoch.py
max_epochs=100, checkpoint_config: interval=1, max_keep_ckpts=3

Experimenting with switching AdamW -> Adam.
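
A sketch of what the AdamW -> Adam swap can look like in the config; lr and weight_decay here are illustrative, not the values actually used:

# _delete_=True discards the inherited AdamW block before applying Adam.
optimizer = dict(_delete_=True, type='Adam', lr=6e-5, weight_decay=0.01)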

tjrudrnr2 commented 2 years ago

Model: SwinL UperNet
Data: fold1

dataset.py
train_pipeline: multi_scale over (256, 768) in steps of 128

epoch.py
max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5

tjrudrnr2 commented 2 years ago

SwinL train+pseudo fold1: 0.7046 => the Battery class val score was low; is that the cause? => fold1 scored far higher than the other folds in the original train kfold, so do the other folds simply score lower?
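
Since several of these runs mix in pseudo labels, a rough sketch of generating them with mmseg's 0.x inference API; the config/checkpoint names and paths are hypothetical:

import mmcv
from mmseg.apis import init_segmentor, inference_segmentor

# A trained model labels the unlabeled test images (hypothetical paths).
model = init_segmentor('swinl_upernet.py', 'best_mIoU.pth', device='cuda:0')
for name in mmcv.scandir('/opt/ml/input/data/test', suffix='.jpg'):
    seg_map = inference_segmentor(model, f'/opt/ml/input/data/test/{name}')[0]
    # Save the predicted class indices as a pseudo ground-truth mask.
    mmcv.imwrite(seg_map.astype('uint8'), f'pseudo_masks/{name[:-4]}.png')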

yoonghee commented 2 years ago

BEiT, train_revisedV1 + pseudo + multi_scale. From the bottom: results at epochs 22 / 46 / 71.

image

yoonghee commented 2 years ago

I'll train BEiT on the revisedV3_fold4 data Gyeongguk uploaded to the drive. Single image scale (512, 512), img_ratio (0.5, 2.0).

jeongjae96 commented 2 years ago

Model: SwinB UperNet
Data: fold4 + pseudo

config.py
lr_config: by_epoch=False

dataset.py
train_pipeline: ratio_range=(0.5, 2.0)
test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

epoch.py
max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5
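
Those img_ratios drive mmseg's multi-scale test-time augmentation; a sketch of the matching test_pipeline, with img_scale and the flip setting as assumptions:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        # Each ratio rescales img_scale; predictions are averaged over all passes.
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]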

tjrudrnr2 commented 2 years ago

For ensemble diversity, I'm training hrnetV2_ocr_w64 with pseudo labeling for a kfold ensemble, and still figuring out how to apply multi-scale on top of the baseline...

0.6268 => 0.7965
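
For reference, one simple way a kfold ensemble can combine predictions is pixel-wise hard voting over the per-fold masks; the class count (11) and mask shapes are assumptions:

import numpy as np

def hard_vote(masks, num_classes=11):
    # masks: list of per-fold class-index maps, each of shape (H, W).
    stacked = np.stack(masks)  # (n_folds, H, W)
    # Count votes per class at every pixel, then take the majority class.
    votes = np.apply_along_axis(np.bincount, 0, stacked, minlength=num_classes)
    return votes.argmax(axis=0)  # (H, W)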

jeongjae96 commented 2 years ago

Model: SwinB UperNet
Data: fold2 + pseudo

config.py
lr_config: by_epoch=False

dataset.py
train_pipeline: ratio_range=(0.5, 2.0)
test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

epoch.py
max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5

yoonghee commented 2 years ago

> I'll train BEiT on the revisedV3_fold4 data Gyeongguk uploaded to the drive. Single image scale (512, 512), img_ratio (0.5, 2.0).

I ran it for a lot of epochs, and the score actually drops as the epoch count grows. From the top: epochs 142/135/65/57/37.

image

km9mn commented 2 years ago

Model: SwinL UperNet
Data: fold1 (revisedV3) + pseudo

config.py

dataset.py
train_pipeline: ratio_range=(0.5, 2.0)

epoch.py
max_epochs=100, checkpoint_config: interval=1, max_keep_ckpts=3

Experimenting with DiceCrossEntropyLoss -> DiceFocalLoss.
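
A sketch of how the DiceFocalLoss swap could be expressed; recent mmseg 0.x versions ship DiceLoss and FocalLoss and let loss_decode take a list, so composing them like this is an assumption about the custom loss, not a quoted config:

model = dict(
    decode_head=dict(
        loss_decode=[
            dict(type='DiceLoss', loss_weight=1.0),
            dict(type='FocalLoss', loss_weight=1.0),
        ]))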

yoonghee commented 2 years ago

> I'll train BEiT on the revisedV3_fold4 data Gyeongguk uploaded to the drive. Single image scale (512, 512), img_ratio (0.5, 2.0).

I'll also run it on fold3.

jeongjae96 commented 2 years ago

Model: SwinL UperNet
Data: new_train_all_anno_excluded_revised1 + pseudo(0.8189)

dataset.py
train_pipeline:

size_min = 256, size_max = 896
multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)

test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

epoch.py
max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5

--no-validate

Results:

yoonghee commented 2 years ago

> > I'll train BEiT on the revisedV3_fold4 data Gyeongguk uploaded to the drive. Single image scale (512, 512), img_ratio (0.5, 2.0).
>
> I'll also run it on fold3.

From the top: epochs 40/33/29/7. 140 epochs seemed like too many, so I cut training down to 40 epochs. BEiT seems to peak somewhere between epoch 15 and 30.

image
yoonghee commented 2 years ago

Model: BEiT UperNet
Data: revisedV3 + pseudo (fold1)

dataset.py
train_pipeline:

size_min = 512, size_max = 1024
multi_scale = [(x, x) for x in range(size_min, size_max + 1, 32)]
dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)

epoch.py
max_epochs=200, checkpoint_config: interval=1, max_keep_ckpts=3

Multi-scale fold1 experiment.

jeongjae96 commented 2 years ago

> Model: SwinL UperNet
> Data: new_train_all_anno_excluded_revised1 + pseudo(0.8189)
>
> dataset.py
> train_pipeline:
>
> size_min = 256, size_max = 896
> multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
> dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)
>
> test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
>
> epoch.py
> max_epochs=100, checkpoint_config: interval=5, max_keep_ckpts=5
>
> --no-validate

Results:

  • 0.8196 (epoch 30)
  • 0.8207 (epoch 44)
Traceback (most recent call last):
  File "/opt/ml/input/code/model/mmseg/Swin_UperNet/tools/train.py", line 180, in <module>
    main()
  File "/opt/ml/input/code/model/mmseg/Swin_UperNet/tools/train.py", line 169, in main
    train_segmentor(
  File "/opt/ml/input/test/mmsegmentation/mmseg/apis/train.py", line 191, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    self.call_hook('before_train_iter')
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/hooks/lr_updater.py", line 140, in before_train_iter
    self.regular_lr = self.get_regular_lr(runner)
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/hooks/lr_updater.py", line 83, in get_regular_lr
    return [self.get_lr(runner, _base_lr) for _base_lr in self.base_lr]
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/hooks/lr_updater.py", line 83, in <listcomp>
    return [self.get_lr(runner, _base_lr) for _base_lr in self.base_lr]
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/hooks/lr_updater.py", line 387, in get_lr
    idx = get_position_from_periods(progress, self.cumulative_periods)
  File "/opt/conda/envs/mmseg/lib/python3.10/site-packages/mmcv/runner/hooks/lr_updater.py", line 415, in get_position_from_periods
    raise ValueError(f'Current iteration {iteration} exceeds '
ValueError: Current iteration 80000 exceeds cumulative_periods [30000, 80000]

Hit an error...
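
The traceback points at mmcv's CosineRestart LR hook: its periods must cover every training iteration, and here they sum to 80,000 while the run reached iteration 80,000, which falls outside the last period. A sketch of one fix, with all numbers illustrative rather than taken from the actual config:

lr_config = dict(
    _delete_=True,
    policy='CosineRestart',
    # periods must sum past the final iteration (max_epochs * iters per epoch).
    periods=[30000, 50001],
    restart_weights=[1.0, 0.5],
    min_lr=1e-6,
    by_epoch=False,
)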

jeongjae96 commented 2 years ago

Model: SwinL UperNet
Data: new_train_all_anno_excluded_revised1 + pseudo(0.8189)

dataset.py
train_pipeline:

size_min = 256, size_max = 896
multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)

test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

epoch.py
max_epochs=60, checkpoint_config: interval=2, max_keep_ckpts=8

--no-validate

Resumed from epoch 26 using --load-from and --resume-from.
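
For reference, the config-side equivalents of those flags in mmseg 0.x; the checkpoint path is hypothetical:

# --resume-from restores weights, optimizer state and the epoch counter;
# --load-from loads weights only and training restarts from epoch 0.
resume_from = 'work_dirs/swinl_upernet/epoch_26.pth'  # hypothetical path
load_from = None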

jeongjae96 commented 2 years ago

Model: SwinB UperNet
Data: new_train_all_anno_excluded_revised1 + pseudo(0.8189)

dataset.py
train_pipeline:

size_min = 256, size_max = 896
multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)

test_pipeline: img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

config.py

lr_config = dict(
    _delete_=True,
    policy="poly",
    warmup="linear",
    warmup_iters=1500,
    warmup_ratio=1e-6,
    power=1.0,
    min_lr=0.0,
    by_epoch=False,
)

epoch.py

runner = dict(type="EpochBasedRunner", max_epochs=60)
checkpoint_config = dict(max_keep_ckpts=6, by_epoch=True, interval=3)

--no-validate

yoonghee commented 2 years ago

> Model: BEiT UperNet
> Data: revisedV3 + pseudo (fold1)
>
> dataset.py
> train_pipeline:
>
> size_min = 512, size_max = 1024
> multi_scale = [(x, x) for x in range(size_min, size_max + 1, 32)]
> dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)
>
> epoch.py
> max_epochs=200, checkpoint_config: interval=1, max_keep_ckpts=3
>
> Multi-scale fold1 experiment.

From the top: results at epochs 109/84/50/33/17.

image
yoonghee commented 2 years ago

Model: BEiT UperNet
Data: new_train_all_anno_excluded_revised1 + pseudo(0.8200)

dataset.py
train_pipeline:

size_min = 256, size_max = 896
multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)

epoch.py

runner = dict(type="EpochBasedRunner", max_epochs=105)
checkpoint_config = dict(max_keep_ckpts=7, by_epoch=True, interval=10)

--no-validate

yoonghee commented 2 years ago

> Model: BEiT UperNet
> Data: new_train_all_anno_excluded_revised1 + pseudo(0.8200)
>
> dataset.py
> train_pipeline:
>
> size_min = 256, size_max = 896
> multi_scale = [(x, x) for x in range(size_min, size_max + 1, 128)]
> dict(type='Resize', img_scale=multi_scale, multiscale_mode='value', keep_ratio=True)
>
> epoch.py
>
> runner = dict(type="EpochBasedRunner", max_epochs=105)
> checkpoint_config = dict(max_keep_ckpts=7, by_epoch=True, interval=10)
>
> --no-validate

From the top: results at epochs 80/60/40/20.

image