SwinTransformer / Swin-Transformer-Semantic-Segmentation

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.
https://arxiv.org/abs/2103.14030
Apache License 2.0

What is the difference between mIoU and mIoU (ms+flip)? #21

Closed by BIGBALLON 3 years ago

BIGBALLON commented 3 years ago

Hi, everyone:

| Backbone | Method | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | UPerNet | 512x512 | 160K | 44.51 | 45.81 | 60M | 945G | config | github/baidu | github/baidu |
| Swin-S | UPerNet | 512x512 | 160K | 47.64 | 49.47 | 81M | 1038G | config | github/baidu | github/baidu |
| Swin-B | UPerNet | 512x512 | 160K | 48.13 | 49.72 | 121M | 1188G | config | github/baidu | github/baidu |

I want to know the difference between mIoU and mIoU (ms+flip), and the exact evaluation setting used for mIoU (ms+flip).

Thanks a lot!

BIGBALLON commented 3 years ago

I figured it out:

https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation/blob/bcfec6bb0cd82d2ac9c5a7b13f2c7d22160ac9f2/configs/_base_/datasets/ade20k.py#L19-L26

Turn on `flip` and set `img_ratios` for multi-scale testing.
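For reference, the relevant part of the test pipeline looks roughly like this. This is a sketch based on mmsegmentation's ADE20K base config; the key names (`MultiScaleFlipAug`, `img_ratios`, `flip`) follow mmseg's test-time augmentation API, and the exact ratio list may differ from what the linked lines enable:

```python
# Sketch of an ms+flip test pipeline (mmsegmentation-style config).
# Without img_ratios and with flip=False, this reduces to the plain
# single-scale mIoU setting.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # multi-scale: evaluate at several ratios of img_scale
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        # flip: additionally evaluate the horizontally mirrored image
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375],
                 to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```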

Never mind ...... just run:

```shell
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --aug-test --eval mIoU
```
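In plain terms, ms+flip evaluation runs the model on the image at several scales and on its horizontal mirror, flips and resizes the predictions back, averages the class logits, and only then takes the per-pixel argmax. A dependency-free sketch of that aggregation (the `model` callable and the nearest-neighbour "resize" are stand-ins for illustration, not the mmseg implementation):

```python
import numpy as np

def ms_flip_inference(model, img, ratios=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average class logits over multiple scales and horizontal flips.

    `model` is any callable mapping an (H, W, C) image to (h, w, num_classes)
    logits. Resizing is faked with nearest-neighbour indexing so the sketch
    stays dependency-free.
    """
    h, w, _ = img.shape
    acc = None
    for r in ratios:
        # nearest-neighbour "resize" of the input to the scaled shape
        ys = (np.arange(int(h * r)) / r).astype(int).clip(0, h - 1)
        xs = (np.arange(int(w * r)) / r).astype(int).clip(0, w - 1)
        scaled = img[ys][:, xs]
        for flip in (False, True):
            inp = scaled[:, ::-1] if flip else scaled
            logit = model(inp)
            if flip:
                logit = logit[:, ::-1]  # mirror the prediction back
            # nearest-neighbour resize of the logits to the original size
            ys2 = (np.arange(h) * r).astype(int).clip(0, logit.shape[0] - 1)
            xs2 = (np.arange(w) * r).astype(int).clip(0, logit.shape[1] - 1)
            logit = logit[ys2][:, xs2]
            acc = logit if acc is None else acc + logit
    return acc.argmax(-1)  # final per-pixel class labels
```

Single-scale mIoU is the special case `ratios=(1.0,)` with no flip pass; the extra averaged views are what buy the ~1-2 point gain in the table above.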