BIT-DA / SePiCo

[TPAMI 2023 ESI Highly Cited Paper] SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation https://arxiv.org/abs/2204.08808

Why 640x640 for training and 1280x640 for testing? #1

Closed Haochen-Wang409 closed 2 years ago

Haochen-Wang409 commented 2 years ago

Hi, thanks for your great work on using contrastive learning to bridge the domain shift between the source and target domains!

However, I am a little bit confused about the input image scales used for training and testing. In DAFormer, the training input scale is 512x512 and the testing scale is 1024x512, but in your setting, the scales are 640x640 and 1280x640, respectively. Did you also train the model with 512x512 and evaluate with 1024x512? As ablated in SegFormer, training with 640x640 is about 0.5 mIoU better than 512x512.

By the way, which one is the common practice in UDA? 640x640 or 512x512?

Looking forward to your reply.

BinhuiXie commented 2 years ago

Hi @Haochen-Wang409

Thanks for your attention to our work.

Actually, the most common practice in UDA is 1280x640 for training and 2048x1024 (i.e., the original size of the Cityscapes val set) for testing. However, due to GPU memory limitations, we follow DAFormer and adopt a smaller input size for both training and testing. In practice, training with 640x640 improves mIoU by about 0.2 points over 512x512 in the context of domain adaptation.
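For reference, the resolutions discussed above could be expressed in an mmsegmentation-style data pipeline (the convention that DAFormer and SePiCo build on). This is a minimal sketch with illustrative normalization values and key names; the exact config shipped in this repo may differ:

```python
# Sketch of a data pipeline using 640x640 training crops and 1280x640 test images.
# Key names follow the mmsegmentation 0.x / DAFormer convention; values are illustrative.
crop_size = (640, 640)  # random crop size used during training
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1280, 640)),        # resize the full image first
    dict(type='RandomCrop', crop_size=crop_size),      # take 640x640 training crops
    dict(type='RandomFlip', prob=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1280, 640),                         # evaluate at 1280x640
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```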

Haochen-Wang409 commented 2 years ago

Thanks for your timely reply!

So, the common practice for UDA with DeepLabv2 is 1280x640, while it is 512x512 for SegFormer, am I right?

BinhuiXie commented 2 years ago

If the GPU allows, you might get better results with a larger size.

Haochen-Wang409 commented 2 years ago

Haha, that's true, but it would obviously be unfair to use larger inputs just to get better results.

Anyway, thanks for your reply. I appreciate it a lot!