NVlabs / GroupViT

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
https://arxiv.org/abs/2202.11094

Unable to reproduce the results on PASCAL VOC. #28

Closed: yash0307 closed this issue 2 years ago

yash0307 commented 2 years ago

Dear Authors,

I tried training the GroupViT model on the GCC + YFCC datasets using the group_vit_gcc_yfcc_30e.yml config file with a batch size of 2048 (256x8). After 30 epochs of training, the result on PASCAL VOC is roughly 5% mIoU (absolute) lower than the number reported in the paper. I also tried training with 2 steps of gradient accumulation, which gave no improvement. Do you have any suggestions as to what could cause the lower performance?
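For reference, 2 steps of gradient accumulation on top of a 2048-sample batch give an effective batch of 2 × 2048 = 4096 samples per optimizer update. The minimal sketch below uses pure Python and a hypothetical toy model (y = w * x with mean squared error; all names are illustrative, not from the GroupViT code) to check that scaling each micro-batch gradient by 1/accum_steps and summing reproduces the full-batch gradient. That equivalence holds for losses that average independently over samples. A contrastive loss with in-batch negatives, as is typical for image-text training, is not reproduced this way, because each micro-batch only compares samples against its own negatives.

```python
# Hypothetical sketch: gradient accumulation with a per-sample (mean) loss.
# Toy model y_hat = w * x, loss = mean((w * x - y) ** 2). Not GroupViT code.


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Split the batch into accum_steps equal micro-batches, scale each
    micro-batch gradient by 1 / accum_steps and sum them, as a training
    loop does before a single optimizer step."""
    if len(xs) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    micro = len(xs) // accum_steps
    total = 0.0
    for k in range(accum_steps):
        part = slice(k * micro, (k + 1) * micro)
        total += mse_grad(w, xs[part], ys[part]) / accum_steps
    return total


if __name__ == "__main__":
    xs = [float(i) for i in range(8)]
    ys = [2.0 * x + 1.0 for x in xs]
    w = 0.5
    print(mse_grad(w, xs, ys))                # full-batch gradient
    print(accumulated_grad(w, xs, ys, 2))     # same value via 2 micro-batches
```

The equal-size split matters: with unequal micro-batches, weighting each one by 1/accum_steps would no longer match the full-batch mean.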

Thank you.

xvjiarui commented 2 years ago

Hi @yash0307

Thanks for your interest in our work.

Multi-label training is sometimes not stable enough. You can set multi_label=0 in the config, or pass --opts data.text_aug.multi_label=0 when you launch training.
This should yield mIoU about 1% lower than what we reported.
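A command-line override such as data.text_aug.multi_label=0 addresses a nested config entry by a dotted path. The sketch below is a hypothetical pure-Python illustration of how such an override can be applied to a nested dict; it is not the GroupViT implementation, and the helper names and the starting value of multi_label are placeholders, not taken from group_vit_gcc_yfcc_30e.yml.

```python
# Hypothetical sketch: apply a "dotted.key=value" override, in the style of
# --opts data.text_aug.multi_label=0, to a nested config dict.
# Not the GroupViT implementation; names are illustrative.
import ast


def parse_value(text):
    """Interpret '0', '1e-3', 'True', '[1, 2]' as Python literals; otherwise keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_override(cfg, override):
    """Set cfg[a][b][c] from 'a.b.c=value', rejecting unknown sections and keys."""
    key, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        if not isinstance(node.get(name), dict):
            raise KeyError(f"unknown config section {name!r} in {key!r}")
        node = node[name]
    if leaf not in node:
        raise KeyError(f"unknown config key {key!r}")
    node[leaf] = parse_value(raw)
    return cfg


if __name__ == "__main__":
    # Placeholder starting value; the real one lives in the repo's YAML config.
    cfg = {"data": {"text_aug": {"multi_label": 1}}}
    apply_override(cfg, "data.text_aug.multi_label=0")
    print(cfg["data"]["text_aug"]["multi_label"])  # prints 0
```

Rejecting unknown keys, rather than silently creating them, is a deliberate choice here: a typo such as multi_lable would otherwise leave the real setting unchanged.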

yash0307 commented 2 years ago

I will try that. Thank you for the quick response.