Traffic-X / ViT-CoMer

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Apache License 2.0

What is the variant version of BEiTv2 used in ViT-Adapter? #1

Closed: HITerStudy closed this issue 7 months ago

HITerStudy commented 8 months ago

As described in the paper (Table 4, comparisons with previous SOTA on COCO val2017), the pre-trained checkpoint from the variant version of BEiTv2 used in ViT-Adapter can improve performance. Could you please share some details? Thanks!

Jeremy-lf commented 8 months ago


The original intention of ViT-CoMer is to directly leverage open-source ViT pre-trained weights rather than re-running large-scale pre-training. Accordingly, for those results we combined BEiTv2 with other advanced pre-training and also used test-time augmentation (TTA) during testing; both improved the model's performance.
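
For readers trying to reproduce this setup, below is a minimal MMDetection-2.x-style config sketch showing the two ingredients mentioned above: initializing the backbone from an open-source pre-trained ViT/BEiTv2 checkpoint, and enabling multi-scale flip TTA at test time. The checkpoint path, field names, and image scales are illustrative placeholders and not the exact released configuration.

```python
# Sketch of an MMDetection-2.x-style config fragment (assumption: the COCO configs
# in this repo follow standard MMDetection conventions; paths and scales below are
# placeholders, not the released settings).

# 1) Reuse an open-source pre-trained ViT/BEiTv2 checkpoint for the backbone
#    instead of re-running large-scale pre-training.
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            checkpoint='pretrained/beitv2_large_patch16_224.pth',  # placeholder path
        ),
    ),
)

# 2) Test-time augmentation: multi-scale + horizontal-flip inference via
#    MMDetection's MultiScaleFlipAug wrapper.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(1333, 640), (1333, 800), (1333, 960)],  # illustrative scales
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ],
    ),
]
data = dict(test=dict(pipeline=test_pipeline))
```

The exact scales, flip setting, and pre-trained checkpoint used for the Table 4 numbers would need to be confirmed against the configs released in this repository.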