Traffic-X / ViT-CoMer

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Apache License 2.0

What is the variant version of BEiTv2 used in ViT-Adapter? #1

Closed: HITerStudy closed this issue 7 months ago

HITerStudy commented 8 months ago

As described in the paper (Table 4, comparisons with previous SOTA on COCO val2017), the pre-trained checkpoint from the variant version of BEiTv2 used in ViT-Adapter can improve performance. Could you please share some details? Thanks!

Jeremy-lf commented 8 months ago


The original intention of ViT-CoMer is to directly leverage open-source ViT pre-trained weights rather than re-running large-scale pre-training. Accordingly, for those results we combined BEiTv2 with other advanced pre-training and also used test-time augmentation (TTA) during testing; both improved the model's performance.
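
For readers trying to reproduce this setup, below is a minimal MMDetection-2.x-style config sketch showing the two ingredients mentioned above: initializing the backbone from an open-source pre-trained ViT/BEiTv2 checkpoint, and enabling multi-scale flip TTA at test time. The checkpoint path, field names, and image scales are illustrative placeholders and not the exact released configuration.

```python
# Sketch of an MMDetection-2.x-style config fragment (assumption: the COCO configs
# in this repo follow standard MMDetection conventions; paths and scales below are
# placeholders, not the released settings).

# 1) Reuse an open-source pre-trained ViT/BEiTv2 checkpoint for the backbone
#    instead of re-running large-scale pre-training.
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            checkpoint='pretrained/beitv2_large_patch16_224.pth',  # placeholder path
        ),
    ),
)

# 2) Test-time augmentation: multi-scale + horizontal-flip inference via
#    MMDetection's MultiScaleFlipAug wrapper.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(1333, 640), (1333, 800), (1333, 960)],  # illustrative scales
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ],
    ),
]
data = dict(test=dict(pipeline=test_pipeline))
```

The exact scales, flip setting, and pre-trained checkpoint used for the Table 4 numbers would need to be confirmed against the configs released in this repository.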