OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
https://arxiv.org/abs/2403.02308
Apache License 2.0

The effectiveness at high resolution? #16

Closed: lewandofskee closed this issue 3 months ago

lewandofskee commented 6 months ago

It seems that the paper only compares efficiency with ViT at high resolution. How does VRWKV compare with ViT in effectiveness at high resolution?

duanduanduanyuchen commented 6 months ago

Hi! Thanks for the suggestion. Since results for ViTs pretrained at high resolution (e.g., 1024 or higher) have not been published, we only compare ViT and VRWKV under high-resolution evaluation (these results are in the appendix, which has not been published yet). We find that VRWKV models are more robust than ViTs to changes in input scale.
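For context on why the high-resolution efficiency comparison matters, here is a rough sketch (not taken from the thread or the Vision-RWKV code) of how the number of patch tokens, and the cost of mixing them, grows with input size. It assumes a hypothetical patch size of 16 and models only the token-mixing term: quadratic in the token count for global self-attention, linear for a linear-complexity mixer such as the Bi-WKV that Vision-RWKV describes. Real FLOP counts include other layers and constants, so the ratios are illustrative only.

```python
# Illustrative scaling sketch, not the paper's FLOP accounting.
# Assumption: square images split into non-overlapping 16x16 patches (hypothetical).
# Only the token-mixing term is modeled: global attention ~ N^2, linear mixer ~ N.

PATCH = 16  # hypothetical patch size


def num_tokens(resolution: int, patch: int = PATCH) -> int:
    """Number of patch tokens for a square image of side `resolution` pixels."""
    side = resolution // patch
    return side * side


def relative_cost(resolution: int, base: int = 224) -> tuple[float, float]:
    """Token-mixing cost at `resolution` relative to `base`: (quadratic, linear)."""
    ratio = num_tokens(resolution) / num_tokens(base)
    return ratio ** 2, ratio


if __name__ == "__main__":
    for res in (224, 512, 1024, 2048):
        quad, lin = relative_cost(res)
        print(f"{res:>5}px  tokens={num_tokens(res):>6}  "
              f"quadratic x{quad:,.1f}  linear x{lin:,.1f}")
```

Under these assumptions, 1024 px gives 4096 tokens, about 21x the 196 tokens at 224 px, so a quadratic mixing term grows roughly 437x while a linear one grows about 21x.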