lewandofskee closed 3 months ago
Hi! Thanks for your advice. Since no results for ViTs pretrained at high resolution (1024 or higher) have been published, we only compare ViT and VRWKV on high-resolution evaluation (the results are in the appendix, which has not been published yet). We find that VRWKV is more robust than ViT to changes in input scale.
It seems that the article only compares efficiency with ViT at high resolution. How does the effectiveness at high resolution compare with ViT?