Status: Open. yeezhu opened this issue 5 months ago.
Hello, thanks for asking! RADIO ViT-H/16 already outperforms bigger foundation models on most metrics we tested it on. Is there one downstream use case in particular you feel would benefit from a more expensive architecture?
I replaced EVA-CLIP-ViT-G with RADIO (ViT-H) in my VLM, and performance dropped on general VQA tasks. So I was wondering whether there is a ViT-G version of RADIO.
Would you be able to provide a few more details on your setup? We haven't observed a reduction in metrics versus that model.
Are there any plans to train or release larger-scale models, such as those based on the ViT-G architecture?