tcourat opened this issue 2 months ago
I think it depends on your use case. v2.5-L is competitive with v2(-H) at the 432px resolution for the LLaVA 1.5 metrics. Same goes for semantic segmentation. v2 is definitely still better at summary tasks (e.g. classification), and that holds up until the mode switch.
> What caused this? Is this only due to having a smaller architecture (L<H)?
Yes, that's exactly what is going on. ViT-L has about half the number of parameters of ViT-H, and ViT-B is something like 1/7th.
> In this case, do you plan to release an H version too?
Yes, we have an H model that's being trained.
> or do you advise to keep using v2.1 for "small" images?
For small images, I would recommend trying both if you can spare the compute.
Hello, sorry this took so long. We just released `radio_v2.5-h`, which is an improvement over `v2.5-l` and a big improvement over `2.1-h`. Just make sure you run `torch.hub.load` with the `force_reload=True` flag the first time you run the new model.
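For reference, a minimal load call might look like the sketch below. The `NVlabs/RADIO` repo path and `radio_model` entrypoint name follow the convention in the RADIO README; treat them as assumptions and double-check against the repo before running.

```python
import torch


def load_radio(version: str = "radio_v2.5-h"):
    """Load a RADIO model from torch.hub (entrypoint name assumed
    from the RADIO README; verify against NVlabs/RADIO)."""
    return torch.hub.load(
        "NVlabs/RADIO",   # GitHub repo that exposes hubconf.py
        "radio_model",    # hub entrypoint (assumption, see lead-in)
        version=version,
        progress=True,
        # Re-clone the hub repo so newly published model versions
        # (like radio_v2.5-h) are visible on the first run.
        force_reload=True,
    )


# Usage (downloads weights, so not run here):
# model = load_radio("radio_v2.5-h").eval()
```

After the first successful load with `force_reload=True`, the flag can be dropped so subsequent calls reuse the cached checkout.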
Hi mranzinger,
I really like your work :)
Could you share how many GPUs were used, and for approximately how long, to train the biggest model?
Hi
It seems that RADIOv2.1 is still much better than v2.5 for images with a resolution smaller than 700px, according to your technical report. What caused this? Is this only due to v2.5 having a smaller architecture (L<H)? In that case, do you plan to release an H version too, or do you advise to keep using v2.1 for "small" images?
Thanks