tcourat opened this issue 2 months ago
I think it depends on your use case. v2.5-L is competitive with v2(-H) at the 432px resolution for the LLaVA 1.5 metrics. Same goes for semantic segmentation. v2 is definitely still better at summary tasks (e.g. classification), and that holds up until the mode switch.
> What caused this? Is this only due to having a smaller architecture (L<H)?
Yes, that's exactly what is going on. ViT-L has about half the number of parameters of ViT-H, and ViT-B is something like 1/7th.
> In this case, do you plan to release an H version too?
Yes, we have an H model that's being trained.
> or do you advise to keep using v2.1 for "small" images?
For small images, I would recommend trying both if you can spare the compute.
Hello, sorry this took so long. We just released `radio_v2.5-h`, which is an improvement over `v2.5-l` and a big improvement over `2.1-h`. Just make sure you run `torch.hub.load` with the `force_reload=True` flag the first time you run the new model.
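For reference, a minimal load call might look like the sketch below. The `NVlabs/RADIO` repo path and `radio_model` entrypoint name follow the convention in the RADIO README; treat them as assumptions and double-check against the repo before running.

```python
import torch


def load_radio(version: str = "radio_v2.5-h"):
    """Load a RADIO model from torch.hub (entrypoint name assumed
    from the RADIO README; verify against NVlabs/RADIO)."""
    return torch.hub.load(
        "NVlabs/RADIO",   # GitHub repo that exposes hubconf.py
        "radio_model",    # hub entrypoint (assumption, see lead-in)
        version=version,
        progress=True,
        # Re-clone the hub repo so newly published model versions
        # (like radio_v2.5-h) are visible on the first run.
        force_reload=True,
    )


# Usage (downloads weights, so not run here):
# model = load_radio("radio_v2.5-h").eval()
```

After the first successful load with `force_reload=True`, the flag can be dropped so subsequent calls reuse the cached checkout.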
Hi mranzinger,
I really like your work :)
Could you share how many GPUs were used, and for approximately how long, to train the biggest model?
Hi
It seems that RADIOv2.1 is still much better than v2.5 for images with a resolution smaller than 700px, according to your technical report. What caused this? Is this only due to v2.5 having a smaller architecture (L<H)? In that case, do you plan to release an H version too, or do you advise to keep using v2.1 for "small" images?
Thanks