baaivision / EVA

EVA Series: Visual Representation Fantasies from BAAI

Size 14x14 to 16x16 patch interpolation for smaller EVA 2 models #153

Open tobiasvanderwerff opened 5 months ago

tobiasvanderwerff commented 5 months ago

Hi,

First of all, thank you for the great work you've published. I am trying to train EVA 2 on a custom object detection dataset and noticed that the *_p14to16 pre-trained models are only available for EVA-B and EVA-L (in this table), but not for the other model sizes. I would like to use the smaller EVA-S and/or EVA-Ti models instead. As far as I understand, the conversion from p14 to p16 involves an interpolation of the pos_embed parameters, as mentioned here, which means it could also be applied as a post-processing step on the checkpoint files of the smaller models.

I have tried to do the interpolation myself using the interpolate_patch_14to16.py script. However, this does not seem to work for the EVA 2 checkpoints, because the script fails when accessing a key in the checkpoint:

Traceback (most recent call last):
  File "/home/tobias/EVA/EVA-01/eva/interpolate_patch_14to16.py", line 53, in <module>
    patch_embed = checkpoint["model"]['patch_embed.proj.weight']
KeyError: 'model'
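
For context, printing the top-level keys makes the mismatch obvious (hypothetical checkpoint path):

```python
import torch

checkpoint = torch.load("eva02_Ti_pt_in21k_p14.pt", map_location="cpu")
print(list(checkpoint.keys()))  # no "model" entry in the EVA 2 checkpoints I tried
```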

I am not quite sure whether applying this script is the right approach or whether another approach is needed. Could you provide any feedback on this? Thanks in advance!

tobiasvanderwerff commented 5 months ago

I think I found a decent solution. The interpolate_patch_14to16.py script can be modified in the following way:
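
The gist of the change: take the state dict from wherever the checkpoint actually stores it instead of hard-coding checkpoint["model"], then interpolate the patch embedding kernel to 16x16 and resize pos_embed to the new patch grid. Roughly along these lines (a sketch, not the exact code; the file names, the "module" fallback and the float16 cast-back are assumptions based on the checkpoints I tried):

```python
import torch

checkpoint = torch.load("eva02_Ti_pt_in21k_p14.pt", map_location="cpu")  # hypothetical filename

# The EVA 2 checkpoints do not nest the weights under a "model" key (hence the
# KeyError above), so take the state dict from wherever it actually lives.
if "model" in checkpoint:
    state_dict = checkpoint["model"]
elif "module" in checkpoint:
    state_dict = checkpoint["module"]
else:
    state_dict = checkpoint

# 1) Patch embedding kernel: 14x14 -> 16x16, as in the original script.
patch_embed = state_dict["patch_embed.proj.weight"]   # [C_out, 3, 14, 14]
patch_embed = torch.nn.functional.interpolate(
    patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False)
state_dict["patch_embed.proj.weight"] = patch_embed

# 2) Position embeddings: with the image size unchanged, going from p14 to p16
#    shrinks the patch grid (e.g. 16x16 -> 14x14 at 224 px), so pos_embed has to
#    be resized as well.
pos_embed = state_dict["pos_embed"]                    # [1, 1 + N, C], float16
cls_pe, grid_pe = pos_embed[:, :1], pos_embed[:, 1:]   # assumes a leading cls token
C = grid_pe.shape[-1]
old_grid = int(grid_pe.shape[1] ** 0.5)
new_grid = (old_grid * 14) // 16
grid_pe = grid_pe.reshape(1, old_grid, old_grid, C).permute(0, 3, 1, 2)
grid_pe = torch.nn.functional.interpolate(
    grid_pe.float(), size=(new_grid, new_grid), mode='bicubic', align_corners=False)
grid_pe = grid_pe.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, C)
# The released weights are float16, so cast back after interpolating.
state_dict["pos_embed"] = torch.cat([cls_pe.float(), grid_pe], dim=1).half()

torch.save(checkpoint, "eva02_Ti_pt_in21k_p14to16.pt")
```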

matteot11 commented 3 months ago

Hi @tobiasvanderwerff, I think the same holds for patch_embed:

patch_embed = torch.nn.functional.interpolate(patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False)

The .float() is already there, which makes the interpolation work correctly, but the .half() to convert back to float16 is missing. Btw, thanks for the hint!
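
In other words, presumably the line just needs a cast back after the interpolation, something like:

```python
patch_embed = torch.nn.functional.interpolate(
    patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False
).half()  # cast back to float16 so it matches the rest of the checkpoint
```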