baaivision / EVA

EVA Series: Visual Representation Fantasies from BAAI

Size 14x14 to 16x16 patch interpolation for smaller EVA 2 models #153

Open tobiasvanderwerff opened 5 months ago

tobiasvanderwerff commented 5 months ago

Hi,

First of all, thank you for the great work you've published. I am trying to train EVA 2 on a custom object detection dataset and noticed that the *_p14to16 pre-trained models are only available for EVA-B and EVA-L (in this table), but not for the other model sizes. I would like to use the smaller EVA-S and/or EVA-Ti models instead. As far as I understand, the conversion from p14 to p16 involves an interpolation of the pos_embed parameters, as mentioned here, which means it could also be applied as a post-processing step on the checkpoint files of the smaller models.

I have tried to do the interpolation myself using the interpolate_patch_14to16.py script. However, this does not seem to work for the EVA 2 checkpoints, because the script fails when accessing a key in the checkpoint:

Traceback (most recent call last):
  File "/home/tobias/EVA/EVA-01/eva/interpolate_patch_14to16.py", line 53, in <module>
    patch_embed = checkpoint["model"]['patch_embed.proj.weight']
KeyError: 'model'
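
For context, printing the top-level keys makes the mismatch obvious (hypothetical checkpoint path):

```python
import torch

checkpoint = torch.load("eva02_Ti_pt_in21k_p14.pt", map_location="cpu")
print(list(checkpoint.keys()))  # no "model" entry in the EVA 2 checkpoints I tried
```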

I am not quite sure whether applying this script is the right approach or whether another approach is needed. Could you provide any feedback on this? Thanks in advance!

tobiasvanderwerff commented 5 months ago

I think I found a decent solution. The interpolate_patch_14to16.py script can be modified in the following way:
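
The gist of the change: take the state dict from wherever the checkpoint actually stores it instead of hard-coding checkpoint["model"], then interpolate the patch embedding kernel to 16x16 and resize pos_embed to the new patch grid. Roughly along these lines (a sketch, not the exact code; the file names, the "module" fallback and the float16 cast-back are assumptions based on the checkpoints I tried):

```python
import torch

checkpoint = torch.load("eva02_Ti_pt_in21k_p14.pt", map_location="cpu")  # hypothetical filename

# The EVA 2 checkpoints do not nest the weights under a "model" key (hence the
# KeyError above), so take the state dict from wherever it actually lives.
if "model" in checkpoint:
    state_dict = checkpoint["model"]
elif "module" in checkpoint:
    state_dict = checkpoint["module"]
else:
    state_dict = checkpoint

# 1) Patch embedding kernel: 14x14 -> 16x16, as in the original script.
patch_embed = state_dict["patch_embed.proj.weight"]   # [C_out, 3, 14, 14]
patch_embed = torch.nn.functional.interpolate(
    patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False)
state_dict["patch_embed.proj.weight"] = patch_embed

# 2) Position embeddings: with the image size unchanged, going from p14 to p16
#    shrinks the patch grid (e.g. 16x16 -> 14x14 at 224 px), so pos_embed has to
#    be resized as well.
pos_embed = state_dict["pos_embed"]                    # [1, 1 + N, C], float16
cls_pe, grid_pe = pos_embed[:, :1], pos_embed[:, 1:]   # assumes a leading cls token
C = grid_pe.shape[-1]
old_grid = int(grid_pe.shape[1] ** 0.5)
new_grid = (old_grid * 14) // 16
grid_pe = grid_pe.reshape(1, old_grid, old_grid, C).permute(0, 3, 1, 2)
grid_pe = torch.nn.functional.interpolate(
    grid_pe.float(), size=(new_grid, new_grid), mode='bicubic', align_corners=False)
grid_pe = grid_pe.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, C)
# The released weights are float16, so cast back after interpolating.
state_dict["pos_embed"] = torch.cat([cls_pe.float(), grid_pe], dim=1).half()

torch.save(checkpoint, "eva02_Ti_pt_in21k_p14to16.pt")
```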

matteot11 commented 3 months ago

Hi @tobiasvanderwerff, I think the same holds for patch_embed:

patch_embed = torch.nn.functional.interpolate(patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False)

The .float() is already there, which makes the interpolation work correctly, but the .half() to convert back to float16 is missing. Btw, thanks for the hint!
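
In other words, presumably the line just needs a cast back after the interpolation, something like:

```python
patch_embed = torch.nn.functional.interpolate(
    patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False
).half()  # cast back to float16 so it matches the rest of the checkpoint
```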