google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

question about FlexiViT #30

Closed KimWu1994 closed 9 months ago

KimWu1994 commented 1 year ago

FlexiViT is a very imaginative work. I have also been bothered by the problem of flexible patch sizes. I want to know how PI-resize from Section 3.4 is implemented in the code, and how PI-resize is optimized during training.

lucasb-eyer commented 1 year ago

Hi, thanks for your interest!

The implementation of PI-resize during training is here: https://github.com/google-research/big_vision/blob/main/big_vision/models/proj/flexi/vit.py#L30-L75

In words: PI-resize does not introduce any new trainable parameters. You define a learnable parameter for the patch embedding just like in regular ViT: pick any patch size (it doesn't really matter which; we use 32x32), so allocate a 32x32x3x[model-dim] buffer. Then, before passing it to the conv operation for patch embedding, multiply it by the PI-resize matrix. That matrix can be computed analytically once at the start and is not trained; see the code pointer above.
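For concreteness, here is a minimal sketch of that idea (not the exact big_vision code; the helper names `compute_pi_resize_matrix` and `resample_patch_embed` are made up for illustration). Following Section 3.4 of the paper, it recovers the linear resize map B by resizing each basis vector of the old patch grid, then maps the learned kernel through pinv(Bᵀ):

```python
import jax
import jax.numpy as jnp
import numpy as np


def compute_pi_resize_matrix(old_hw, new_hw, method="bilinear"):
  """Returns pinv(B^T), where B is the linear map of `jax.image.resize`.

  Hypothetical helper for illustration; the real implementation lives in
  big_vision/models/proj/flexi/vit.py.
  """
  # Recover B column by column: resize each basis vector of the old grid.
  cols = []
  for i in range(np.prod(old_hw)):
    basis = np.zeros(old_hw)
    basis[np.unravel_index(i, old_hw)] = 1.0
    resized = jax.image.resize(jnp.asarray(basis), new_hw, method)
    cols.append(np.asarray(resized).reshape(-1))
  b_mat = np.stack(cols, axis=1)  # [new_h*new_w, old_h*old_w]
  # PI-resize: w_new = pinv(B^T) @ w_old. Computed once, never trained.
  return jnp.asarray(np.linalg.pinv(b_mat.T))  # [new, old]


def resample_patch_embed(kernel, new_hw, method="bilinear"):
  """Applies PI-resize to an [h, w, in_ch, out_ch] patch-embedding kernel."""
  h, w, _, _ = kernel.shape
  pinv_mat = compute_pi_resize_matrix((h, w), new_hw, method)

  def resample_one(k):  # k: one [h, w] slice per (in_ch, out_ch) pair
    return (pinv_mat @ k.reshape(-1)).reshape(new_hw)

  # Map over the two channel axes.
  v_resample = jax.vmap(jax.vmap(resample_one, in_axes=2, out_axes=2),
                        in_axes=3, out_axes=3)
  return v_resample(kernel)


# Usage sketch: the 32x32 kernel is the trainable parameter; gradients flow
# through the fixed PI-resize matmul back into it.
params = jnp.zeros((32, 32, 3, 768))  # learnable patch-embedding buffer
kernel_16 = resample_patch_embed(params, (16, 16))
# ... feed kernel_16 to the patch-embedding conv with stride 16 ...
```

In practice you would precompute the pinv matrix once per candidate patch size at startup (the `np.linalg.pinv` call should not sit inside a jitted training step), and at each step sample a patch size and pick the matching matrix.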

I'm not sure what loss you mean: there is no need to change whatever loss you are using when "flexifying" your training loop.