Hi, I noticed that the "adaptive kv stride" in the configuration for the pretrained ImageNet weights is "4, 4". According to the paper, however, the video version of MViTv2 uses an "adaptive kv stride" of "1,8,8", so the ImageNet weights cannot be used directly to initialize video training. Would you mind sharing the weights used to initialize MViT for video training?