Hello, thanks again for sharing. I have downloaded the kinetic_400_vitl_epoch_1600 pretrained weights and am trying to run the visualization on videos. The vis.sh I run is the same as the official one except for the paths. However, timm raises the error "RuntimeError: Unknown model (pretrain_videomae_base_patch16_224)" while loading the model.
I tried changing pretrain_videomae_base_patch16_224 to vit_base_patch16_224, but then another error was raised: "TypeError: __init__() got an unexpected keyword argument 'decoder_depth'". I then commented out the "decoder_depth" keyword at line 80 of run_videomae_viz.py, but a different error occurred: "AttributeError: 'VisionTransformer' object has no attribute 'encoder'".
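To show what I think is happening with the first two errors, here is a toy sketch in plain Python. It is not timm's actual code: `_REGISTRY`, `register_model`, `create_model`, and both factory functions below are stand-ins I wrote for illustration. My assumption is that a name-based factory only knows a model once the module that registers it has been imported, and that a plain ViT factory does not accept the decoder arguments.

```python
# Toy stand-in for a name-based model registry (NOT timm's real code).
_REGISTRY = {}


def register_model(fn):
    """Record a factory function under its own name."""
    _REGISTRY[fn.__name__] = fn
    return fn


def create_model(name, **kwargs):
    """Look the name up and call the factory with the given kwargs."""
    if name not in _REGISTRY:
        raise RuntimeError(f"Unknown model ({name})")
    return _REGISTRY[name](**kwargs)


@register_model
def vit_base_patch16_224(img_size=224):
    # Hypothetical plain ViT factory: it has no decoder arguments.
    return {"name": "vit_base_patch16_224", "img_size": img_size}


# Error 1: the pretraining model has not been registered yet.
try:
    create_model("pretrain_videomae_base_patch16_224", decoder_depth=4)
except RuntimeError as err:
    first_error = str(err)

# Error 2: swapping in the plain ViT name rejects the decoder keyword.
try:
    create_model("vit_base_patch16_224", decoder_depth=4)
    second_error_raised = False
except TypeError:
    second_error_raised = True


# Assumption: in the real repo this registration would happen when the
# module defining the pretraining model is imported.
@register_model
def pretrain_videomae_base_patch16_224(decoder_depth=4):
    return {"name": "pretrain_videomae_base_patch16_224",
            "decoder_depth": decoder_depth}


model = create_model("pretrain_videomae_base_patch16_224", decoder_depth=4)
```

If this picture is right, renaming the model is the wrong fix, and the real question is why the module that defines pretrain_videomae_base_patch16_224 is not imported in my run.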
I would really appreciate any help with this. Thanks.