OpenGVLab / unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
https://arxiv.org/abs/2303.16058
MIT License

ONNX model export for the vision encoder and text encoder to get embeddings #34

Open 1093842024 opened 7 months ago

1093842024 commented 7 months ago

Could you add ONNX export scripts for the vision encoder and text encoder, so that they can be used to get embeddings? Thanks.

1093842024 commented 7 months ago

What's more, for a model pretrained with frame_num=4, can I change frame_num to 6 or another value when loading the model for inference?

Andy1621 commented 6 months ago

For ONNX, it's not created by me. You can ask the authors for help.

For more frames, you need to interpolate the temporal position embedding.
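
For readers wondering what interpolating the temporal position embedding could look like, here is a minimal sketch. It is not the repository's own code: the function name `interpolate_temporal_pos_embed` is hypothetical, and it assumes the embedding is available as a NumPy array of shape (T_old, C), for example after converting the checkpoint tensor and dropping any leading batch dimension. It resamples linearly along the time axis with the first and last frames kept fixed; other interpolation modes are equally plausible.

```python
import numpy as np


def interpolate_temporal_pos_embed(pos_embed, new_num_frames):
    """Linearly resample a (T_old, C) temporal position embedding to (T_new, C).

    Hypothetical helper: the (T, C) layout is an assumption, so check the
    actual shape of the embedding in your checkpoint before using it.
    """
    pos_embed = np.asarray(pos_embed, dtype=np.float64)
    old_num_frames, dim = pos_embed.shape
    if new_num_frames == old_num_frames:
        return pos_embed.copy()

    # Put the old and new frame positions on a shared [0, 1] time axis,
    # so the first and last frames map onto each other exactly.
    old_t = np.linspace(0.0, 1.0, old_num_frames)
    new_t = np.linspace(0.0, 1.0, new_num_frames)

    # Interpolate every embedding channel independently along time.
    channels = [np.interp(new_t, old_t, pos_embed[:, c]) for c in range(dim)]
    return np.stack(channels, axis=1)


# Example: go from the pretrained frame_num=4 to frame_num=6.
pretrained = np.random.randn(4, 768)  # stand-in for the real embedding
resized = interpolate_temporal_pos_embed(pretrained, 6)
print(resized.shape)
```

The resized array would then replace the original temporal position embedding in the state dict before loading it into a model built with frame_num=6. Which key holds that embedding depends on the checkpoint and is not shown here.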