1093842024 opened this issue 7 months ago
What's more, for a model pretrained with frame_num=4, can I change frame_num to 6 (or another number) when loading the model and running inference?
The ONNX model was not created by me. You can ask its authors for help.
To use more frames, you need to interpolate the temporal position embedding to the new number of frames.
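A minimal sketch of that interpolation, assuming the learned temporal position embedding is stored as an array of shape `(1, T, D)` (one vector per frame). The function name `interpolate_temporal_pos_embed` is hypothetical and not part of the model's code; the actual parameter name and shape in the checkpoint may differ. It resamples each embedding channel linearly along the frame axis, with the first and last frames aligned.

```python
import numpy as np


def interpolate_temporal_pos_embed(pos_embed: np.ndarray, new_num_frames: int) -> np.ndarray:
    """Linearly resample a temporal position embedding along the frame axis.

    Assumes (hypothetically) that pos_embed has shape (1, T_old, D).
    Returns an array of shape (1, new_num_frames, D).
    """
    if pos_embed.ndim != 3 or pos_embed.shape[0] != 1:
        raise ValueError("expected pos_embed with shape (1, T, D)")
    if new_num_frames < 1:
        raise ValueError("new_num_frames must be >= 1")
    _, old_t, dim = pos_embed.shape
    if new_num_frames == old_t:
        return pos_embed.copy()

    # Map old and new frame indices onto the same [0, 1] time axis,
    # so the first and last embeddings are kept exactly.
    old_pos = np.linspace(0.0, 1.0, old_t)
    new_pos = np.linspace(0.0, 1.0, new_num_frames)

    out = np.empty((1, new_num_frames, dim), dtype=pos_embed.dtype)
    for d in range(dim):
        # Piecewise-linear interpolation of channel d over time.
        out[0, :, d] = np.interp(new_pos, old_pos, pos_embed[0, :, d])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pe4 = rng.standard_normal((1, 4, 8)).astype(np.float32)  # e.g. frame_num=4
    pe6 = interpolate_temporal_pos_embed(pe4, 6)              # e.g. frame_num=6
    print(pe6.shape)  # (1, 6, 8)
```

In PyTorch the same resampling can be done with `torch.nn.functional.interpolate(pe.permute(0, 2, 1), size=6, mode="linear", align_corners=True).permute(0, 2, 1)`; the interpolated tensor would then replace the original embedding before loading the state dict into a model built with the new frame count.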
Could you update the ONNX export scripts for the vision encoder and text encoder so that they output embeddings? Thanks.