Closed by r-cui 3 years ago
Note that for 1, I've never tried using 8 frames instead of 32, so I'm not sure how the model will behave. I only tried going down to 16 frames, which works quite OK.
Thank you for your reply. For 1, what I'm asking is this: if I cannot afford the setting "FPS = 10, 32 frames (3.2 sec)" and have to decrease either the FPS or the number of frames, which one should I reduce?
For example, suppose I want to halve my total computation. Option 1: Use FPS = 5, 32 frames. Option 2: Use FPS = 10, 16 frames. Which one is preferable?
Thanks for clarifying! I would go for option 1, which feeds more frames to the model.
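To make the two options concrete, here is a minimal sketch of the arithmetic behind them. The helper name `clip_duration` is hypothetical and not part of the repository; it only encodes that a clip's time span equals its frame count divided by the sampling FPS.

```python
def clip_duration(fps, num_frames):
    """Time span in seconds covered by `num_frames` frames sampled at `fps`."""
    return num_frames / fps


# Configurations mentioned in this thread, as (FPS, number of frames).
configs = {
    "reference": (10, 32),
    "option 1": (5, 32),
    "option 2": (10, 16),
}

for name, (fps, num_frames) in configs.items():
    seconds = clip_duration(fps, num_frames)
    print(f"{name}: {num_frames} frames at {fps} FPS span {seconds} sec")
```

Under this reading, option 1 keeps 32 frames per clip and stretches each clip over a longer time span, while option 2 keeps the sampling rate and shortens both the frame count and the time span.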
Hi Antoine,
First of all, great work! The code is extremely easy to use, and I'd like to thank you for your efforts.
I'm trying to use your model as the first step of my own project to extract good features for both video and language. It would be great if you could advise on a few questions I have.
If I understand correctly, the model performs best on video clips under "FPS = 10, 32 frames (3.2 sec)". Due to my limited computing resources (basically GPU memory), I'd like to downscale this config. What rules should I stick to in this situation? Should I keep the clip length at 3.2 sec, so that video clips look like "FPS = 2.5, 8 frames (3.2 sec)", or should I keep the FPS at 10, and use something like "FPS = 10, 8 frames (0.8 sec)"?
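The two downscaling strategies in this question can be illustrated with a small sketch. It assumes the clip has already been decoded into a list of 32 frames at 10 FPS; the function names are hypothetical, and integers stand in for the actual frame arrays.

```python
def subsample_keep_duration(frames, stride):
    """Keep every `stride`-th frame: same time span, lower effective FPS."""
    return frames[::stride]


def truncate_keep_fps(frames, num_frames):
    """Keep the first `num_frames` frames: same FPS, shorter time span."""
    return frames[:num_frames]


# Stand-in for 32 frames decoded at 10 FPS (frame i is taken at i / 10 sec).
clip = list(range(32))

# 8 frames at an effective 2.5 FPS, spread across the original 3.2 sec clip.
low_fps = subsample_keep_duration(clip, 4)

# 8 frames at 10 FPS, covering only the first 0.8 sec.
short = truncate_keep_fps(clip, 8)
```

Both results contain 8 frames, so the model input is the same size either way; what differs is how much of the video those frames cover.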
Second, to what extent do you recommend finetuning the params of your pretrained MIL-NCE? I think it would be safe to assume that finetuning will always help on downstream tasks, but I have little sense of how much it could help in our case. Maybe you could also advise on this?
Thank you in advance.