antoine77340 / S3D_HowTo100M

S3D Text-Video model trained on HowTo100M using MIL-NCE
Apache License 2.0

Video configuration for best performance under limited computing resource #12

Closed r-cui closed 3 years ago

r-cui commented 3 years ago

Hi Antoine,

First of all, great work! The code is extremely easy to use, and I'd like to thank you for your efforts.

I'm trying to use your model as the first step of my own project, to extract good features for both video and language. It would be great if you could answer a few questions I have.

If I understand correctly, the model performs best on video clips with "FPS = 10, 32 frames (3.2 sec)". Because of my limited computing resources (mainly GPU memory), I'd like to scale this config down. What rule should I follow in this situation? Should I keep the clip length at 3.2 sec and use something like "FPS = 2.5, 8 frames (3.2 sec)", or should I keep the FPS at 10 and use something like "FPS = 10, 8 frames (0.8 sec)"?
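To make the two candidate configs concrete, here is a minimal sketch of how the clip duration and the sampled frame positions follow from the FPS and the frame count. The function names and the 30 FPS source video are assumptions for illustration; they are not part of the S3D_HowTo100M code.

```python
def clip_duration(fps: float, num_frames: int) -> float:
    """Seconds of video covered by `num_frames` frames sampled at `fps`."""
    return num_frames / fps


def sample_frame_indices(source_fps: float, target_fps: float,
                         num_frames: int, start_sec: float = 0.0) -> list[int]:
    """Indices of the source frames to keep when resampling to `target_fps`.

    Hypothetical helper: assumes a constant-frame-rate source video.
    """
    step = source_fps / target_fps      # source frames between two kept frames
    start = start_sec * source_fps      # first source frame of the clip
    return [int(round(start + i * step)) for i in range(num_frames)]


# Assumed 30 FPS source video, 8 frames per clip.
same_fps = sample_frame_indices(30, 10, 8)        # FPS = 10, covers 0.8 sec
same_duration = sample_frame_indices(30, 2.5, 8)  # FPS = 2.5, covers 3.2 sec
```

With these assumptions, the "FPS = 10" variant keeps every 3rd source frame and the "FPS = 2.5" variant keeps every 12th, so the same 8 frames are spread over a four times longer window.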

Second, to what extent do you recommend fine-tuning the parameters of your pretrained MIL-NCE model? I think it is safe to assume that fine-tuning will generally help on downstream tasks, but I have little sense of how much it would help in our case. Could you also advise on this?

Thank you in advance.

antoine77340 commented 3 years ago
  1. So I understand you want to restrict the input to 8 frames? If so, I suggest you keep the FPS at 10 instead of 2.5.
  2. I think it's wise to fine-tune this model for downstream tasks that have a large domain gap or a decent number of training examples to fine-tune on. If you fine-tune this model, you may not want to deviate too much from the pretrained weights, so set a relatively low learning rate and use fewer training steps than you would when training from scratch.
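
The advice in point 2 can be sketched as a small helper that derives a fine-tuning schedule from a from-scratch one. The class, the function, and the scale factors (0.1 for the learning rate, 0.25 for the steps) are hypothetical placeholders, not values recommended in this thread; the right factors depend on the downstream task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    learning_rate: float
    num_steps: int


def finetune_schedule(scratch: Schedule,
                      lr_scale: float = 0.1,
                      step_scale: float = 0.25) -> Schedule:
    """Shrink a from-scratch schedule so fine-tuning stays close to the
    pretrained weights. The default scale factors are illustrative only."""
    if not (0 < lr_scale <= 1 and 0 < step_scale <= 1):
        raise ValueError("scale factors must be in (0, 1]")
    return Schedule(
        learning_rate=scratch.learning_rate * lr_scale,
        num_steps=max(1, int(scratch.num_steps * step_scale)),
    )


# Assumed from-scratch settings, for illustration only.
scratch = Schedule(learning_rate=1e-3, num_steps=100_000)
finetune = finetune_schedule(scratch)
```

The resulting `finetune.learning_rate` and `finetune.num_steps` would then be passed to whatever optimizer and training loop the project already uses.
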
antoine77340 commented 3 years ago

Note that for 1, I've never tried using 8 frames instead of 32, so I'm not sure how the model will behave. I have only gone down to 16 frames, which worked quite OK.

r-cui commented 3 years ago

Thank you for your reply. For 1, my question is: if I cannot afford the setting "FPS = 10, 32 frames (3.2 sec)" and have to decrease either the FPS or the number of frames, which one should I reduce?

For example, suppose I want to halve my total computation. Option 1: Use FPS = 5, 32 frames. Option 2: Use FPS = 10, 16 frames. Which one is preferable?

antoine77340 commented 3 years ago

Thanks for clarifying! I would go for option 1, which feeds more frames to the model.
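
For reference, the arithmetic behind the two options can be sketched as follows. The sketch assumes that a video is tiled with non-overlapping clips and that cost scales roughly with the number of frames processed; `ClipConfig` and its fields are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipConfig:
    fps: float
    num_frames: int  # frames fed to the model per clip

    @property
    def duration_sec(self) -> float:
        """Seconds of video covered by one clip."""
        return self.num_frames / self.fps

    @property
    def frames_per_video_minute(self) -> float:
        """Frames processed per minute of video with non-overlapping clips."""
        return self.fps * 60


baseline = ClipConfig(fps=10, num_frames=32)
option_1 = ClipConfig(fps=5, num_frames=32)
option_2 = ClipConfig(fps=10, num_frames=16)

for name, cfg in [("baseline", baseline), ("option 1", option_1), ("option 2", option_2)]:
    print(f"{name}: {cfg.num_frames} frames/clip, "
          f"{cfg.duration_sec:.1f} sec/clip, "
          f"{cfg.frames_per_video_minute:.0f} frames per video minute")
```

Under these assumptions, option 1 keeps 32 frames per clip but stretches each clip to 6.4 sec, halving the frames processed per minute of video. Option 2 halves the frames per clip (and hence the per-clip memory) but each clip covers only 1.6 sec.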