inference with diff video size

zhw-zhang commented 9 months ago

Hi, thank you for your work. For some downstream tasks, I usually need to convert videos of various sizes to 512x320, but I see that DOT's default input is 856x480 with num_tracks=8192. My question is: if the input video is 512x320, which values need to be changed (e.g., how should I adjust num_tracks according to the size?), and will the change affect the final performance? In short, for 512x320 videos, is there a good set of parameters that maintains the original performance?

16lemoing commented 9 months ago

Hi @zhw-zhang! DOT is trained at resolution 512x512, but inference can run at a different resolution, e.g., 856x480 for the demo videos. I suggest keeping num_tracks=8192, since performance does not improve much with more tracks and you lose speed. Fewer tracks may be enough depending on the use case. All you have to do is specify the height and width of your video, i.e., height=320 width=512 for a 512x320 (width x height) input. Let me know if you have further questions.
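
For reference, here is a rough sketch of the arithmetic behind "fewer tracks may be enough". It only compares how densely 8192 tracks cover each resolution. The helper names `pixels_per_track` and `tracks_for_same_density` are hypothetical and not part of DOT, and the density-matched count is an illustration, not a recommended setting.

```python
def pixels_per_track(width: int, height: int, num_tracks: int = 8192) -> float:
    """Average number of pixels per track at a given resolution."""
    return (width * height) / num_tracks


def tracks_for_same_density(src_wh, dst_wh, num_tracks: int = 8192) -> int:
    """Track count that gives dst_wh the same pixels-per-track as src_wh.

    Hypothetical helper: scales num_tracks by the ratio of pixel areas.
    """
    src_area = src_wh[0] * src_wh[1]
    dst_area = dst_wh[0] * dst_wh[1]
    return round(num_tracks * dst_area / src_area)


# Demo resolution vs. the 512x320 (width x height) case from the question.
demo_density = pixels_per_track(856, 480)   # 410880 / 8192
small_density = pixels_per_track(512, 320)  # 163840 / 8192

print(f"856x480: {demo_density:.2f} px/track")   # 50.16
print(f"512x320: {small_density:.2f} px/track")  # 20.00
print(tracks_for_same_density((856, 480), (512, 320)))  # 3267
```

With the default num_tracks=8192, a 512x320 video is covered about 2.5x more densely than the 856x480 demo videos, so keeping the default should be more than sufficient, and a lower value would mainly trade density for speed.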

zhw-zhang commented 9 months ago

Thank you very much, that's all

16lemoing / dot

inference with diff video size #10