Closed by zhw-zhang 9 months ago
Hi @zhw-zhang! DOT is trained at a resolution of 512x512, but inference can be run at a different resolution, e.g., 856x480 for the demo videos. I suggest keeping num_tracks=8192, since performance does not improve much with more tracks and you lose speed. Fewer tracks may be enough depending on the use case. All you have to do is specify height=512 width=320. Let me know if you have further questions.
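For anyone scripting this, here is a minimal sketch of collecting those settings into command-line style overrides. Only the parameter names `height`, `width`, and `num_tracks` and their values come from the reply above. The helper `build_overrides` is hypothetical, and the `--name value` flag form is an assumption about how the options are passed, so check the repository's own option parser before relying on it.

```python
def build_overrides(height: int, width: int, num_tracks: int = 8192) -> list[str]:
    """Turn the settings from the reply into command-line style overrides.

    Hypothetical helper: the "--name value" flag form is an assumption,
    not something confirmed in this thread.
    """
    if height <= 0 or width <= 0 or num_tracks <= 0:
        raise ValueError("height, width and num_tracks must be positive")
    settings = {"height": height, "width": width, "num_tracks": num_tracks}
    args = []
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    return args


# Values exactly as given in the reply; num_tracks stays at the suggested 8192.
overrides = build_overrides(height=512, width=320)
print(" ".join(overrides))
```

The height and width here are copied as written in the reply. If your videos are 512 pixels wide and 320 pixels tall (the same width-by-height order as 856x480), swap the two values.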
Thank you very much, that's all
Hi, thank you for your work. For some downstream tasks, I usually need to convert videos of various sizes to 512x320, but I see that DOT's default input is 856x480 with num_tracks=8192. My question is: if the input video is 512x320, which values need to be changed (e.g., how should I adjust num_tracks according to the size?), and will the change affect the final performance? In short, for 512x320 videos, is there a good set of parameters that maintains the original performance?