afterthat97 closed this issue 1 month ago.
Hello, thanks for the questions!
We originally found 2D trajectories from CoTracker to be unreliable, especially for longer-term tracking (say, more than 30 frames). Because of this, we opted for our divide-and-conquer approach to building trajectories, which uses CoTracker as one part of its supervision but relies on it less than initializing with lifted 3D trajectories would. However, concurrent works such as MoSca and Shape of Motion do perform this lifting. I haven't had time to evaluate it properly, but my intuition is that lifting 2D trajectories to 3D is faster to optimize but also more likely to fail in the presence of tracking and depth errors.
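To make the "lifting" idea concrete, here is a minimal sketch of turning 2D tracks into 3D trajectories by pinhole backprojection. It assumes a per-point depth already sampled from a depth map at each tracked pixel, known intrinsics `K`, and optional per-frame camera-to-world poses. The function name and array layout are hypothetical; this is not the implementation used by MoSca, Shape of Motion, or this project.

```python
import numpy as np


def lift_tracks_to_3d(tracks_2d, depths, K, cam_to_world=None):
    """Backproject 2D tracks into 3D (hypothetical helper, for illustration).

    tracks_2d:    (T, N, 2) pixel coordinates (u, v) of N tracked points over T frames
    depths:       (T, N) depth at each tracked pixel (assumed sampled from a depth map)
    K:            (3, 3) pinhole intrinsics
    cam_to_world: optional (T, 4, 4) camera-to-world poses; if None, points stay in camera space
    Returns (T, N, 3) points.
    """
    T, N, _ = tracks_2d.shape
    # Homogeneous pixel coordinates (u, v, 1).
    pix_h = np.concatenate([tracks_2d, np.ones((T, N, 1))], axis=-1)
    # Rays through each pixel with z = 1 in camera space: K^{-1} [u, v, 1]^T.
    rays = pix_h @ np.linalg.inv(K).T
    # Scale each ray by its depth. A tracking error changes the ray direction;
    # a depth error slides the point along the ray. Both enter the result directly.
    pts_cam = rays * depths[..., None]
    if cam_to_world is None:
        return pts_cam
    R = cam_to_world[:, :3, :3]  # (T, 3, 3) rotations
    t = cam_to_world[:, :3, 3]   # (T, 3) translations
    # Apply each frame's rigid transform to all N points of that frame.
    return np.einsum("tij,tnj->tni", R, pts_cam) + t[:, None, :]
```

Because every lifted point is the product of a ray (from the 2D track) and a depth value, an initialization built this way inherits both error sources with no chance to correct them before optimization starts. For example, a 10% depth error moves a point 10% of its distance from the camera along its ray. This is consistent with the intuition above that lifting is fast to optimize but fragile when tracks or depths are wrong.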
I intend to do a more thorough ablation, because I've qualitatively found that the importance of the different losses varies greatly from scene to scene. On the Nvidia scenes in the ablation, removing the segmentation and tracking losses did not have much of an effect, which I found surprising. My best intuition is that these scenes exhibit fairly "easy" motion, so the other losses can compensate when one is removed. However, I don't think this is typical, and I hope to have numbers to support this soon.
Thank you for your fast and informative reply, wish you a nice day!
Dear authors, thanks for sharing your nice work! I have some minor questions about the paper: