YihongSun / Dynamo-Depth

[NeurIPS 2023] Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes
https://dynamo-depth.github.io
MIT License

Can you provide weights for the first few epochs of training on Lite-Mono? #3

Closed Wshenv closed 9 months ago

Wshenv commented 9 months ago

I trained the Lite-Mono model alone for 30 epochs on nuScenes (using the base variant instead of Lite-Mono-8M, with Automask enabled). It did achieve results similar to the paper by the end of training, but it produced better results at epoch 0.

[attached results image: metrics at epoch 0 vs. epochs 15-20]

YihongSun commented 9 months ago

Hello,

I could not find the first few epochs of training on Lite-Mono (sorry!), but I find the attached results above to be reasonable. Here are two reasons that I think may have contributed to this behavior.

  1. As mentioned in Section 4.3 and discussed in Appendix B.5 of our manuscript, we observe that LiteMono predicts erroneous depths for independently moving objects in later training iterations, since their motions are harder to learn from a single frame (Figure 6). This contributes to better depth performance at earlier training iterations.
  2. As mentioned in Section 5.1 and discussed in Appendix D.2 of our manuscript, a subset of nuScenes test instances contains low-light conditions where the brightness-consistency assumption no longer holds. For methods with a rigid-scene constraint (e.g., LiteMono), the associated artifacts cause large errors (Figure 9). Since early stopping (e.g., evaluating at 1 epoch) prevents the model from learning these artifacts, depth performance is better at earlier iterations. To exclude these outliers when evaluating, feel free to use the nuscenes_dayclear split instead (we report this in Tables 6 and 7).
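The brightness-consistency point in (2) can be made concrete with a minimal photometric-loss sketch. This is a simplified stand-in, not the repo's actual loss (which also includes SSIM and masking terms): even with perfect depth and pose, a global low-light brightness change makes the photometric error large, pushing the network toward artifacts.

```python
import numpy as np

def photometric_l1(target, warped):
    """Mean absolute photometric error between the target frame and the
    source frame warped into the target view (brightness-consistency term)."""
    return float(np.mean(np.abs(target - warped)))

# Perfect geometry, constant lighting: the loss is exactly zero.
frame = np.random.default_rng(0).random((64, 64))
assert photometric_l1(frame, frame) == 0.0

# Same geometry, but a global brightness change (e.g., a night scene):
# the loss is large even though depth and pose are correct, so training
# is pushed toward artifacts to compensate.
dimmed = 0.5 * frame
print(photometric_l1(frame, dimmed))  # nonzero despite perfect alignment
```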

I hope these two points can offer some intuition behind the observed behavior and I'm happy to discuss further!


Regarding the reported values in Table 1, since all of the methods are unsupervised, we don't consider the case of a labeled validation set for early-stopping and therefore train all methods to convergence.

I hope this helps, and feel free to reply if there are any other questions or concerns!

Wshenv commented 9 months ago

Thank you very much for your timely reply, and I apologize for my slow response. I agree with both of the points above and learned a lot from them! I have another question that I hope you can help me with: have you considered the effect of the learning rate? Going from Lite-Mono to Dynamo-Depth, two variables change: your contributions and the learning rate. Did you try training Lite-Mono on nuScenes with the learning rate set to match Dynamo-Depth's, and if so, what was the result? I think that would better isolate your contributions.

YihongSun commented 9 months ago

Hello,

No worries at all and I'm glad you find the mentioned points helpful!

Since our interest from the start was to correctly predict depth for dynamical objects, we did minimal learning rate tuning (swept in factors of 10 with monodepth2 on Waymo Open). Once the learning rate was found, we simply kept it when switching to LiteMono and when applying to nuScenes and KITTI.
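The factors-of-10 sweep mentioned above amounts to something like the following sketch (hypothetical; `train_and_eval` stands in for a full Monodepth2 training run on Waymo Open returning a validation error):

```python
import math

def sweep_learning_rates(train_and_eval, base_lr=1e-4, span=2):
    """Try learning rates spaced by factors of 10 around base_lr and
    return the one with the lowest validation error."""
    candidates = [base_lr * 10.0 ** k for k in range(-span, span + 1)]
    scores = {lr: train_and_eval(lr) for lr in candidates}
    return min(scores, key=scores.get)

# Toy stand-in objective whose minimum happens to sit at 1e-4;
# in practice each call would be a full training + evaluation run.
best = sweep_learning_rates(lambda lr: abs(math.log10(lr) + 4))
print(best)  # → 0.0001
```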

For your second question, the performance is similar when training LiteMono with our learning rate without the listed contributions.

To verify, simply train with the flag `--epoch_schedules 30 0 0 0` instead of the default `--epoch_schedules 1 1 5 20`. This trains only the depth + pose networks for 30 epochs, which ablates all of our contributions.
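My reading of the flag's semantics (an illustrative sketch, not the repo's actual implementation): each entry gives the number of epochs spent in a successive training phase, with later phases enabling additional networks, so `30 0 0 0` keeps all 30 epochs in the first (depth + pose only) phase.

```python
def phase_for_epoch(epoch, schedule):
    """Map an epoch index to its training phase under an
    --epoch_schedules-style list of per-phase epoch counts."""
    boundary = 0
    for phase, length in enumerate(schedule):
        boundary += length
        if epoch < boundary:
            return phase
    return len(schedule) - 1  # past the end: stay in the final phase

default = [1, 1, 5, 20]   # full Dynamo-Depth training
ablated = [30, 0, 0, 0]   # depth + pose only for all 30 epochs

assert [phase_for_epoch(e, default) for e in (0, 1, 2, 7, 29)] == [0, 1, 2, 3, 3]
assert all(phase_for_epoch(e, ablated) == 0 for e in range(30))
```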

Hope this works!

Wshenv commented 9 months ago

Thank you very much for your patient and careful answers; I have learned a lot from them. I would like to extend my heartfelt congratulations and best wishes, and I hope you continue to achieve excellent results in your research. Keep up the great work!