YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License

iters parameter in RAFT DPT decoder #124

Open Sparsh913 opened 3 months ago

Sparsh913 commented 3 months ago

Hi Authors,

Awesome work! I've been playing around with the training pipeline and came across an interesting observation. When using the dinov2_vitl14_reg4_pretrain.pth weights for the encoder, setting cfg.model.decode_head.iters to 4 in the config file makes the losses become NaN after a few epochs, but with 8 iters the training runs fine. I understand that more iters yield finer depth refinement from the decoder head, but could you share your insights on why fewer iters might cause the losses to become NaN?
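For reference, this is roughly the knob I mean, shown as a minimal sketch of an mm-style Python config; the `type` name and surrounding keys are illustrative placeholders, and only `iters` corresponds to `cfg.model.decode_head.iters`:

```python
# Hypothetical config excerpt; only `iters` is the field discussed above,
# the other keys are placeholders and may not match the repo's actual config.
model = dict(
    decode_head=dict(
        type='RAFTDepthDPTDecoder',  # placeholder name for the RAFT DPT decoder head
        iters=8,                     # number of recurrent refinement iterations; 4 reportedly led to NaN losses
    ),
)
```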

Thanks!

JUGGHM commented 3 months ago

This is strange but interesting, and I have no clear explanation yet. We do observe that the first two iterations are the most important, and after that the refinements become incremental, but that does not explain your case.

Since all iteration outputs are supervised, the decoder parameters should not be this sensitive to a change in the number of iterations.
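To be concrete about "all outputs are supervised", here is a minimal sketch of a RAFT-style sequence loss that accumulates supervision over every intermediate prediction, not just the final one; the list name `predictions`, the L1 term, and the exponential weighting are illustrative assumptions rather than the exact loss used in this repo:

```python
import torch
import torch.nn.functional as F

def multi_iteration_loss(predictions, target, gamma: float = 0.8):
    """Supervise every refinement iteration, weighting later iterations more
    (RAFT-style weighting; the actual loss terms in Metric3D may differ)."""
    n = len(predictions)
    loss = torch.zeros((), device=target.device)
    for i, pred in enumerate(predictions):
        weight = gamma ** (n - i - 1)          # later iterations get larger weight
        loss = loss + weight * F.l1_loss(pred, target)
    return loss
```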

Do you have surface normal labels in your datasets? If not, the depth-normal consistency loss should probably be kept. If you want to remove the normal branch, I would suggest freezing the encoder-decoder in the initial stage so that only the regression layers are trained; after some steps, once the network becomes stable, you can unfreeze the other parts (see the sketch below). Another possibility is that simpler losses (e.g., L1 only) could alleviate the problem.
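A minimal sketch of that freeze-then-unfreeze schedule in plain PyTorch; the attribute names `encoder`, `decoder`, and `regression_head`, as well as the warm-up step count, are hypothetical placeholders for the actual Metric3D module names:

```python
import torch

def set_requires_grad(module: torch.nn.Module, flag: bool) -> None:
    """Enable or disable gradients for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad = flag

def apply_warmup_freeze(model: torch.nn.Module, freeze: bool) -> None:
    # Freeze the encoder/decoder and keep only the final regression layers trainable.
    set_requires_grad(model.encoder, not freeze)        # hypothetical attribute name
    set_requires_grad(model.decoder, not freeze)        # hypothetical attribute name
    set_requires_grad(model.regression_head, True)      # hypothetical attribute name

# Outline of the schedule (illustrative step count, not from the repo):
# WARMUP_STEPS = 2000
# apply_warmup_freeze(model, freeze=True)
# for step, batch in enumerate(loader):
#     if step == WARMUP_STEPS:
#         apply_warmup_freeze(model, freeze=False)   # unfreeze once training is stable
#     ...
```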