YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
Creative Commons Zero v1.0 Universal
986 stars, 70 forks

Which dataset needs to apply scale-shift alignment? #113

Open zParquet opened 3 weeks ago

zParquet commented 3 weeks ago

Thanks for your excellent work! I have a question: which datasets need scale-shift alignment during testing? I noticed that TABLE 1, 2, and 3 (in the Metric3D-v2 paper) are evaluated without scale-shift alignment to the GT depth, while TABLE 4 is evaluated with it. My understanding is that Metric3D resolves the metric depth estimation problem by constructing a canonical camera space, so the estimated depth should need no further scale-shift alignment across datasets, as long as the correct focal length is given. Why do you apply additional scale-shift alignment in the evaluation of TABLE 4?
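The canonical-camera idea the question relies on can be sketched as a single rescaling of depth by the ratio of focal lengths. This is a minimal NumPy sketch, not the repository's code: the function name `canonical_to_metric` and the constant `CANONICAL_FOCAL_PX` are hypothetical, and the value 1000 px is an assumption (the Metric3D demo scripts appear to use it, but check the repo).

```python
import numpy as np

# Hypothetical canonical focal length in pixels (assumption, see lead-in).
CANONICAL_FOCAL_PX = 1000.0


def canonical_to_metric(canonical_depth, focal_px):
    """Undo the canonical-camera transform for depth.

    A network trained in a canonical camera with focal length f_c sees an
    object at metric depth Z, imaged by a camera with focal length f, as if it
    were at depth Z * f_c / f. Inverting that gives Z = Z_c * f / f_c.
    """
    if focal_px <= 0:
        raise ValueError("focal length must be positive (in pixels)")
    return np.asarray(canonical_depth, dtype=np.float64) * (focal_px / CANONICAL_FOCAL_PX)


# The same canonical prediction mapped to metric depth for two cameras.
pred_canonical = np.array([2.0, 4.0, 8.0])
print(canonical_to_metric(pred_canonical, focal_px=500.0).tolist())   # [1.0, 2.0, 4.0]
print(canonical_to_metric(pred_canonical, focal_px=1500.0).tolist())  # [3.0, 6.0, 12.0]
```

One consequence of this form: if the focal length passed in is off by a factor k, every depth is off by the same factor k, which is exactly the kind of error a global scale alignment removes.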

JUGGHM commented 3 weeks ago

We applied scale (actually no shift) alignment in Table 4, when comparing relative depth estimation against methods that predict depth without metric scale (like MiDaS, Marigold, and the original Depth Anything). When comparing with metric depth estimators like ZoeDepth, there is no need to apply scale alignment.
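A scale-only alignment of the kind described above fits one global factor and no offset. The thread does not say how the authors fit that factor; the least-squares form below is an assumption (a median ratio, median(gt) / median(pred), is a common, more outlier-robust alternative). The name `align_scale_only` and the convention that non-positive GT marks invalid pixels are also hypothetical.

```python
import numpy as np


def align_scale_only(pred, gt, mask=None):
    """Least-squares scale alignment without a shift term.

    Finds the scalar s minimising sum((s * pred - gt) ** 2) over valid pixels,
    i.e. s = <pred, gt> / <pred, pred>, and returns (s * pred, s).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: non-positive GT marks missing depth
    p, g = pred[mask], gt[mask]
    denom = np.dot(p, p)
    if denom == 0:
        raise ValueError("prediction is zero on all valid pixels")
    s = float(np.dot(p, g) / denom)
    return s * pred, s


# A relative-depth prediction that is correct up to an unknown global scale.
gt = np.array([1.0, 2.0, 0.0, 4.0])    # 0.0 = invalid GT pixel
pred = np.array([0.5, 1.0, 7.0, 2.0])  # value at the invalid pixel is ignored
aligned, s = align_scale_only(pred, gt)
print(s)  # 2.0
```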

zParquet commented 2 weeks ago

I still don't understand. The paper claims that, through the canonical camera transformation, the predicted depth (after the inverse transformation) is expected to be metric depth. So there should be no need for scale-shift alignment, right? Why, then, do you apply additional scale-shift alignment in TABLE 4?

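My understanding is that Metric3D-v2 can predict metric depth for any data as long as the true focal length in pixels is given. Is that correct? If so, why does the comparison in TABLE 4 still apply scale-shift alignment to Metric3D's results? (In particular, for the ScanNet dataset, which appears only in TABLE 4, I found that without alignment the predicted depth deviates greatly from the GT.) I also noticed that both TABLE 3 and TABLE 4 include ETH3D results, and the TABLE 4 results are significantly better than those in TABLE 3. Is that because, on the same test data, TABLE 3 applied no alignment while TABLE 4 did? I'd appreciate your advice on these questions. Thank you!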

JUGGHM commented 1 week ago

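Table 4 applies alignment because it is a relative depth evaluation; Table 3 is a metric depth evaluation, so no alignment is applied. The problem on ScanNet may be caused by noise remaining in the training data after preprocessing.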
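The difference between the two protocols can be illustrated with AbsRel on toy data. This sketch is an illustration only, not the paper's evaluation code: `evaluate`, its `protocol` argument, the least-squares scale fit, and the numbers are all assumptions chosen to show the effect.

```python
import numpy as np


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) GT."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


def evaluate(pred, gt, protocol="metric"):
    """'metric': score the raw prediction (no alignment, Table 3-style).
    'relative': fit a least-squares global scale first (Table 4-style)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if protocol == "relative":
        valid = gt > 0
        s = np.dot(pred[valid], gt[valid]) / np.dot(pred[valid], pred[valid])
        pred = s * pred
    elif protocol != "metric":
        raise ValueError(f"unknown protocol: {protocol!r}")
    return abs_rel(pred, gt)


# Toy prediction with the right structure but a 20% global scale error.
gt = np.array([1.0, 2.0, 3.0])
pred = 1.2 * gt
print(round(evaluate(pred, gt, "metric"), 4))    # 0.2
print(round(evaluate(pred, gt, "relative"), 4))  # 0.0
```

A prediction with a purely global scale error is penalized under the metric protocol but scores near zero after alignment, which is why aligned numbers on the same data (e.g. ETH3D) can look much better.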

When aligning, we applied only scale alignment, not scale-shift alignment.
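The distinction between scale alignment and scale-shift alignment can be made concrete with two least-squares fits. This NumPy sketch uses toy data and hypothetical helper names (`fit_scale`, `fit_scale_shift`); it is not the authors' evaluation code. Because scale-only is the special case t = 0 of the scale-and-shift model, its least-squares residual can never be smaller than the scale-and-shift one, so scale-only is the stricter of the two.

```python
import numpy as np


def fit_scale(pred, gt):
    """Scale only: s = argmin ||s * pred - gt||^2."""
    return float(np.dot(pred, gt) / np.dot(pred, pred))


def fit_scale_shift(pred, gt):
    """Scale and shift: (s, t) = argmin ||s * pred + t - gt||^2."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return float(s), float(t)


# GT that differs from the prediction by both a scale and an offset.
pred = np.array([1.0, 2.0, 3.0])
gt = 2.0 * pred + 1.0

s_only = fit_scale(pred, gt)
s, t = fit_scale_shift(pred, gt)
rmse_scale = float(np.sqrt(np.mean((s_only * pred - gt) ** 2)))
rmse_shift = float(np.sqrt(np.mean((s * pred + t - gt) ** 2)))
print(f"scale only:    s={s_only:.3f}, RMSE={rmse_scale:.3f}")
print(f"scale + shift: s={s:.3f}, t={t:.3f}, RMSE={rmse_shift:.3f}")
```

Here the offset is fully absorbed by the shift term, while the scale-only fit leaves a visible residual.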

JUGGHM commented 1 week ago

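By noise I mean that after ScanNet was added to the training set, some parts of its depth may not have been processed properly; this needs to be verified. Surface normals on ScanNet are still fine.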