align_after_view_transformation problem and mlp problem

HuangJunJie2017 / BEVDet

Code base of the BEVDet series.

Apache License 2.0


align_after_view_transformation problem and mlp problem #288

Closed: leslie27ch closed this issue 1 year ago

leslie27ch commented 1 year ago

Given currcam2currego = prevcam2prevego:

The line l02l1 = c02l0.matmul(torch.inverse(c12l0)) computes precam2preego * (precam2curego)^-1 = precam2preego * curego2precam, and the result is not the expected curego2preego.

So the order is reversed. It should be torch.inverse(c12l0).matmul(c02l0) = curego2precam * precam2preego = curego2preego.

That is why this implementation performs slightly worse.

leslie27ch commented 1 year ago

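There is also an MLP problem: for both the current frame and the previous frame, the same sensor2keyegos[0] and ego2globals[0] are fed in, i.e. the current frame's extrinsics.

The MLP produces an input for depth estimation. That depth is defined purely in the camera coordinate system: the depth ground truth is obtained by projecting the point cloud into the camera frame. So when features are fed into the network to predict depth, only the intrinsics plus postran and postrot should matter. The depth has nothing to do with bda, nor with the camera's extrinsics relative to any other frame. bda and similar transforms are applied only after the depth has been generated, so they play no role when the depth is predicted. Moreover, feeding the same extrinsics to both the current and the previous frame has no clear logic. These extra MLP inputs can therefore be regarded as noise.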
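To make the debate concrete, here is a hypothetical sketch of the kind of per-camera conditioning vector being discussed. It flattens the intrinsics, the image-augmentation terms (postrot/postran), the bda matrix and the top three rows of the camera-to-ego extrinsic into one list. The function name build_mlp_input, the field layout and the example numbers are assumptions for illustration, not BEVDet's actual code. Note that no ego-to-global transform appears in it.

```python
from typing import List

Matrix = List[List[float]]


def build_mlp_input(intrin: Matrix, post_rot: Matrix, post_tran: List[float],
                    bda: Matrix, cam2ego: Matrix) -> List[float]:
    """Flatten one camera's parameters into a conditioning vector.

    Hypothetical layout for illustration only; not BEVDet's actual code.
    """
    vec = [
        intrin[0][0], intrin[1][1], intrin[0][2], intrin[1][2],  # fx, fy, cx, cy
        post_rot[0][0], post_rot[0][1], post_tran[0],            # image aug, row 0
        post_rot[1][0], post_rot[1][1], post_tran[1],            # image aug, row 1
        bda[0][0], bda[0][1], bda[1][0], bda[1][1], bda[2][2],   # BEV augmentation
    ]
    # Extrinsic part: top 3 rows (rotation | translation) of the 4x4 cam2ego.
    for row in cam2ego[:3]:
        vec.extend(row)
    return vec


# Made-up example values.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrin = [[1266.0, 0.0, 816.0], [0.0, 1266.0, 491.0], [0.0, 0.0, 1.0]]
cam2ego = [[1.0, 0.0, 0.0, 1.7], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.5], [0.0, 0.0, 0.0, 1.0]]
vec = build_mlp_input(intrin, identity3, [0.0, 0.0, 0.0], identity3, cam2ego)
print(len(vec))  # 27
```

Under the commenter's proposal, only the first ten entries (intrinsics and image augmentation) would remain. The bda entries and the twelve extrinsic entries would be dropped.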

HuangJunJie2017 commented 1 year ago

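@leslie27ch
1. You have a slight misunderstanding of coordinate transforms and matrix multiplication. The correct computation is B2C * A2B = A2C; Introduction to Robotics covers this well.
2. sensor2keyegos[0] is the same across consecutive frames within one video clip, but it differs between clips, so it is not a constant. ego2globals[0] is not used when computing mlp_input.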
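The maintainer's rule B2C * A2B = A2C can be checked numerically. The sketch below uses pure Python so it runs without PyTorch. It follows the first comment in assuming the camera-to-ego extrinsic is identical in both frames, so c02l0 is read as precam2preego (equal to curcam2curego) and c12l0 as precam2curego. The helper names (rigid, rigid_inverse, close) and the yaw/translation values are made up for illustration.

```python
import math
from typing import List

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """4x4 product a @ b (b is applied to a point first)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid(yaw: float, tx: float, ty: float, tz: float) -> Mat:
    """Homogeneous rigid transform: rotation about z by yaw, plus translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(t: Mat) -> Mat:
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def close(a: Mat, b: Mat, tol: float = 1e-9) -> bool:
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


# Illustrative values: one extrinsic shared by both frames, plus ego motion.
cam2ego = rigid(0.3, 1.5, 0.2, 1.6)          # precam2preego == curcam2curego
preego2curego = rigid(0.1, 2.0, 0.5, 0.0)

c02l0 = cam2ego                              # numerically precam2preego
c12l0 = matmul(preego2curego, cam2ego)       # precam2curego = B2C @ A2B

# Read right to left: precam2preego @ curego2precam = curego2preego.
l02l1 = matmul(c02l0, rigid_inverse(c12l0))
assert close(l02l1, rigid_inverse(preego2curego))

# The reversed order proposed in the first comment differs in general.
reversed_order = matmul(rigid_inverse(c12l0), c02l0)
assert not close(reversed_order, rigid_inverse(preego2curego))
```

The key point is that with column-vector conventions the right-most matrix acts first. So the source frame of the product is the source of the right factor, and the target frame is the target of the left factor.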

leslie27ch commented 1 year ago

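Thanks for the reply and the correction.

You're right, I got muddled when working it out.
What I still don't understand: when predicting depth, even the ground truth is in the camera coordinate system, so depth prediction has nothing to do with the extrinsics. Why add sensor2keyegos[0], i.e. curcam2curego?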

HuangJunJie2017 commented 1 year ago

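@leslie27ch The prediction targets are defined in the ego coordinate system (the perception head operates in the ego frame). Assuming a target's position in the ego frame is fixed, its coordinates in image space (its depth) depend on the extrinsics.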
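A small illustration of this point: for a point held fixed in the ego frame, its camera-frame depth is the z component of R^T (p_ego - t), where cam2ego = [R | t]. Changing the mounting therefore changes the depth, even though the depth is expressed in the camera frame. The sketch assumes a rigid extrinsic and camera z as the optical axis. The function names and the mounting values are hypothetical.

```python
from typing import List, Sequence

Mat4 = List[List[float]]


def depth_of_ego_point(cam2ego: Mat4, p_ego: Sequence[float]) -> float:
    """Camera-frame depth (z) of an ego-frame point, for rigid cam2ego = [R | t].

    p_cam = R^T (p_ego - t); the depth p_cam[2] equals column 2 of R
    dotted with (p_ego - t).
    """
    diff = [p_ego[i] - cam2ego[i][3] for i in range(3)]
    return sum(cam2ego[k][2] * diff[k] for k in range(3))


def front_camera(tx: float, tz: float) -> Mat4:
    """Hypothetical forward-facing mount: camera z -> ego x, x -> -y, y -> -z."""
    return [[0.0, 0.0, 1.0, tx],
            [-1.0, 0.0, 0.0, 0.0],
            [0.0, -1.0, 0.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


obj = (10.0, 0.0, 1.0)  # object held fixed 10 m ahead in the ego frame
print(depth_of_ego_point(front_camera(1.5, 1.5), obj))  # 8.5
print(depth_of_ego_point(front_camera(2.0, 1.5), obj))  # 8.0
```

The same ego-frame target yields a different depth target for each mounting. That is the sense in which the extrinsics carry information for depth prediction.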

leslie27ch commented 1 year ago

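Looking only at the moment of depth prediction: the ground truth in the depth loss is purely in the camera coordinate system, and each of the 6 cameras uses its own camera frame. Only after depth has been predicted do the subsequent transforms involve the extrinsics, converting from the camera frame to the ego frame.

Also, I don't understand why the previous frame's depth is predicted from the previous frame's features + mlp(current frame's extrinsics). If extrinsics must be input at all, the previous frame should get its own: previous-frame features + previous-frame extrinsics.

bda is even less relevant at the moment of depth prediction. Depth is in the camera coordinate system and depends only on the intrinsics. Once depth is obtained we have xyz coordinates; applying bda and converting to the ego frame both come afterwards.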
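The ordering described in this comment can be sketched for a single pixel. The predicted depth is first combined with the undone image augmentation and the intrinsics to give camera-frame xyz. Only after that do the extrinsic and then bda enter. The sketch assumes a pinhole model, an image augmentation of the form uv_aug = post_rot @ uv + post_tran (2D), and a rigid cam2ego. The names lift_pixel and mat3_vec and all numbers are hypothetical, not BEVDet's API.

```python
from typing import List, Sequence

Mat3 = List[List[float]]


def mat3_vec(m: Mat3, v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def lift_pixel(uv_aug: Sequence[float], depth: float,
               post_rot: Sequence[Sequence[float]], post_tran: Sequence[float],
               intrin: Mat3, cam2ego_rot: Mat3, cam2ego_trans: Sequence[float],
               bda: Mat3) -> List[float]:
    """Lift one augmented-image pixel with its depth to the bda-augmented ego frame."""
    # 1) Undo image augmentation (assumed 2x2 post_rot plus 2D post_tran).
    (a, b), (c, d) = post_rot
    du, dv = uv_aug[0] - post_tran[0], uv_aug[1] - post_tran[1]
    det = a * d - b * c
    u = (d * du - b * dv) / det
    v = (-c * du + a * dv) / det
    # 2) Pinhole back-projection: only the intrinsics and the depth are used.
    fx, fy, cx, cy = intrin[0][0], intrin[1][1], intrin[0][2], intrin[1][2]
    p_cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    # 3) Extrinsic: camera frame -> ego frame.
    p_ego = [x + t for x, t in zip(mat3_vec(cam2ego_rot, p_cam), cam2ego_trans)]
    # 4) BEV data augmentation, applied last, in the ego frame.
    return mat3_vec(bda, p_ego)


K = [[1266.0, 0.0, 816.0], [0.0, 1266.0, 491.0], [0.0, 0.0, 1.0]]
front_rot = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]  # cam z -> ego x
flip_y = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
# Principal point (816, 491) after a 0.5x resize and a 20-px top crop.
p = lift_pixel([408.0, 225.5], 8.5, [[0.5, 0.0], [0.0, 0.5]], [0.0, -20.0],
               K, front_rot, [1.5, 0.0, 1.5], flip_y)
print(p)
```

In this form, steps 3 and 4 only consume p_cam. This matches the commenter's point that the extrinsics and bda act after depth is known. The maintainer's counterpoint is different: the supervision target itself depends on the extrinsics.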