Banconxuan / RTM3D

The official PyTorch Implementation of RTM3D and KM3D for Monocular 3D Object Detection
MIT License
453 stars, 85 forks

problems when reproducing #3

Closed kaixinbear closed 3 years ago

kaixinbear commented 4 years ago

Hello. I am reproducing your work from the paper and ran into a problem while implementing the loss terms for the vertex coordinates and the vertex offsets. (As I understand it, the former are the distances from each vertex to the main center, and the latter compensate for the precision lost in downsampling. Is that right?) I can't match the shapes of the predicted output and the ground truth. According to the paper, output["vertexes_coordinates"].shape is [Batch, H/S, W/S, 18] and output["vertexes_offset"].shape is [Batch, H/S, W/S, 2]. I set gt["vertexes_coordinates"].shape to [Batch, num_objs, 18] and gt["vertexes_offset"].shape to [Batch, num_objs, 18]. How can I match these shapes to compute the loss correctly? I know I should extract the entries at the vertex indices from the output heatmap to get the right shape, but I don't know how: if I use _gather_feat, I extract all of the vertices at once, whereas I need to extract them in order. Could you give me some details on how to solve this? Thanks in advance.

Banconxuan commented 4 years ago

The vertex offsets are defined relative to the heatmap, and each point in a heatmap has a corresponding offset, so the shape of gt["vertexes_offset"] is [Batch, H/S, W/S, 2]. Figure 2 also shows the shape of this tensor.

kaixinbear commented 4 years ago

I have two questions:
1. Did you apply a Gaussian function to gt["vertexes_offset"]?
2. How did you extract the tensor of shape [Batch, num_objs, 18] from output["vertexes_coordinates"]?

Banconxuan commented 4 years ago

1: No.
2: Refer to multi_pose_decode:

    hm_score, hm_inds, hm_ys, hm_xs = _topk_channel(hm_hp, K=K)  # b x J x K
    hp_offset = _transpose_and_gather_feat(hp_offset, hm_inds.view(batch, -1))

kaixinbear commented 4 years ago

I'm afraid you misunderstood my second question.

Let me restate it more clearly. When computing dim_loss:

    dim_loss += self.crit_reg(output['dim'], batch['reg_mask'], batch['ind'], batch['dim']) / opt.num_stacks

the batch['ind'] here extracts the center-point entries from the output. What I need now is to extract the x and y coordinates of the 9 vertices from the output vertexes_coordinates heatmap (18 channels in total), so there should be one ind for every 2 channels. But if I follow the code above:

    vertex_coordinate_loss += self.crit_reg(output['vertex_coordinate_loss'], batch['reg_mask'], batch['vertex_ind'], batch['vertex_coordinate_loss']) / opt.num_stacks

the vertices are not extracted in order. How did you implement this?

Banconxuan commented 4 years ago

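Ah, so it's you, my friend. The order of the 18 channels can be fixed in advance; for example, the bottom-right corner of the 3D BBox corresponds to the first two channels. Then, when sampling the GT, build the index the way multi_pose does: hp_ind[k * num_joints + j] = pt_int[1] * output_res + pt_int[0]. The vertices you gather then follow the channel order, which is the order you fixed in advance.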
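The index construction described in this reply can be sketched in plain Python. This is only an illustration under assumptions: `build_vertex_indices`, its arguments, and the list-based layout are hypothetical and not taken from the RTM3D repository; only the formula `k * num_joints + j` and `y * output_res + x` comes from the reply.

```python
# Hypothetical helper for illustration; not part of the RTM3D code base.
def build_vertex_indices(vertices_per_object, output_w, max_objs, num_vertices=9):
    """Flatten integer (x, y) heatmap positions into one index per vertex.

    vertices_per_object: list of objects, each a list of (x, y) integer
        positions on the downsampled output map, in a fixed vertex order.
    Entry k * num_vertices + j of the result holds y * output_w + x for
    vertex j of object k. Slots of missing objects stay 0.
    """
    vertex_ind = [0] * (max_objs * num_vertices)
    for k, vertices in enumerate(vertices_per_object[:max_objs]):
        for j, (x, y) in enumerate(vertices[:num_vertices]):
            # Row-major flattening: same form as pt_int[1] * output_res + pt_int[0].
            vertex_ind[k * num_vertices + j] = y * output_w + x
    return vertex_ind


# Two objects with two vertices each on an 8-pixel-wide output map, padded to 3 objects.
example = [[(3, 1), (4, 1)], [(0, 2), (5, 0)]]
print(build_vertex_indices(example, output_w=8, max_objs=3, num_vertices=2))
# [11, 12, 16, 5, 0, 0]
```

Because the slot for vertex j is always at offset j within an object's block, the gathered values line up with the channel order that was fixed in advance.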

kaixinbear commented 4 years ago

    for i in range(0, 18, 2):
        vertex_coordinate_loss += self.crit_reg(
            output['vertex_coordinate_loss'][:, i:i+2], batch['reg_mask'],
            batch['vertex_ind'][i // 2], batch['vertex_coordinate_loss']) / opt.num_stacks

This more or less solves it. Thanks a lot!
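As a rough illustration of what the per-vertex loop is doing, the following NumPy sketch reads channels 2j and 2j+1 at the location of vertex j, for a single image. The function name `gather_vertex_coords` and the array shapes are assumptions made for this example; it is not the repository's implementation, which works on batched PyTorch tensors.

```python
import numpy as np


# Hypothetical helper for illustration; shapes are assumptions.
def gather_vertex_coords(pred, vertex_ind, num_vertices=9):
    """Pick each vertex's two channels at that vertex's own location.

    pred:       (C, H, W) array with C == 2 * num_vertices.
    vertex_ind: (num_objs, num_vertices) integer array of flat indices y * W + x.
    Returns an array of shape (num_objs, 2 * num_vertices).
    """
    channels, height, width = pred.shape
    assert channels == 2 * num_vertices
    flat = pred.reshape(channels, height * width)  # (C, H*W)
    out = np.zeros((vertex_ind.shape[0], channels), dtype=pred.dtype)
    for j in range(num_vertices):
        # Channels 2j and 2j+1 belong to vertex j; read them at vertex j's index.
        out[:, 2 * j:2 * j + 2] = flat[2 * j:2 * j + 2, vertex_ind[:, j]].T
    return out


pred = np.arange(4 * 2 * 3).reshape(4, 2, 3)  # 2 vertices, 2 x 3 output map
vertex_ind = np.array([[0, 5], [4, 1]])       # 2 objects
print(gather_vertex_coords(pred, vertex_ind, num_vertices=2))
```

The result has the same [num_objs, 18] layout as the ground truth when num_vertices is 9, so a regression loss can be applied to the two element-wise.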

kaixinbear commented 4 years ago

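I'd like to ask: what does the covariance matrix written under the minimum reprojection error e_cp in Equation 7 represent? And why is log applied to the vector when optimizing the angle?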

Banconxuan commented 4 years ago

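The covariance matrix represents the confidence of each point and can be extracted from the keypoint heatmap. The log is there because the optimization is carried out in the Lie algebra, and the log map converts between the Lie group and the Lie algebra.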

kaixinbear commented 4 years ago

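I understand what the covariance matrix means now, but what does it mean to write the covariance matrix as the subscript of the norm?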

kaixinbear commented 4 years ago

[image] And what is the value of this parameter?

Banconxuan commented 4 years ago

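Minimizing the reprojection error in the Lie algebra is a nonlinear least-squares problem, which generally requires a covariance; its form is as shown in the paper. You can refer to SLAM-related material. That parameter was omitted from the paper; the coefficient is 1. Thanks for pointing it out.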
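For readers unfamiliar with the notation: in SLAM-style nonlinear least squares, a norm with a covariance subscript usually denotes the covariance-weighted (Mahalanobis) norm. Assuming the paper follows that convention, and using illustrative symbols (a residual vector e with covariance Σ) that are not copied from the paper, it reads:

```latex
\| \mathbf{e} \|_{\Sigma}^{2} \;=\; \mathbf{e}^{\top} \, \Sigma^{-1} \, \mathbf{e}
```

Under this reading, a keypoint with small covariance (high confidence) contributes more to the total error than one with large covariance.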

kaixinbear commented 4 years ago

OK, got it! Thank you for your patient answers. By the way, is there a typo in Equation 7? Shouldn't it be argmin rather than argmax?

kaixinbear commented 4 years ago

One last question: is the softmax function in KFPN applied over the channel dimension of the three feature maps, or over their last two dimensions (W and H)? From Equation (1), it seems the softmax is applied over (W, H); however, wouldn't it be more intuitive to apply it over the channel dimension, which could add information flow between channels? Thanks, sincerely.

ZhxJia commented 4 years ago

In Equation (8), why does the covariance matrix use the 2D bbox center point instead of the projected center point?