Hello 😊 In monocon_heads.py, in the _get_predictions method: the center2kpt_offset_pred head turns the 64-channel output of the backbone network into 18 channels, i.e. the 8 corner points and the center point of the 3D bbox projected into 2D, and the offsets of these 9 points from the 2D bbox center, right? And the kpt_heatmap_offset_head turns the 64-channel output into only 2 channels, but we have 9 keypoints here, right? I don't understand.
I'm just mixed up about what these two heads' output channels mean 😵
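For concreteness, here is a minimal sketch of the channel mapping I'm asking about. The head definitions are hypothetical (the actual layers in monocon_heads.py may differ); only the input/output channel counts come from the code:

```python
import torch.nn as nn

# Hypothetical head structure, for illustration only: both heads consume the
# 64-channel feature map from the backbone/neck and differ in output channels.
def make_head(out_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, out_channels, kernel_size=1),
    )

center2kpt_offset_head = make_head(9 * 2)  # 18 channels: (dx, dy) for 9 projected points
kpt_heatmap_offset_head = make_head(2)     # 2 channels: this is the part I don't understand
```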
Hello.
Suppose you have an image with a width and height of 1248 and 384, respectively, and a single ground-truth keypoint at coordinate (1205, 374). As this image passes through the Backbone and Neck, the width and height are downsampled by a factor of 4, so they become 312 and 96, respectively.
However, the ground-truth keypoint at (1205, 374) becomes (301.25, 93.5) when divided by 4, which is not valid, because heatmap coordinates cannot be fractional. Rounding to a nearby integer therefore introduces an error (the quantization residual) along each of the x and y axes, and it is this offset that the model learns.
So every predicted keypoint has its own offset (error). Since this error has an x and a y component, it is 2 values per keypoint, and since there are 9 keypoints, the target is 18-dimensional in total.
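A minimal sketch of this quantization residual, using the example numbers above and assuming a downsampling factor of 4:

```python
# Ground-truth keypoint (1205, 374) mapped onto the 312x96 feature map.
down_ratio = 4

kptx, kpty = 1205 / down_ratio, 374 / down_ratio  # (301.25, 93.5): fractional, so invalid
kptx_int, kpty_int = int(kptx), int(kpty)         # (301, 93): the integer heatmap cell

# The quantization residual the model is trained to predict.
offset_x = kptx - kptx_int  # 0.25
offset_y = kpty - kpty_int  # 0.50
```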
You can see how the target of this offset is generated in utils/target_generator/TargetGenerator:
```python
target['kpt_heatmap_offset_target'][b_idx, o_idx, (k_idx * 2)] = (kptx - kptx_int)
target['kpt_heatmap_offset_target'][b_idx, o_idx, (k_idx * 2) + 1] = (kpty - kpty_int)
```
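Putting those two lines in context, here is a self-contained sketch of how the 18-dimensional per-object target could be filled. The tensor shapes and the keypoint source are hypothetical; only the two assignment lines are from the actual code:

```python
import numpy as np

# Hypothetical shapes: batch size B, max objects per image O, 9 keypoints each.
B, O, NUM_KPTS = 8, 30, 9
target = {'kpt_heatmap_offset_target': np.zeros((B, O, NUM_KPTS * 2), dtype=np.float32)}

# Keypoint coordinates already divided by the down ratio (so on the 312x96 map).
keypoints_2d = np.random.rand(NUM_KPTS, 2) * [312, 96]

b_idx, o_idx = 0, 0
for k_idx, (kptx, kpty) in enumerate(keypoints_2d):
    kptx_int, kpty_int = int(kptx), int(kpty)
    target['kpt_heatmap_offset_target'][b_idx, o_idx, (k_idx * 2)] = (kptx - kptx_int)
    target['kpt_heatmap_offset_target'][b_idx, o_idx, (k_idx * 2) + 1] = (kpty - kpty_int)
```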
So the 18 channels are just each keypoint's own downsampling error, right? The head's name misled me 😂
Yes, that's right. 😊 If the keypoints are (k1, k2, k3, ..., k9) and the error of each keypoint is (x, y), then the 18 channels you mentioned are laid out as (x1, y1, x2, y2, ..., x9, y9).
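A quick sketch of that layout, assuming an offset tensor of hypothetical shape (B, 18, H, W):

```python
import torch

offset = torch.randn(8, 18, 96, 312)     # (B, 18, H, W), dummy values
per_kpt = offset.view(8, 9, 2, 96, 312)  # (B, num_kpts, xy, H, W)
x_err = per_kpt[:, :, 0]                 # x1..x9 -> channels 0, 2, 4, ...
y_err = per_kpt[:, :, 1]                 # y1..y9 -> channels 1, 3, 5, ...
```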
```python
kpt_heatmap_offset_pred = self.kpt_heatmap_offset_head(feat)  # (8, 64, 96, 312) -> (8, 2, 96, 312)
```
So this head predicts the offset between the 3D center projected into 2D and the 2D bbox center, right?
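For reference, a hedged sketch of what this 2-channel output is used for under the quantization-residual explanation above: reading the offset at a keypoint's integer heatmap cell and adding it back to get sub-pixel coordinates (the indexing here is illustrative, not the actual decoding code):

```python
import torch

kpt_heatmap_offset_pred = torch.randn(8, 2, 96, 312)  # (B, 2, H, W), dummy values

kptx_int, kpty_int = 301, 93             # integer cell from a heatmap peak
off = kpt_heatmap_offset_pred[0, :, kpty_int, kptx_int]
kptx = kptx_int + off[0].item()          # refined sub-pixel x on the feature map
kpty = kpty_int + off[1].item()          # refined sub-pixel y
```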
Could you please elaborate a little more on what you mean by "(3D center project to 2D, 2D bbox center)"?
So, how do we get the 3D center? It is decomposed into the projected 3D center on the image plane (xc, yc) plus the object depth z, and since we have the camera intrinsic matrix, we can recover the 3D center from these, right? Then how do we get the projected 3D center (xc, yc)? We decompose it into the center of the 2D bounding box (xb, yb) plus an offset, right? That offset is what I meant by the "(3D center projected to 2D, 2D bbox center)" offset.
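In other words, a sketch of the decomposition I mean, assuming a pinhole camera model (the intrinsics and all numeric values below are illustrative, not from the repository):

```python
import numpy as np

fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9  # KITTI-like intrinsics, illustrative only

xb, yb = 300.0, 180.0     # 2D bounding box center (xb, yb)
off_x, off_y = 2.3, -1.1  # predicted offset: (3D center projected to 2D) - (2D bbox center)
z = 15.0                  # predicted object depth

xc, yc = xb + off_x, yb + off_y  # projected 3D center on the image plane
X = (xc - cx) * z / fx           # back-project with the intrinsic matrix
Y = (yc - cy) * z / fy
center_3d = np.array([X, Y, z])
```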
Oh, I see what you mean. 😊 But I have an urgent job right now, so if you don't mind, may I give you an answer tomorrow?
Sure, thank you! 😊