Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP[ICCV21], BEV[CVPR22], TRACE[CVPR2023]
Here, I guess you meant to calculate a bounding box for a single person based on the visible keypoints, and then expand the box.
However, when you do `leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)`, as I understand it `box[0]` is (Xmin, Ymin). But why is Ymin also constrained by `width`? The same goes for the next term: why is Xmax constrained by `height`? This can truncate a person.
For example, suppose the image is very wide, say 300 × 900 (height × width), and the person is on the right side of the image. The green box is the box calculated by calc_aabb(), and the red dots are the keypoints (Fig. 1):
But after Line 168 and Line 169, it becomes Fig. 3. **This is because in Line 168 the X value is constrained by `height`. This can make the X value of rightBottom even smaller than leftTop's X value, producing a degenerate box like Fig. 2.**
Finally, the cropped image is a completely blank image (!)
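To make the failure concrete, here is a minimal sketch of just the clipping step (assuming a 300 × 900 image, i.e. `height=300, width=900`, and that calc_aabb returns `[[Xmin, Ymin], [Xmax, Ymax]]` as above):

```python
import numpy as np

height, width = 300, 900          # a wide image: 300 rows, 900 columns
box = np.array([[750.0, 70.0],    # leftTop     = (Xmin, Ymin)
                [840.0, 245.0]])  # rightBottom = (Xmax, Ymax)

# The current code clips BOTH coordinates of each corner by a single scalar:
leftTop = np.clip(box[0], 0, width)       # Ymin is wrongly bounded by width
rightBottom = np.clip(box[1], 0, height)  # Xmax is wrongly bounded by height
print(leftTop, rightBottom)  # [750.  70.] [300. 245.] -> Xmax (300) < Xmin (750)

# Clipping each axis by its own bound keeps the box valid:
leftTop_ok = np.clip(box[0], 0, [width, height])
rightBottom_ok = np.clip(box[1], 0, [width, height])
print(leftTop_ok, rightBottom_ok)  # [750.  70.] [840. 245.]
```

With the scalar bound, rightBottom's X collapses to 300 while leftTop's X stays at 750, which is exactly the inverted box shown in Fig. 2.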
You can reproduce the result quickly by copying the code below and using this image; it is exactly the same as your code, and you can put it directly into main() in augments.py. I pre-defined 5 keypoints for the person (the red dots):

```python
kps2d = np.array([[[800, 70, 1], [840, 133, 1], [750, 137, 1], [750, 245, 1], [840, 235, 1]]])
```
```python
import cv2
import numpy as np
# Helpers from romp/lib/utils (calc_aabb, get_image_cut_box, img_kp_rotate,
# flip_kps, image_pad_white_bg, image_crop_pad) must be in scope, which they
# are when this is pasted into augments.py itself.

def processssss(originImage, full_kp2ds=None, augments=None, is_pose2d=True, multiperson=False):
    crop_trbl, bbox = (0, 0, 0, 0), None
    if augments is not None:
        height, width = originImage.shape[0], originImage.shape[1]
        scale, rot, flip = augments
        if rot != 0:
            originImage, full_kp2ds = img_kp_rotate(originImage, full_kp2ds, rot)
        if flip:
            originImage = np.fliplr(originImage)
            full_kp2ds = [flip_kps(kps_i, width=originImage.shape[1], is_pose=is_2d_pose)
                          for kps_i, is_2d_pose in zip(full_kp2ds, is_pose2d)]
        if not multiperson and is_pose2d.sum() > 0:
            kps_vis = full_kp2ds[0]  # [valid_range][np.where(np.array(is_pose2d[valid_range]))[0][random_idx]]
            if (kps_vis[:, 2] > 0).sum() > 2:
                box = calc_aabb(kps_vis[kps_vis[:, 2] > 0, :2].copy())

                # Fig. 1: the box straight out of calc_aabb
                x = originImage.copy()
                cv2.rectangle(x, (int(box[0][0]), int(box[0][1])), (int(box[1][0]), int(box[1][1])), (0, 255, 0), thickness=2, lineType=4)
                cv2.circle(x, (int(box[0][0]), int(box[0][1])), color=(255, 0, 0), radius=5, thickness=-1)
                cv2.circle(x, (int(box[1][0]), int(box[1][1])), color=(255, 0, 0), radius=5, thickness=-1)
                for i in kps_vis[kps_vis[:, 2] > 0, :2]:
                    cv2.circle(x, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./original_box_with_kp2d.jpg', x)

                # Fig. 2: the box after the clipping in Lines 167-169 (X clamped by height)
                y = originImage.copy()
                leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)
                cv2.rectangle(y, (int(leftTop[0]), int(leftTop[1])), (int(rightBottom[0]), int(rightBottom[1])), (0, 255, 0), thickness=2, lineType=4)
                for i in kps_vis[kps_vis[:, 2] > 0, :2]:
                    cv2.circle(y, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./clipped_original_box_with_kp2d.jpg', y)

                # Fig. 3: the final cut box derived from the degenerate corners
                z = originImage.copy()
                [l, t], [r, b] = get_image_cut_box(leftTop, rightBottom, scale)
                bbox = (l, t, r, b)
                cv2.rectangle(z, (int(l), int(t)), (int(r), int(b)), (0, 255, 0), thickness=2, lineType=4)
                for i in kps_vis[kps_vis[:, 2] > 0, :2]:
                    cv2.circle(z, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./rectified_box_with_kp2d.jpg', z)

    orgImage_white_bg, pad_trbl = image_pad_white_bg(originImage)
    if full_kp2ds is None and augments is None:
        return orgImage_white_bg, pad_trbl
    image_aug, kp2ds_aug, offsets = image_crop_pad(originImage, kp2ds=full_kp2ds, crop_trbl=crop_trbl, bbox=bbox, pad_ratio=1.)
    cv2.imwrite('./final.jpg', image_aug)
    return image_aug, orgImage_white_bg, kp2ds_aug, offsets


if __name__ == '__main__':
    image = cv2.imread('/apdcephfs/share_1290939/zhengdiyu/projects/multi-hand-recovery/2.jpg', cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
    kps2d = np.array([[[800, 70, 1], [840, 133, 1], [750, 137, 1], [750, 245, 1], [840, 235, 1]]])
    scale = np.random.rand() * (1.7 - 1.2) + 1.2
    print(image.shape)
    print('kps shape', kps2d.shape)
    return_img = processssss(image, kps2d, augments=(scale, 0, False), is_pose2d=np.array([True]))
```
===========
I don't know if I have understood your intention correctly. If not, could you explain why we need truncated people here, and whether it benefits training?
Supp:
If I want a reasonable result, I can get one simply by commenting out Line 167, `leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)`:
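Rather than removing the clipping entirely, a per-axis bound would keep the safety check while avoiding the bug. A sketch (the helper name `clip_box_to_image` is mine; `box` is assumed to be calc_aabb's `[[Xmin, Ymin], [Xmax, Ymax]]` output):

```python
import numpy as np

def clip_box_to_image(box, width, height):
    """Clamp an axis-aligned box [[Xmin, Ymin], [Xmax, Ymax]] to the image,
    clipping X coordinates by width and Y coordinates by height."""
    bounds = np.array([width, height], dtype=np.float64)
    left_top = np.clip(box[0], 0, bounds)
    right_bottom = np.clip(box[1], 0, bounds)
    return left_top, right_bottom

# With the 300 x 900 example image, the person's box is preserved:
lt, rb = clip_box_to_image(np.array([[750., 70.], [840., 245.]]), width=900, height=300)
print(lt, rb)  # [750.  70.] [840. 245.]
```

This relies on `np.clip` accepting an array-valued upper bound, so each coordinate is clamped against its own limit.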
My RESULTS
By the way, I think we could pre-select the people whose keypoints are sufficiently visible with `valid_range = (full_kp2ds[:, :, 2] > 0).sum(-1) > 2`, instead of randomly choosing one from full_kp2ds and then checking it with `if (kps_vis[:,2] > 0).sum() > 2`:
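A sketch of that selection, assuming `full_kp2ds` has shape (N, K, 3) with the confidence in the last channel (the data below is made up for illustration):

```python
import numpy as np

# Hypothetical batch: 3 people x 4 keypoints x (x, y, conf)
full_kp2ds = np.array([
    [[10, 10, 1], [20, 20, 1], [30, 30, 1], [40, 40, 0]],  # 3 visible -> valid
    [[ 5,  5, 1], [ 6,  6, 0], [ 7,  7, 0], [ 8,  8, 0]],  # 1 visible -> invalid
    [[ 1,  1, 1], [ 2,  2, 1], [ 3,  3, 1], [ 4,  4, 1]],  # 4 visible -> valid
], dtype=np.float64)

# Boolean mask of people with more than 2 visible keypoints, in one shot:
valid_range = (full_kp2ds[:, :, 2] > 0).sum(-1) > 2   # [True, False, True]

# Any person picked from this subset is guaranteed valid, so no re-check is needed:
valid_people = full_kp2ds[valid_range]
```

This moves the visibility check before the random choice, so the sample is drawn only from valid candidates.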
Thanks a lot for reporting this bug. Yes, I have found that the current cropping function might crop out an area without any people in it. I am looking into this. By the way, the stickman is cute.
https://github.com/Arthur151/ROMP/blob/623687a37cb7d1ba4538baf1e3c6f65808a36e2c/romp/lib/utils/augments.py#L167-L169