Arthur151 / ROMP

Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP[ICCV21], BEV[CVPR22], TRACE[CVPR2023]
https://www.yusun.work/
Apache License 2.0

image cropping method might be wrong #175

Open · ZhengdiYu opened this issue 2 years ago

ZhengdiYu commented 2 years ago

https://github.com/Arthur151/ROMP/blob/623687a37cb7d1ba4538baf1e3c6f65808a36e2c/romp/lib/utils/augments.py#L167-L169
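
For reference, the linked span looks essentially like this (reconstructed from the quotes later in this issue, so treat it as approximate rather than a verbatim copy of the permalink):

```python
# Approximate content of romp/lib/utils/augments.py L167-L169, per this issue's quotes:
leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)
[l, t], [r, b] = get_image_cut_box(leftTop, rightBottom, scale)
bbox = (l, t, r, b)
```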

Here, I guess you meant to calculate a bounding box for a single person based on the visible kpts, and then expand the box.

However, when you do `leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)`, as I understand it, `box[0]` is `(Xmin, Ymin)` and `box[1]` is `(Xmax, Ymax)`. So why is Ymin clipped by `width`? And the same goes for the second term: why is Xmax clipped by `height`? This can truncate a person.
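
For comparison, a per-axis clip looks something like the sketch below. This is only an illustration of what the code presumably intended, not the repo's actual fix; it assumes `box = [(Xmin, Ymin), (Xmax, Ymax)]` as above:

```python
import numpy as np

# Sketch only: clip x-coordinates by width and y-coordinates by height.
# np.clip broadcasts an array-valued upper bound element-wise.
leftTop = np.clip(box[0], 0, [width, height])       # (Xmin, Ymin)
rightBottom = np.clip(box[1], 0, [width, height])   # (Xmax, Ymax)
```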

For example, suppose the image is very long, say 300 x 900, and the person is on the right side of the image. The green box is the box calculated by `calc_aabb()`, and the red dots are the kpts (Fig 1):

But after Line 168 and Line 169, it becomes Fig 3. **This is because in Line 168 the X value is constrained by `height`. This can make the X value of rightBottom even smaller than leftTop's X value, giving a box like Fig 2.**

Finally, the cropped image is completely blank (!):
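
To make the failure concrete, here is the same arithmetic as a standalone illustration, using the kpt extremes from the example above:

```python
import numpy as np

height, width = 300, 900                  # the long image from the example
box = np.array([[750, 70], [840, 245]])   # [(Xmin, Ymin), (Xmax, Ymax)] from the red dots

leftTop = np.clip(box[0], 0, width)       # -> [750,  70], unaffected here
rightBottom = np.clip(box[1], 0, height)  # -> [300, 245], Xmax clamped by height!

assert rightBottom[0] < leftTop[0]        # degenerate box -> the blank crop
```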

You can reproduce the result very quickly by copying the code below and using the attached image; it is exactly the same as your code, and you can put it directly into your `main()` in augments.py. I pre-defined 5 kpts for the person (red dots): `kps2d = np.array([[[800,70, 1],[840,133, 1],[750,137, 1],[750,245, 1], [840, 235, 1]]])`

```python
import cv2
import numpy as np
# The helpers used below (img_kp_rotate, flip_kps, calc_aabb, get_image_cut_box,
# image_pad_white_bg, image_crop_pad) come from romp/lib/utils/augments.py.

def processssss(originImage, full_kp2ds=None, augments=None, is_pose2d=True, multiperson=False):
    crop_trbl, bbox = (0,0,0,0), None

    if augments is not None:
        height, width = originImage.shape[0], originImage.shape[1]
        scale, rot, flip = augments

        if rot != 0:
            originImage, full_kp2ds = img_kp_rotate(originImage, full_kp2ds, rot)

        if flip:
            originImage = np.fliplr(originImage)
            full_kp2ds = [flip_kps(kps_i, width=originImage.shape[1], is_pose=is_2d_pose) for kps_i, is_2d_pose in zip(full_kp2ds, is_pose2d)]

        if not multiperson and is_pose2d.sum()>0:
            kps_vis = full_kp2ds[0]#[valid_range][np.where(np.array(is_pose2d[valid_range]))[0][random_idx]]
            if (kps_vis[:,2]>0).sum()>2:
                box = calc_aabb(kps_vis[kps_vis[:,2]>0,:2].copy())

                # Fig 1: the raw AABB around the visible kpts
                x = originImage.copy()
                cv2.rectangle(x, (int(box[0][0]), int(box[0][1])), (int(box[1][0]), int(box[1][1])), (0, 255, 0), thickness=2, lineType=4)
                cv2.circle(x, (int(box[0][0]), int(box[0][1])), color=(255, 0, 0), radius=5, thickness=-1)
                cv2.circle(x, (int(box[1][0]), int(box[1][1])), color=(255, 0, 0), radius=5, thickness=-1)
                for i in kps_vis[kps_vis[:,2]>0,:2]:
                    cv2.circle(x, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./original_box_with_kp2d.jpg', x)

                # Fig 2: the suspect clip (Ymin is clipped by width, Xmax by height)
                y = originImage.copy()
                leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)
                cv2.rectangle(y, (int(leftTop[0]), int(leftTop[1])), (int(rightBottom[0]), int(rightBottom[1])), (0, 255, 0), thickness=2, lineType=4)
                for i in kps_vis[kps_vis[:,2]>0,:2]:
                    cv2.circle(y, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./clipped_original_box_with_kp2d.jpg', y)

                # Fig 3: the clipped box after expansion by get_image_cut_box()
                z = originImage.copy()
                [l, t], [r, b] = get_image_cut_box(leftTop, rightBottom, scale)
                bbox = (l, t, r, b)
                cv2.rectangle(z, (int(l), int(t)), (int(r), int(b)), (0, 255, 0), thickness=2, lineType=4)
                for i in kps_vis[kps_vis[:,2]>0,:2]:
                    cv2.circle(z, (int(i[0]), int(i[1])), color=(0, 0, 255), radius=3, thickness=-1)
                cv2.imwrite('./rectified_box_with_kp2d.jpg', z)

    orgImage_white_bg, pad_trbl = image_pad_white_bg(originImage)
    if full_kp2ds is None and augments is None:
        return orgImage_white_bg, pad_trbl

    image_aug, kp2ds_aug, offsets = image_crop_pad(originImage, kp2ds=full_kp2ds, crop_trbl=crop_trbl, bbox=bbox, pad_ratio=1.)
    cv2.imwrite('./final.jpg', image_aug)
    return image_aug, orgImage_white_bg, kp2ds_aug, offsets

if __name__ == '__main__':
    image = cv2.imread('/apdcephfs/share_1290939/zhengdiyu/projects/multi-hand-recovery/2.jpg', cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
    kps2d = np.array([[[800,70, 1],[840,133, 1],[750,137, 1],[750,245, 1], [840, 235, 1]]])
    scale = np.random.rand() * (1.7 - 1.2) + 1.2  # uniform random scale in [1.2, 1.7]
    print(image.shape)
    print('kps shape', kps2d.shape)
    return_img = processssss(image, kps2d, augments=(scale, 0, False), is_pose2d=np.array([True]))
```

---

I don't know if I understand your meaning correctly. If not, could you explain why we would need a truncated person? Would it benefit training?

Supp: If I want a reasonable result, I can get one just by commenting out Line 167, `leftTop, rightBottom = np.clip(box[0], 0, width), np.clip(box[1], 0, height)`. My result:

By the way, I think we could select the valid kps up front with `valid_range = (full_kp2ds[:, :, 2]>0).sum(-1) > 2`, instead of randomly choosing one from `full_kp2ds` and then checking whether `(kps_vis[:,2]>0).sum()>2` holds:

```python
valid_range = (full_kp2ds[:, :, 2]>0).sum(-1) > 2
kps_vis = full_kp2ds[valid_range][np.where(np.array(is_pose2d[valid_range]))[0][random_idx]]
#if (kps_vis[:,2]>0).sum()>2:
box = calc_aabb(kps_vis[kps_vis[:,2]>0,:2].copy())
```

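Spelled out, the suggestion amounts to something like the sketch below. `random_idx` is not defined in the snippet above, so the random pick shown here is an assumption on my side:

```python
import numpy as np

# Sketch of the suggested pre-filtering (the random_idx selection is assumed):
valid_range = (full_kp2ds[:, :, 2] > 0).sum(-1) > 2            # persons with >2 visible kpts
candidates = np.where(valid_range & np.asarray(is_pose2d))[0]  # ...that also have 2D pose labels
if len(candidates) > 0:
    random_idx = np.random.randint(len(candidates))            # pick one valid person at random
    kps_vis = full_kp2ds[candidates[random_idx]]
    box = calc_aabb(kps_vis[kps_vis[:, 2] > 0, :2].copy())
```
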
Arthur151 commented 2 years ago

Thanks a lot for reporting this bug. Yes, I have found that the current cropping function might crop out an area without any people. I am looking into this. BTW, the stickman is cute.