chungyiweng / humannerf

HumanNeRF turns a monocular video of moving people into a 360 free-viewpoint video.
MIT License

how to get cam_intrinsics and cam_extrinsics #1

Closed zhan-xu closed 2 years ago

zhan-xu commented 2 years ago

Hello, thanks for the great work. As I am new to the area, I have a (maybe) simple question: I am trying to get camera poses from VIBE. The output from their code seems to be an array with four values per frame. As described in their code repo:

orig_cam (n_frames, 4) # weak perspective camera parameters in original image space (sx,sy,tx,ty)

Can I ask how to get intrinsic and extrinsic matrices?

Or is there an example about how to get these camera parameters from any code?

chungyiweng commented 2 years ago

Hello,

There are multiple ways to get intrinsic and extrinsic camera parameters. Here is my recommendation. You might want to use pred_cam and bboxes instead (see the output format of VIBE).

Let's say you want to access the 1st tracked human.

pred_cam = vibe_result['pred_cam'][0]
bbox = vibe_result['bboxes'][0]

Use the function below to get the camera parameters.

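import numpy as np

def get_camera_parameters(pred_cam, bbox):
    FOCAL_LENGTH = 5000.  # focal length, in pixels of the CROP_SIZE x CROP_SIZE crop
    CROP_SIZE = 224       # side length of the square crop, in pixels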

    bbox_cx, bbox_cy, bbox_w, bbox_h = bbox
    assert bbox_w == bbox_h

    bbox_size = bbox_w
    bbox_x = bbox_cx - bbox_w / 2.
    bbox_y = bbox_cy - bbox_h / 2.

    scale = bbox_size / CROP_SIZE

    cam_intrinsics = np.eye(3)
    cam_intrinsics[0, 0] = FOCAL_LENGTH * scale
    cam_intrinsics[1, 1] = FOCAL_LENGTH * scale
    cam_intrinsics[0, 2] = bbox_size / 2. + bbox_x 
    cam_intrinsics[1, 2] = bbox_size / 2. + bbox_y

    cam_s, cam_tx, cam_ty = pred_cam
    trans = [cam_tx, cam_ty, 2*FOCAL_LENGTH/(CROP_SIZE*cam_s + 1e-9)]

    cam_extrinsics = np.eye(4)
    cam_extrinsics[:3, 3] = trans

    return cam_intrinsics, cam_extrinsics
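
As a usage illustration, the returned matrices can be plugged into a standard pinhole projection. The sketch below is not part of the original reply: `project_points` is a hypothetical helper, and the values of `K` and `E` are made up for the example rather than taken from real VIBE output.

```python
import numpy as np


def project_points(points_3d, cam_intrinsics, cam_extrinsics):
    """Project (N, 3) points to (N, 2) pixel coordinates with a pinhole camera."""
    points_3d = np.asarray(points_3d, dtype=np.float64)
    # Append a 1 to each point to get homogeneous coordinates, shape (N, 4).
    ones = np.ones((points_3d.shape[0], 1))
    points_h = np.concatenate([points_3d, ones], axis=1)
    # Apply the 4x4 extrinsics; with an identity rotation this only adds the translation.
    points_cam = (cam_extrinsics @ points_h.T).T[:, :3]
    # Apply the 3x3 intrinsics, then divide by depth.
    pixels_h = (cam_intrinsics @ points_cam.T).T
    return pixels_h[:, :2] / pixels_h[:, 2:3]


# Illustrative values only (assumed, not real VIBE output).
K = np.array([[10000., 0., 640.],
              [0., 10000., 360.],
              [0., 0., 1.]])
E = np.eye(4)
E[:3, 3] = [0.0, 0.1, 50.0]

pixels = project_points([[0., 0., 0.], [0.1, -0.1, 0.]], K, E)
print(pixels)
```

With these numbers the first point lands 20 pixels below the principal point and the second 20 pixels to its right, which is a quick way to check that the sign conventions match your images.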

I hope this helps. Let me know if you still have any questions.

zhan-xu commented 2 years ago

Thanks so much. This code works perfectly! Really appreciate this.

Andyen512 commented 2 years ago

So how about ROMP? The ROMP format doesn't have the bboxes.

zhewei-mt commented 1 year ago

Hello, your code helps a lot, but I have a few questions.

  1. Why can we assume that the bounding boxes are square? From my understanding, given the aspect ratio of a full human body, the width should be smaller than the height, and the two are equal only in a few circumstances.
  2. The crop size is 224. Is that the size of the resized version of the original image? And the bboxes are calculated on the resized image rather than the original image, right?

Thanks in advance!
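
A possible reading of the two questions, offered as a sketch rather than a confirmed description of VIBE's pipeline: crop-based pose estimators commonly expand the tight detection box to a square around its centre, in original-image pixels, and only then resize that square crop to the network input size (224 here). Under that assumption `bbox_w == bbox_h` holds by construction, and `scale = bbox_size / CROP_SIZE` in the function above converts crop pixels back to original-image pixels. `to_square_bbox` below is a hypothetical helper, not a VIBE function.

```python
import numpy as np


def to_square_bbox(x_min, y_min, x_max, y_max):
    """Convert a corner-format box into a square (cx, cy, w, h) box."""
    w = x_max - x_min
    h = y_max - y_min
    cx = x_min + w / 2.0
    cy = y_min + h / 2.0
    # Use the longer side so the whole person stays inside the square.
    side = max(w, h)
    return np.array([cx, cy, side, side], dtype=np.float64)


# A tall person box: 100 px wide, 300 px tall, in original-image pixels.
bbox = to_square_bbox(100, 50, 200, 350)
# Factor that maps pixels of an assumed 224 x 224 crop back to original-image pixels.
scale = bbox[2] / 224
print(bbox, scale)
```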

mch0dmin commented 1 year ago

Hi @Andyen512, have you solved this problem yet, i.e. using ROMP to get cam_intrinsics and cam_extrinsics?

xyIsHere commented 1 year ago

Hi everyone,

I'm wondering why FOCAL_LENGTH = 5000 and CROP_SIZE = 224 in this function. Are these two variables fixed for all in-the-wild videos? Besides, should all the frames from a video share the same camera intrinsics and extrinsics? Thanks.
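
A short derivation may help show where the two constants enter the function. It is a sketch under two assumptions that the thread does not confirm: that pred_cam = (s, t_x, t_y) is a weak-perspective camera defined on the square crop with coordinates normalized to [-1, 1], and that the body's depth extent is small compared with its distance t_z from the camera. Here f stands for FOCAL_LENGTH and c for CROP_SIZE, and X is the horizontal coordinate of a body point.

```latex
\begin{align*}
u &= s\,(X + t_x)
  && \text{weak perspective, normalized crop coordinates} \\
u_{\text{pix}} &= f\,\frac{X + t_x}{t_z}
  && \text{perspective, crop pixels relative to the crop centre} \\
u &= \frac{u_{\text{pix}}}{c/2} = \frac{2f}{c\,t_z}\,(X + t_x)
  && \text{normalize by half the crop size} \\
s &= \frac{2f}{c\,t_z}
  \quad\Longrightarrow\quad
  t_z = \frac{2f}{c\,s}
  && \text{match the two expressions for } u
\end{align*}
```

The last line matches the depth used in the function, 2*FOCAL_LENGTH/(CROP_SIZE*cam_s), up to the small 1e-9 term that guards against division by zero. Projecting with the returned intrinsics gives f * scale * (X + t_x) / t_z = bbox_size * s * (X + t_x) / 2 at the reference depth, so f cancels there; under these assumptions its value mainly controls how strong the perspective effect is, which suggests that 5000 and 224 are conventions of the pose estimator rather than properties of a particular video. Since bbox and pred_cam are per-frame quantities, applying the function frame by frame yields per-frame intrinsics and extrinsics.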

MihailMihaylov97 commented 1 year ago

Hi, did you find a way to solve this?

So how about ROMP? The ROMP format doesn't have the bboxes.

MrLi333 commented 4 months ago

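Hi everyone,

I'm wondering why FOCAL_LENGTH = 5000 and CROP_SIZE = 224 in this function. Are these two variables fixed for all in-the-wild videos? Besides, should all the frames from a video share the same camera intrinsics and extrinsics? Thanks.

Have you solved this problem? I would like to know how FOCAL_LENGTH should be set.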