brown-ivl / manus

[CVPR 2024] MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
https://ivl.cs.brown.edu/research/manus.html

How to extract frame from video? #11

Closed TheDepe closed 1 month ago

TheDepe commented 1 month ago

Given the MANO meshes and poses (e.g. 00000404), how do these correspond to frames of the raw RGB video?

(The synced .jpgs don't seem to be available anywhere?)

TheDepe commented 1 month ago

To add to this, the camera data in optim.txt doesn't quite seem to line up with the ground-truth camera RGB frames. Is there anything I need to be aware of in this process?

  1. Load the camera matrix into Blender.
  2. Load the given mesh for video X, frame Y.
  3. Extract frame Y from video X.
  4. Render and overlay the images. <- this produces inconsistent results
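Step 3 above could be sketched as follows. This is a minimal sketch, not the repository's method: it assumes a mesh ID such as 00000404 is the zero-based index of the matching video frame, which this thread does not confirm (scripts/dataset_helpers/load_videos.py is the authoritative reference). The function names `frame_index_from_id` and `extract_frame` are hypothetical, and OpenCV (`cv2`) is assumed to be installed.

```python
from pathlib import Path


def frame_index_from_id(frame_id):
    """Turn a mesh/pose ID such as '00000404' (or '00000404.ply') into an int.

    ASSUMPTION: the ID is the zero-based index of the video frame.
    """
    return int(Path(frame_id).stem)


def extract_frame(video_path, frame_index):
    """Return frame `frame_index` of the video as a BGR image array."""
    import cv2  # imported lazily so the helper above works without OpenCV

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"could not open {video_path}")
    try:
        # Decode sequentially instead of seeking with CAP_PROP_POS_FRAMES,
        # because seeking can land on the wrong frame for some codecs.
        for _ in range(frame_index):
            if not cap.grab():
                raise IndexError(f"video has fewer than {frame_index + 1} frames")
        ok, frame = cap.read()
        if not ok:
            raise IndexError(f"video has fewer than {frame_index + 1} frames")
        return frame
    finally:
        cap.release()


print(frame_index_from_id("00000404"))  # 404
```

Sequential decoding is slower than seeking, but it avoids off-by-a-few-frames errors that would look exactly like a calibration problem in an overlay.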

[image: frame402_brics-sbc-002_cam0_overlay]

[image: frame402_brics-sbc-007_cam1_overlay]

I have tried this with and without distorting the images + intrinsics.
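For reference, the effect of the distortion coefficients can be checked without any renderer. The sketch below assumes the `k1 k2 p1 p2` columns of the calibration file follow OpenCV's radial/tangential distortion model, which the column names suggest but the thread does not state; `project_point` is a hypothetical helper name.

```python
def project_point(point_cam, K, dist):
    """Project a 3D point given in camera coordinates to pixel coordinates.

    K is a 3x3 intrinsic matrix (nested lists); dist is (k1, k2, p1, p2).
    ASSUMPTION: OpenCV's radial + tangential distortion model.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    k1, k2, p1, p2 = dist

    # Normalised image coordinates (pinhole projection onto z = 1)
    x = point_cam[0] / point_cam[2]
    y = point_cam[1] / point_cam[2]

    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y

    return fx * x_d + cx, fy * y_d + cy


# Compare the pinhole and distorted projections of the same point.
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
point = (0.5, 0.0, 1.0)
u_pinhole, _ = project_point(point, K, (0.0, 0.0, 0.0, 0.0))
u_distorted, _ = project_point(point, K, (-0.4, 0.15, 0.0, 0.0))
print(u_pinhole - u_distorted)  # horizontal offset in pixels
```

With coefficients of the size listed in calib.object/optim_params.txt (k1 around -0.4), points away from the image centre move by tens of pixels, so an overlay only lines up if the render and the photo use the same convention: either both distorted or both undistorted.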

coreqode commented 1 month ago

Please check the scripts/dataset_helpers/load_videos.py file for how to use the MANO parameters. Also, please use the calib.object camera parameters for the grasp sequences as well.

TheDepe commented 1 month ago

Thanks for uploading. Have you got any tips for recreating the cameras in Blender?

The location seems to be fine, but perhaps I'm still missing something on the intrinsics?

coreqode commented 1 month ago

Based on the image you posted, this looks like an FOV issue, as the rendered image looks slightly scaled up. How are you adding cameras to Blender? In my experience, replicating a camera in Blender is tricky. Most of the time I do the inverse, i.e. create the cameras in Blender and then use them everywhere, rather than importing cameras into Blender. You could also look into the Mitsuba renderer, where adding cameras shouldn't be difficult.

TheDepe commented 1 month ago

I've tried various methods; all of them reproduce the results above.

Currently I've got:

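    import bpy
    import mathutils

    # `cameras` and `get_sensor_fit` are defined elsewhere in the script (not shown)
    # Create one Blender camera per calibrated camera (intrinsics and extrinsics)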
    cam_names = cameras.cam_name
    SCALE = 1
    for cam_name in cam_names:
        camera = cameras[cam_name]
        # Create the camera object
        bpy.ops.object.camera_add()
        ob = bpy.context.object
        ob.name = cam_name
        cam = ob.data
        cam.name = "_"+cam_name
        cam.type = 'PERSP'
        K = camera.K
        RT = camera.extr

        R_world2cv = RT[:3,:3]
        T_world2cv = RT[:3, 3]

        scene = bpy.context.scene
        sensor_width_in_mm = K[1,1]*K[0,2] / (K[0,0]*K[1,2])
        sensor_height_in_mm = 1 # Doesn't matter apparently
        resolution_x_in_px = 1280
        resolution_y_in_px = 720
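        # Pixel aspect ratio as Blender defines it (1.0 for square pixels),
        # not the image aspect ratio; using 1280/720 here would scale shift_y
        pixel_aspect_ratio = scene.render.pixel_aspect_y / scene.render.pixel_aspect_x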

        s_u = resolution_x_in_px / sensor_width_in_mm
        s_v = resolution_y_in_px / sensor_height_in_mm

        f_in_mm = K[0,0] / s_u

        scene.render.resolution_x = int(resolution_x_in_px / SCALE)
        scene.render.resolution_y = int(resolution_y_in_px / SCALE)
        scene.render.resolution_percentage = SCALE * 100

        R_bcam2cv = mathutils.Matrix(
            ((1,0,0),
             (0,-1,0),
             (0,0,-1))
        ) 
        R_cv2world = R_world2cv.T
        rotation = mathutils.Matrix(R_cv2world @ R_bcam2cv)
        location = mathutils.Vector(-R_cv2world @ T_world2cv)

        sensor_fit = get_sensor_fit(cam.sensor_fit,
                                    scene.render.pixel_aspect_x * resolution_x_in_px,
                                    scene.render.pixel_aspect_y * resolution_y_in_px
                                    )
        if sensor_fit == 'HORIZONTAL':
            view_fac_in_px = resolution_x_in_px
        else:
            view_fac_in_px = pixel_aspect_ratio * resolution_y_in_px

        cam.lens = f_in_mm 
        cam.lens_unit = 'MILLIMETERS'
        cam.sensor_width = sensor_width_in_mm

        ob.matrix_world = mathutils.Matrix.Translation(location) @ rotation.to_4x4()
        cam.shift_x = (-(K[0,2] - (resolution_x_in_px / 2)) / view_fac_in_px )
        cam.shift_y =  ((K[1,2] - (resolution_y_in_px / 2)) * pixel_aspect_ratio / view_fac_in_px)

        ob.scale = (0.05, 0.05, 0.05)
        scene.camera = ob
        bpy.context.view_layer.update()

Which produces images like:

[image: frame402_brics-sbc-001_cam0_overlay]

[image: frame402_brics-sbc-003_cam1_overlay]

[image: frame402_brics-sbc-015_cam1_overlay]

As you can see, some renders are very close to the ground truth, whereas some are way off.

I can only assume there is some hidden setting, or something happening within Blender, that is causing this, because as far as I can tell the camera matrices are fine...
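The intrinsics part of the script above can be checked in isolation, outside Blender. The sketch below is one way to do that under stated assumptions: square pixels (pixel aspect ratio 1), Blender's automatic sensor fit resolving to horizontal because the image is wider than it is tall, and fx close enough to fy that fy can be ignored. `blender_intrinsics` is a hypothetical helper, and 36 mm is simply Blender's default sensor width.

```python
import math


def blender_intrinsics(fx, cx, cy, width, height, sensor_width_mm=36.0):
    """Convert pinhole intrinsics (in pixels) to Blender camera settings.

    ASSUMPTIONS: square pixels, horizontal sensor fit (width >= height),
    and fx ~= fy. Returns (lens_mm, shift_x, shift_y, horizontal_fov_deg).
    """
    if width < height:
        raise ValueError("this sketch only covers the horizontal sensor fit")

    # Focal length in mm: only the ratio lens / sensor_width sets the FOV.
    lens_mm = fx * sensor_width_mm / width

    # Blender expresses both shifts as a fraction of the larger image side,
    # which is the width here, so shift_y is also divided by `width`.
    shift_x = -(cx - width / 2.0) / width
    shift_y = (cy - height / 2.0) / width

    horizontal_fov_deg = math.degrees(2.0 * math.atan(width / (2.0 * fx)))
    return lens_mm, shift_x, shift_y, horizontal_fov_deg


lens, sx, sy, fov = blender_intrinsics(
    fx=1026.0201416015625, cx=610.0775909423828, cy=360.19432067871094,
    width=1280, height=720,
)
print(lens, sx, sy, fov)
```

Comparing `horizontal_fov_deg` with the field of view Blender reports for the imported camera is a quick way to tell an intrinsics problem (wrong scale) apart from an extrinsics or data problem (wrong position or orientation).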

TheDepe commented 1 month ago

[image: brics-sbc-012_cam1]

[image: brics-sbc-012_cam0]

[image: brics-sbc-014_cam0]

It looks like we have the same issue when using the provided load_videos.py script. Are we sure that the data available on S3 is the same as what you have locally?

TheDepe commented 1 month ago

[image: frame402_brics-sbc-002_cam0_overlay]

[image: brics-sbc-002_cam0]

[image: frame402_brics-sbc-001_cam0_overlay]

[image: brics-sbc-001_cam0]

Blender renderings and the load_videos visualizations, side by side for comparison.

Thanks heaps in advance :)

coreqode commented 1 month ago

Okay, this is weird. Before I check, are you using calib.object/optim_params.txt from annotationsv1?

TheDepe commented 1 month ago

I'm pretty sure. These are the header and the first 7 camera rows of the .txt file (it's certainly calib.object/optim_params.txt).

    # cam_id width height fx fy cx cy k1 k2 p1 p2 cam_name qvecw qvecx qvecy qvecz tvecx tvecy tvecz
    1 1280 720 1026.0201416015625 1027.5439453125 610.0775909423828 360.19432067871094 -0.4050115775619082 0.15234986364467581 -0.0012788906025474026 -0.0004068769583847776 brics-sbc-001_cam0 -0.02809431082791538 -0.24069484149949982 -0.22274700692813537 0.9442777525067465 0.6040823968770638 0.6985741415591779 0.7465073698173668
    2 1280 720 1030.0947265625 1032.8162841796875 671.0575103759766 333.5317039489746 -0.3954111770491126 0.13306548640003837 -0.0017064823549450168 0.00023688602683473285 brics-sbc-002_cam0 -0.06710351065884505 0.26893847066349275 -0.18577884635119832 0.9426852274767319 0.12459160041988171 0.8045163829071368 0.2668083193745772
    3 1280 720 1021.106201171875 1027.1220703125 651.9766235351562 323.3529567718506 -0.42911766412349855 0.20212108739815854 -0.0020637733297845587 0.0004869770034345733 brics-sbc-003_cam0 0.007062493139756935 -0.03702306635683105 0.24603817180283094 0.9685270423504494 0.49946855336012375 0.14888683049987395 -0.027791229562286558
    4 1280 720 1018.4395751953125 1020.4609375 677.6425170898438 367.72162914276123 -0.40813592570581275 0.16216126956267712 -0.00033667278662470867 -4.9234652340841565e-05 brics-sbc-003_cam1 0.07672221096701856 -0.21135993147494947 0.2440245238862591 0.9433412497359168 0.7914018886915728 0.10845784225987166 0.31012763727637427
    5 1280 720 1033.849853515625 1033.9210205078125 629.6696853637695 372.642560005188 -0.3903463620693376 0.1340028214143933 -0.0005088047299663427 -0.0012211266263117172 brics-sbc-004_cam0 -0.08071402834171497 0.29870752513803706 -0.14130520136857125 0.9403679599608599 0.11865108114644048 0.4796065492349858 0.07060898886744094
    7 1280 720 1018.477783203125 1019.5985107421875 618.728141784668 372.36180782318115 -0.40699917410290337 0.15397713669845436 0.00041775120797562306 -0.0007254780843419312 brics-sbc-005_cam0 -0.037005256804291085 -0.004516956480879485 -0.32762985308759857 0.9440703826721802 0.4382756994109049 0.6665888468353153 0.5365138981165001
    8 1280 720 1018.8294677734375 1015.9061889648438 651.1561584472656 380.2546262741089 -0.4128141981111411 0.1585450339249105 -0.0024852751331382833 -0.0008747759071864358 brics-sbc-005_cam1 -9.371224745645163e-06 -0.33855418014488825 -0.07945378690476773 0.9375863494980579 0.6025883705779318 0.42874560435236575 0.6142561818375181
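
Rows in this layout can be read with a few lines of plain Python, which also makes it easy to compare two copies of the file. The sketch below follows the column order in the header comment exactly. The `Camera` class and the function names are hypothetical, and it assumes the quaternion is stored as (w, x, y, z) as the header says. Whether the rotation and translation map world to camera or the reverse is not stated in this thread, so that is left to the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    cam_id: int
    name: str
    width: int
    height: int
    K: list      # 3x3 intrinsic matrix as nested lists
    dist: tuple  # (k1, k2, p1, p2)
    R: list      # 3x3 rotation matrix built from the quaternion
    t: tuple     # (tvecx, tvecy, tvecz)


def quat_to_rotmat(w, x, y, z):
    """Rotation matrix of a (w, x, y, z) quaternion, normalised first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def parse_camera_line(line):
    """Parse one data row; field order follows the file's header comment."""
    f = line.split()
    if len(f) != 19:
        raise ValueError(f"expected 19 fields, got {len(f)}")
    fx, fy, cx, cy = map(float, f[3:7])
    qw, qx, qy, qz = map(float, f[12:16])
    return Camera(
        cam_id=int(f[0]), name=f[11], width=int(f[1]), height=int(f[2]),
        K=[[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]],
        dist=tuple(map(float, f[7:11])),
        R=quat_to_rotmat(qw, qx, qy, qz),
        t=tuple(map(float, f[16:19])),
    )


def parse_cameras(text):
    """Return {cam_name: Camera}, skipping blank lines and '#' comments."""
    cameras = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            cam = parse_camera_line(line)
            cameras[cam.name] = cam
    return cameras
```

Calling `parse_cameras` on the text of two copies of the file and comparing the resulting dictionaries shows at a glance whether a local copy differs from the current one.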

TheDepe commented 1 month ago

This was the issue. I still had the old .txt files!

Thanks for helping :)