TheDepe closed this issue 1 month ago.
To add to this, the camera data in optim.txt doesn't quite seem to line up with the ground-truth RGB camera images. Is there anything I need to be aware of in this process?
I have tried this both with and without distorting the images and intrinsics.
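For reference, the k1 k2 p1 p2 columns in calib.object/optim_params.txt look like OpenCV's 4-coefficient radial-tangential distortion model. That is an assumption based on the column names only. A pinhole Blender camera cannot reproduce this distortion, so renders would need to be compared against undistorted frames. A minimal sketch of the forward model on normalized coordinates (`distort_normalized` and `to_pixels` are hypothetical helper names), using the brics-sbc-001_cam0 values posted later in this thread:

```python
import numpy as np

def distort_normalized(xy, k1, k2, p1, p2):
    """Apply the OpenCV 4-coefficient radial-tangential model (assumed).

    xy: (N, 2) array of normalized, undistorted coordinates (x/z, y/z).
    Returns the distorted normalized coordinates, shape (N, 2).
    """
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return np.stack([x_d, y_d], axis=1)

def to_pixels(xy, fx, fy, cx, cy):
    """Map normalized coordinates to pixel coordinates using the intrinsics."""
    return np.stack([fx * xy[:, 0] + cx, fy * xy[:, 1] + cy], axis=1)

# brics-sbc-001_cam0 values from calib.object/optim_params.txt
fx, fy = 1026.0201416015625, 1027.5439453125
cx, cy = 610.0775909423828, 360.19432067871094
k1, k2 = -0.4050115775619082, 0.15234986364467581
p1, p2 = -0.0012788906025474026, -0.0004068769583847776

# A point near the image corner: ideal pinhole vs distorted pixel position.
xy = np.array([[0.6, 0.35]])
print(to_pixels(xy, fx, fy, cx, cy))
print(to_pixels(distort_normalized(xy, k1, k2, p1, p2), fx, fy, cx, cy))
```

With k1 around -0.4 the corner point moves noticeably towards the centre (barrel distortion), which is large enough to make an undistorted render and a raw frame disagree near the image borders.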
Please check the scripts/dataset_helpers/load_videos.py file for how to use the MANO parameters. Also, please use the calib.object camera parameters for grasp sequences.
Thanks for uploading. Do you have any tips for recreating the cameras in Blender?
The location seems fine, but perhaps I'm still missing something in the intrinsics?
Based on the image you posted, this looks like an FOV issue, since the rendered image looks slightly scaled up. How are you adding cameras to Blender? In my experience, it has been tricky to replicate cameras in Blender. Most of the time I do the inverse, i.e. create the cameras in Blender and then use them everywhere, rather than importing cameras into Blender. You could also look into the Mitsuba renderer, where adding cameras shouldn't be difficult.
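To check whether the FOV is off, note that under Blender's horizontal sensor fit with square pixels, the focal length and lens shift follow directly from K. The sketch below assumes `sensor_fit` resolves to 'HORIZONTAL' (which AUTO does for 1280x720) and `pixel_aspect_x == pixel_aspect_y`. It is plain Python so it can be checked outside Blender; `blender_intrinsics` is a hypothetical helper whose outputs would be assigned to `cam.lens`, `cam.sensor_width`, `cam.shift_x` and `cam.shift_y`. The 36 mm sensor width is an arbitrary choice, since only the ratio lens / sensor_width sets the horizontal FOV.

```python
import math

def blender_intrinsics(fx, fy, cx, cy, width, height, sensor_width_mm=36.0):
    """Convert pinhole intrinsics (in pixels) to Blender camera settings.

    Assumes sensor_fit is 'HORIZONTAL' (width >= height) and square pixels,
    so Blender normalizes lens shifts by the image width.
    """
    if width < height:
        raise ValueError("this sketch only handles horizontal sensor fit")
    lens_mm = fx * sensor_width_mm / width
    view_fac = width  # largest image dimension under horizontal fit
    # Positive shift_x moves the principal point left; positive shift_y
    # moves it down in image (row-down) coordinates.
    shift_x = (width / 2.0 - cx) / view_fac
    shift_y = (cy - height / 2.0) / view_fac
    fov_x_deg = math.degrees(2.0 * math.atan(width / (2.0 * fx)))
    # Blender has a single focal length; report the fy difference it drops.
    fy_mismatch = fy / fx - 1.0
    return {
        "lens": lens_mm,
        "sensor_width": sensor_width_mm,
        "shift_x": shift_x,
        "shift_y": shift_y,
        "fov_x_deg": fov_x_deg,
        "fy_mismatch": fy_mismatch,
    }

# brics-sbc-001_cam0 intrinsics from calib.object/optim_params.txt
params = blender_intrinsics(1026.0201416015625, 1027.5439453125,
                            610.0775909423828, 360.19432067871094, 1280, 720)
print(params)
```

Comparing `fov_x_deg` with the FOV Blender reports for the imported camera is a quick way to tell an intrinsics problem apart from a stale or wrong calibration file.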
I've tried various methods, and all of them reproduce the results above.
Currently I've got:
import bpy
import mathutils
import numpy as np

def get_sensor_fit(sensor_fit, size_x, size_y):
    # Resolve Blender's 'AUTO' sensor fit to the axis it actually uses
    if sensor_fit == 'AUTO':
        return 'HORIZONTAL' if size_x >= size_y else 'VERTICAL'
    return sensor_fit

# Extract camera extrinsics (rotation and location)
# `cameras` is loaded elsewhere from calib.object/optim_params.txt
cam_names = cameras.cam_name
SCALE = 1
for cam_name in cam_names:
    camera = cameras[cam_name]
    # Create the camera object
    bpy.ops.object.camera_add()
    ob = bpy.context.object
    ob.name = cam_name
    cam = ob.data
    cam.name = "_" + cam_name
    cam.type = 'PERSP'
    K = np.asarray(camera.K)
    RT = np.asarray(camera.extr)  # world -> OpenCV camera
    R_world2cv = RT[:3, :3]
    T_world2cv = RT[:3, 3]
    scene = bpy.context.scene
    # Any positive sensor width works, since f_in_mm is derived from it
    sensor_width_in_mm = K[1, 1] * K[0, 2] / (K[0, 0] * K[1, 2])
    resolution_x_in_px = 1280
    resolution_y_in_px = 720
    # Pixel aspect ratio (not the image aspect ratio); 1.0 for square pixels
    pixel_aspect_ratio = scene.render.pixel_aspect_y / scene.render.pixel_aspect_x
    s_u = resolution_x_in_px / sensor_width_in_mm
    f_in_mm = K[0, 0] / s_u
    scene.render.resolution_x = int(resolution_x_in_px / SCALE)
    scene.render.resolution_y = int(resolution_y_in_px / SCALE)
    scene.render.resolution_percentage = SCALE * 100
    # Blender cameras look down -Z with +Y up; OpenCV looks down +Z with +Y down
    R_bcam2cv = np.diag([1.0, -1.0, -1.0])
    R_cv2world = R_world2cv.T
    rotation = mathutils.Matrix((R_cv2world @ R_bcam2cv).tolist())
    location = mathutils.Vector((-R_cv2world @ T_world2cv).tolist())
    sensor_fit = get_sensor_fit(cam.sensor_fit,
                                scene.render.pixel_aspect_x * resolution_x_in_px,
                                scene.render.pixel_aspect_y * resolution_y_in_px)
    if sensor_fit == 'HORIZONTAL':
        view_fac_in_px = resolution_x_in_px
    else:
        view_fac_in_px = pixel_aspect_ratio * resolution_y_in_px
    cam.lens = f_in_mm
    cam.lens_unit = 'MILLIMETERS'
    cam.sensor_width = sensor_width_in_mm
    ob.matrix_world = mathutils.Matrix.Translation(location) @ rotation.to_4x4()
    # Lens shift moves the principal point away from the image centre
    cam.shift_x = -(K[0, 2] - resolution_x_in_px / 2) / view_fac_in_px
    cam.shift_y = (K[1, 2] - resolution_y_in_px / 2) * pixel_aspect_ratio / view_fac_in_px
    ob.scale = (0.05, 0.05, 0.05)
    scene.camera = ob
    bpy.context.view_layer.update()
Which is producing images like:
As you can see, some renders are very close to the ground truth, whereas some are way off.
I can only assume there is some hidden setting, or something happening within Blender, that is causing this, because as far as I can tell the camera matrices are fine...
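A renderer-independent way to check the matrices is to project a known 3D point (for example a MANO vertex) with K, R, t directly and overlay it on the undistorted frame. If that lines up but the Blender render does not, the problem is in the Blender setup. The sketch below assumes R, t map world to OpenCV camera coordinates (x right, y down, z forward), the same assumption the code above makes for `camera.extr`. It builds the Blender camera-to-world matrix the same way as that code, via diag(1, -1, -1), and checks that a point along Blender's viewing axis lands on the principal point. `project` and `blender_matrix_world` are hypothetical helpers, and the toy camera values are made up.

```python
import numpy as np

def project(K, R, t, X):
    """Project world points X (N, 3) to pixels with an OpenCV pinhole camera."""
    X_cam = X @ R.T + t            # world -> camera coordinates
    uv = X_cam @ K.T               # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

def blender_matrix_world(R, t):
    """4x4 Blender camera-to-world matrix from an OpenCV world-to-camera R, t.

    Blender cameras look down -Z with +Y up, so the Y and Z axes are flipped.
    """
    R_bcam2cv = np.diag([1.0, -1.0, -1.0])
    M = np.eye(4)
    M[:3, :3] = R.T @ R_bcam2cv
    M[:3, 3] = -R.T @ t            # camera centre in world coordinates
    return M

# Toy camera: rotated 10 degrees about Y and translated.
theta = np.radians(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 0.5])
K = np.array([[1026.0, 0.0, 610.1],
              [0.0, 1027.5, 360.2],
              [0.0, 0.0, 1.0]])

M = blender_matrix_world(R, t)
centre = M[:3, 3]
# A point 2 units in front of the camera along Blender's -Z viewing axis.
X = (centre - 2.0 * M[:3, 2])[None, :]
print(project(K, R, t, X))  # lands on the principal point (cx, cy)
```

If this check passes with the real calibration but the Blender overlay is still off, the remaining suspects are the intrinsics conversion or the calibration file itself.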
It looks like we have the same issue when using the provided load_videos.py script. Are we sure that the data available on S3 is the same as what you have locally?
Here are the Blender render and the load_videos visualization side by side for comparison...
Thanks heaps in advance :)
Okay. This is weird.
Before I check, are you using calib.object/optim_params.txt from annotationsv1?
I'm pretty sure. These are the first few lines of the txt file (it's certainly calib.object/optim_params.txt):
# cam_id width height fx fy cx cy k1 k2 p1 p2 cam_name qvecw qvecx qvecy qvecz tvecx tvecy tvecz
1 1280 720 1026.0201416015625 1027.5439453125 610.0775909423828 360.19432067871094 -0.4050115775619082 0.15234986364467581 -0.0012788906025474026 -0.0004068769583847776 brics-sbc-001_cam0 -0.02809431082791538 -0.24069484149949982 -0.22274700692813537 0.9442777525067465 0.6040823968770638 0.6985741415591779 0.7465073698173668
2 1280 720 1030.0947265625 1032.8162841796875 671.0575103759766 333.5317039489746 -0.3954111770491126 0.13306548640003837 -0.0017064823549450168 0.00023688602683473285 brics-sbc-002_cam0 -0.06710351065884505 0.26893847066349275 -0.18577884635119832 0.9426852274767319 0.12459160041988171 0.8045163829071368 0.2668083193745772
3 1280 720 1021.106201171875 1027.1220703125 651.9766235351562 323.3529567718506 -0.42911766412349855 0.20212108739815854 -0.0020637733297845587 0.0004869770034345733 brics-sbc-003_cam0 0.007062493139756935 -0.03702306635683105 0.24603817180283094 0.9685270423504494 0.49946855336012375 0.14888683049987395 -0.027791229562286558
4 1280 720 1018.4395751953125 1020.4609375 677.6425170898438 367.72162914276123 -0.40813592570581275 0.16216126956267712 -0.00033667278662470867 -4.9234652340841565e-05 brics-sbc-003_cam1 0.07672221096701856 -0.21135993147494947 0.2440245238862591 0.9433412497359168 0.7914018886915728 0.10845784225987166 0.31012763727637427
5 1280 720 1033.849853515625 1033.9210205078125 629.6696853637695 372.642560005188 -0.3903463620693376 0.1340028214143933 -0.0005088047299663427 -0.0012211266263117172 brics-sbc-004_cam0 -0.08071402834171497 0.29870752513803706 -0.14130520136857125 0.9403679599608599 0.11865108114644048 0.4796065492349858 0.07060898886744094
7 1280 720 1018.477783203125 1019.5985107421875 618.728141784668 372.36180782318115 -0.40699917410290337 0.15397713669845436 0.00041775120797562306 -0.0007254780843419312 brics-sbc-005_cam0 -0.037005256804291085 -0.004516956480879485 -0.32762985308759857 0.9440703826721802 0.4382756994109049 0.6665888468353153 0.5365138981165001
8 1280 720 1018.8294677734375 1015.9061889648438 651.1561584472656 380.2546262741089 -0.4128141981111411 0.1585450339249105 -0.0024852751331382833 -0.0008747759071864358 brics-sbc-005_cam1 -9.371224745645163e-06 -0.33855418014488825 -0.07945378690476773 0.9375863494980579 0.6025883705779318 0.42874560435236575 0.6142561818375181
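To load lines in this format into K, distortion coefficients and extrinsics, something like the following could work. It assumes, from the column names only, that qvec is a unit quaternion in (w, x, y, z) order and that qvec and tvec together map world to camera coordinates, as in COLMAP's images.txt. `parse_optim_params`, `qvec_to_rotmat` and the dictionary keys are hypothetical names, not part of the repository. Since cam_id 6 is absent from the excerpt, cameras are keyed by cam_name rather than assumed to be contiguous.

```python
import numpy as np

def qvec_to_rotmat(w, x, y, z):
    """Rotation matrix from a unit quaternion in (w, x, y, z) order."""
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def parse_optim_params(text):
    """Parse optim_params.txt content into a dict keyed by cam_name."""
    cameras = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        f = line.split()
        fx, fy, cx, cy = (float(v) for v in f[3:7])
        q = np.array([float(v) for v in f[12:16]])
        q /= np.linalg.norm(q)  # guard against rounding in the text file
        extr = np.eye(4)
        extr[:3, :3] = qvec_to_rotmat(*q)
        extr[:3, 3] = [float(v) for v in f[16:19]]
        cameras[f[11]] = {
            "cam_id": int(f[0]),
            "size": (int(f[1]), int(f[2])),
            "K": np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]),
            "dist": np.array([float(v) for v in f[7:11]]),  # k1 k2 p1 p2
            "extr": extr,  # assumed world -> camera
        }
    return cameras

sample = """# cam_id width height fx fy cx cy k1 k2 p1 p2 cam_name qvecw qvecx qvecy qvecz tvecx tvecy tvecz
1 1280 720 1026.0201416015625 1027.5439453125 610.0775909423828 360.19432067871094 -0.4050115775619082 0.15234986364467581 -0.0012788906025474026 -0.0004068769583847776 brics-sbc-001_cam0 -0.02809431082791538 -0.24069484149949982 -0.22274700692813537 0.9442777525067465 0.6040823968770638 0.6985741415591779 0.7465073698173668"""
cams = parse_optim_params(sample)
print(cams["brics-sbc-001_cam0"]["extr"])
```

Because q and -q describe the same rotation, the negative qvecw values in some rows are not a problem for this conversion.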
This was the issue. I still had the old .txt files!
Thanks for helping :)
Given the MANO meshes and poses, e.g. 00000404, how do they correspond to the raw RGB video?
(The synced .jpgs don't seem to be available anywhere.)