NVIDIA / Dataset_Synthesizer

NVIDIA Deep learning Dataset Synthesizer (NDDS)

How do you make transform matrix for Camera resectioning? #51

Open darwinharianto opened 4 years ago

darwinharianto commented 4 years ago

I tried to calculate the 2D projection from the 3D location using the matrices provided in the object data and camera data.

I used the formula from https://en.wikipedia.org/wiki/Camera_resectioning: 2D = K [R|T] 3D. The intrinsic parameters are taken from the camera data, and the extrinsic parameters from the object data.

sample:

```json
{
  "camera_settings": [
    {
      "name": "Viewpoint",
      "horizontal_fov": 90,
      "intrinsic_settings": { "resX": 640, "resY": 480, "fx": 320, "fy": 320, "cx": 320, "cy": 240, "s": 0 },
      "captured_image_size": { "width": 640, "height": 480 }
    }
  ]
}
```

```json
{
  "camera_data": {
    "location_worldframe": [-75.283897399902344, -618.38958740234375, 0],
    "quaternion_xyzw_worldframe": [0, 0, 0.66299998760223389, 0.74860000610351562]
  },
  "objects": [
    {
      "class": "",
      "instance_id": 15578836,
      "visibility": 1,
      "location": [9.9271001815795898, 40, 621.746826171875],
      "quaternion_xyzw": [0, -0.66299998760223389, 0, 0.74860000610351562],
      "pose_transform": [
        [-0.99269998073577881, 0, 0.120899997651577, 0],
        [0.120899997651577, 0, 0.99269998073577881, 0],
        [0, -1, 0, 0],
        [9.9271001815795898, 40, 621.746826171875, 1]
      ],
      "cuboid_centroid": [10.677499771118164, -9.4661998748779297, 621.6553955078125],
      "projected_cuboid_centroid": [260.3970947265625, 250.80239868164062],
      "bounding_box": {
        "top_left": [222.77049255371094, 248.47850036621094],
        "bottom_right": [279.02569580078125, 271.57791137695312]
      },
      "cuboid": [
        [-11.595800399780273, -58.982799530029297, 650.97088623046875],
        [-17.978900909423828, -58.982799530029297, 598.54022216796875],
        [-17.978900909423828, 40.050399780273438, 598.54022216796875],
        [-11.595800399780273, 40.050399780273438, 650.97088623046875],
        [39.333900451660156, -58.982799530029297, 644.7706298828125],
        [32.950901031494141, -58.982799530029297, 592.34002685546875],
        [32.950901031494141, 40.050399780273438, 592.34002685546875],
        [39.333900451660156, 40.050399780273438, 644.7706298828125]
      ],
      "projected_cuboid": [
        [251.43980407714844, 225.07260131835938],
        [248.310302734375, 222.36349487304688],
        [248.310302734375, 278.83981323242188],
        [251.43980407714844, 277.00021362304688],
        [271.6171875, 224.77520751953125],
        [270.24090576171875, 222.01139831542969],
        [270.24090576171875, 279.07888793945312],
        [271.6171875, 277.20208740234375]
      ]
    }
  ]
}
```

So the K matrix is [[320, 0, 320], [0, 320, 240], [0, 0, 1]].

Let `a` be the rotation matrix built from the camera quaternion (xyzw) [0, 0, 0.66299998760223389, 0.74860000610351562].

Then R is `a` transposed.

T is minus `a` transposed, multiplied by the camera translation [[-75.283897399902344, -618.38958740234375, 0]] transposed.

The end result should be K [R|T] multiplied by the cuboid centroid [10.677499771118164, -9.4661998748779297, 621.6553955078125] (in homogeneous coordinates).
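For reference, the attempt described above can be sketched in NumPy. The quaternion-to-matrix formula below assumes the standard Hamilton xyzw convention; whether NDDS uses the same convention (and the same handedness) is an assumption, not something the exported JSON confirms:

```python
import numpy as np

def quat_xyzw_to_rot(q):
    """Rotation matrix from a unit quaternion in (x, y, z, w) order."""
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

# Intrinsics from the camera_settings block
K = np.array([[320.0,   0.0, 320.0],
              [  0.0, 320.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics from camera_data, inverted as described in the comment above
a = quat_xyzw_to_rot([0.0, 0.0, 0.66299998760223389, 0.74860000610351562])
R = a.T                                    # world -> camera rotation
C = np.array([-75.283897399902344, -618.38958740234375, 0.0])
t = -R @ C                                 # world -> camera translation

P = K @ np.hstack([R, t.reshape(3, 1)])    # 3x4 projection matrix

# Cuboid centroid treated as a world-frame point (this is the attempt,
# not necessarily the correct interpretation of the JSON)
X = np.array([10.677499771118164, -9.4661998748779297, 621.6553955078125, 1.0])
uvw = P @ X
uv = uvw[:2] / uvw[2]                      # pixel coordinates
print(uv)
```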

This method gives me the wrong location. Where did I go wrong?

darwinharianto commented 4 years ago

What I found so far: the transformation reduces to 2D = K 3D, because the object coordinates in the JSON are already in the camera coordinate frame.

From my tests, this holds for data with fx, fy, cx, cy = 256, 256, 256, 256, i.e.

```
| 256   0 256 0 |
|   0 256 256 0 |
|   0   0   1 0 |
```

but it fails for data with fx, fy, cx, cy = 320, 320, 320, 240, i.e.

```
| 320   0 320 0 |
|   0 320 240 0 |
|   0   0   1 0 |
```

How do I make the transformation matrix in that case?
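As a concrete check, here is the K-only pinhole projection (u = fx·x/z + cx, v = fy·y/z + cy) applied to the `cuboid_centroid` from the sample JSON above, using the exported 320/320/320/240 intrinsics:

```python
# Direct pinhole projection of a camera-frame point: u = fx*x/z + cx, v = fy*y/z + cy.
# Point and intrinsics are taken from the sample JSON in the first post.
def project(fx, fy, cx, cy, point):
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

centroid = (10.677499771118164, -9.4661998748779297, 621.6553955078125)
u, v = project(320, 320, 320, 240, centroid)
print(u, v)
```

This yields roughly (325.5, 235.1), which does not match the (260.4, 250.8) stored in `projected_cuboid_centroid`, consistent with the report that these intrinsics fail.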

darwinharianto commented 4 years ago

I found out that the projection only works when the image size is twice the focal length (i.e. cx = fx and cy = fy). If it is not, the projection fails.

Any idea how to make a transformation matrix when the image size isn't twice the focal length?
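One thing worth sanity-checking (an assumption on my part, not confirmed by the NDDS docs): derive the focal length from the exported `horizontal_fov` instead of trusting the exported fx/fy, using the standard pinhole relation fx = (width/2) / tan(fov/2). If the recomputed values disagree with the JSON's fx/fy, the exported intrinsics are inconsistent with how the image was rendered:

```python
import math

def intrinsics_from_fov(width, height, horizontal_fov_deg):
    """Pinhole intrinsics from image size and horizontal field of view.

    Assumes square pixels (fy = fx) and a centered principal point;
    whether that matches how NDDS rendered the image is an assumption.
    """
    fx = (width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    fy = fx
    cx, cy = width / 2.0, height / 2.0
    return fx, fy, cx, cy

# The sample camera: 640x480 with horizontal_fov = 90
fx, fy, cx, cy = intrinsics_from_fov(640, 480, 90)
print(fx, fy, cx, cy)
```

For the sample camera this gives fx = fy = 320, cx = 320, cy = 240, so in this particular case the fov-derived intrinsics agree with the exported ones; if they still fail, the discrepancy must come from somewhere else (e.g. the coordinate convention).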