andyzeng / tsdf-fusion

Fuse multiple depth frames into a TSDF voxel volume.
http://andyzeng.github.io/
BSD 2-Clause "Simplified" License

question about camera intrinsics #7

Closed Azpril45 closed 6 years ago

Azpril45 commented 7 years ago

Dear @andyzeng

Thank you for sharing your work. I want to generate my own 3D volume data by changing some of your source code. However, I am confused by some of the parameters, and I would appreciate it if you could give me some help.

First is the camera intrinsics. I did some research and found that it is a 3x3 matrix. Then I looked through your work and found that the old and new versions use different camera intrinsic matrices. Is it because you used different cameras for these two versions? In my own project, I'm using a Kinect as the depth camera. Can you tell me how I should set the camera intrinsics in my work?

The second is the camera pose. It seems to be a 4x4 matrix that changes with every single depth frame. Can you tell me what this parameter is and how to set it?

The last is the number of input frames. In your demo, you set this number to 50. I changed the number to regenerate a voxel volume. However, when I changed the number to 1, the voxel volume turned into nothing. I am wondering whether I can generate a voxel volume from a single depth image, and whether changing the input number to 1 is the right way to do it.

Thank you

Sincerely yours, Tony

andyzeng commented 7 years ago

Hello!

For the camera intrinsics, yes - the camera intrinsics differ between the data in the old and new versions because I used different cameras. The old version uses a sequence of depth frames I captured with a RealSense F200, while the new version uses a sequence of depth frames from the Microsoft 7-Scenes dataset, which was captured with a Kinect. Camera intrinsics are usually unique to each camera. However, the default camera intrinsics for the Kinect - principal point (320,240) and focal length (585,585) - are usually a good enough approximation, so you can start with those.
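For illustration, here are those default intrinsics written out as a row-major 3x3 array (the variable name cam_K is just illustrative):

// Approximate default Kinect intrinsics: focal lengths fx = fy = 585 px,
// principal point (cx, cy) = (320, 240).
float cam_K[3 * 3] = {585.0f,   0.0f, 320.0f,
                        0.0f, 585.0f, 240.0f,
                        0.0f,   0.0f,   1.0f};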

But if you care about achieving maximum accuracy for your 3D data (which is important in some applications), you can run this calibration procedure with OpenNI and a checkerboard to estimate more accurate intrinsics for your camera.

The camera pose (extrinsics) is a rigid transformation (consisting of a rotation matrix and translation vector) that describes the camera’s location in the world. Here is a pretty solid introduction to extrinsic camera matrices. They are usually estimated from SLAM, SfM, or other camera localization and reconstruction algorithms.
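As a sketch (illustrative, not code from this repository), a camera pose can be stored as a row-major 4x4 matrix that stacks a 3x3 rotation R and a 3x1 translation t over a (0, 0, 0, 1) bottom row:

// 4x4 rigid transformation: [ R | t ] on top, [ 0 0 0 | 1 ] on the bottom.
// The identity below means the camera sits at the world origin, unrotated.
float cam2world[4 * 4] = {1.0f, 0.0f, 0.0f, 0.0f,
                          0.0f, 1.0f, 0.0f, 0.0f,
                          0.0f, 0.0f, 1.0f, 0.0f,
                          0.0f, 0.0f, 0.0f, 1.0f};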

When changing the number of frames from 50 to 1, the voxel volume is still generated. The reason you're not seeing a point cloud is the function call SaveVoxelGrid2SurfacePointCloud. By default, the last parameter weight_thresh is set to 1, which tells the function to only visualize voxels with weight values greater than 1 (weight values are determined by the number of frames in which the voxel was seen). After changing num_frames = 1;, try replacing lines 188-190 of demo.cu with the following:

SaveVoxelGrid2SurfacePointCloud("tsdf.ply", voxel_grid_dim_x, voxel_grid_dim_y, voxel_grid_dim_z, 
                                  voxel_size, voxel_grid_origin_x, voxel_grid_origin_y, voxel_grid_origin_z,
                                  voxel_grid_TSDF, voxel_grid_weight, 0.2f, 0.0f);

The code will then produce a point cloud visualization of the voxel volume from just one depth frame.

Azpril45 commented 7 years ago

Dear @andyzeng

Thank you for your answer.

In my own project, the depth image looks like this: [depth image attachment]

I looked through the information you gave me and ended up with the following camera intrinsic and extrinsic matrices:

intrinsic matrix:
[585   0 320]
[  0 585 240]
[  0   0   1]

extrinsic matrix:
[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]

I changed the function SaveVoxelGrid2SurfacePointCloud as you said, and that actually worked. But when I tried to generate the 3D volume data using the image above as input, nothing was generated. I thought the reason might be the format of the input data, so I used Matlab to open the depth image and found that the data you provided is uint16 but mine is uint8. I am wondering whether the format of the depth image is actually causing the generation problem, or whether there are more parameters (like voxel_grid_origin_xyz) I have to change in order to generate my own 3D volume data.

Thank you

Sincerely yours, Tony

andyzeng commented 7 years ago

The format of the depth image is likely the problem. You will have to modify the function ReadDepth in utils.hpp to load in the correct depth values from your depth image format.
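For example, here is a rough sketch of what an 8-bit variant might look like. Everything here is an assumption for illustration: OpenCV is used (as elsewhere in utils.hpp), the image is single-channel uint8, and the depth_scale factor that maps raw values to meters is hypothetical - it must come from your sensor's documentation:

#include <opencv2/opencv.hpp>
#include <string>

// Sketch only - adapt to your own 8-bit depth encoding.
void ReadDepth8(std::string filename, int im_height, int im_width, float * depth) {
  cv::Mat depth_mat = cv::imread(filename, cv::IMREAD_UNCHANGED); // single-channel uint8
  float depth_scale = 0.02f; // hypothetical: one uint8 step = 2 cm of depth
  for (int r = 0; r < im_height; ++r)
    for (int c = 0; c < im_width; ++c)
      depth[r * im_width + c] = (float)depth_mat.at<unsigned char>(r, c) * depth_scale;
}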

Azpril45 commented 7 years ago

Dear @andyzeng

Thank you for your kind answer. It turned out the problem was caused by the format of the depth image. After some preprocessing of the depth image, the 3D voxel data was generated successfully. Thank you for all your help. But some parameters still confuse me.

First is the voxel grid origin. The comment says this is the location of the voxel grid origin in base frame camera coordinates:

float voxel_grid_origin_x = -1.5f; // Location of voxel grid origin in base frame camera coordinates
float voxel_grid_origin_y = -1.5f;
float voxel_grid_origin_z = 0.5f;

I tried changing this parameter (e.g. setting all three to 0.0f) and got totally different volume data. I'm wondering why you set the voxel grid origin like this.

Second is trunc_margin. I just want to know why you set this parameter to voxel_size * 5 and how it works in the TSDF computation.

Last is voxel_grid_dim. In my opinion, this is the resolution of the TSDF volume. Does that mean I can generate TSDF volume data at different resolutions? My input is a depth image like this: [cropped depth image attachment] It is a 128x128 crop of the original depth image. I want to generate a TSDF volume with a resolution of 32x32x32. How should I set these parameters?

Thank you

Sincerely yours, Tony

andyzeng commented 7 years ago

Re voxel grid origin: if you imagine your TSDF volume as a 3D box that is axis-aligned in 3D camera space, the voxel grid origin defines the location of the origin corner of the volume. By setting the voxel grid origin to (0,0,0) in camera coordinates, you're translating the 3D box so that its origin corner lies on the camera location - hence giving you different volumetric data. Moving the voxel grid origin will move your 3D box.
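In other words (a sketch using the parameter names from demo.cu), the camera-space position of voxel (x, y, z) is its integer index scaled by the voxel size and offset by the grid origin:

// Camera-space position of voxel (x, y, z):
float pt_x = voxel_grid_origin_x + x * voxel_size;
float pt_y = voxel_grid_origin_y + y * voxel_size;
float pt_z = voxel_grid_origin_z + z * voxel_size;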

Re trunc_margin: a regular distance field would have values from 0 (close to the surface) all the way to infinity (far from the surface). trunc_margin defines where to cut off the distance field (hence the term "truncated") so that you don't integrate distance values too far away from the surface. For more information on volumetric integration, I would recommend taking a look at this.
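As a simplified sketch of how trunc_margin enters the per-voxel update (paraphrased rather than quoted from the Integrate kernel; depth_val, voxel_cam_z, and idx are illustrative names for the measured depth, the voxel's depth along the camera ray, and the voxel's flattened index):

// diff: measured surface depth minus the voxel's depth along the camera ray.
float diff = depth_val - voxel_cam_z;
if (diff <= -trunc_margin)
  return; // voxel is far behind the observed surface: skip it
float dist = fmin(1.0f, diff / trunc_margin); // truncate and normalize
// Fold the new observation into a running weighted average over frames:
float weight_old = voxel_grid_weight[idx];
voxel_grid_weight[idx] = weight_old + 1.0f;
voxel_grid_TSDF[idx] = (voxel_grid_TSDF[idx] * weight_old + dist) / (weight_old + 1.0f);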

Re TSDF volume resolution: voxel_grid_dim is the number of voxels along each dimension of your volume. On the other hand, voxel_size determines the physical size of each voxel. You will need to change both voxel_grid_dim and voxel_size to generate TSDF volume data with different resolutions.

For your hand example, project the depth data of the hand into camera coordinates, and find a reasonable location to define the voxel grid origin. You can change voxel_grid_dim to be 32x32x32, and then change voxel_size so that the voxel volume encompasses the whole hand.
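For example (a sketch; the 0.3 m extent is an assumption about how much space the hand region occupies - adjust it to your data):

// Sketch: pick voxel_size so a 32x32x32 grid spans the whole hand region.
int voxel_grid_dim = 32;    // voxels per side (illustrative single value)
float volume_extent = 0.3f; // assumed physical extent of the region, in meters
float voxel_size = volume_extent / (float)voxel_grid_dim; // ~0.0094 m per voxel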

Azpril45 commented 6 years ago

Dear @andyzeng

Thank you for your kindness. In my case, I first segmented the hand as foreground from a depth image (640x480) and cropped the hand region to 128x128. Then I calculated its center of mass (COM). I want to align this COM to the voxel grid origin and compute a TSDF with a resolution of 60x60x60 voxels. Each voxel represents a space of 5x5x5 mm, so the whole TSDF spans a space of 300x300x300 mm. I changed the parameters like this:

int im_width = 128;
int im_height = 128;
float voxel_grid_origin_x = -1.5f;
float voxel_grid_origin_y = -1.5f;
float voxel_grid_origin_z = 0.5f;
float voxel_size = 0.005f;
float trunc_margin = voxel_size * 10;
int voxel_grid_dim_x = 60;
int voxel_grid_dim_y = 60;
int voxel_grid_dim_z = 60;

But it failed. However, if I keep all the parameters the same and simply change the voxel grid dim to 500x500x500, the TSDF is generated successfully. I don't know why. Does the voxel grid origin cause the problem? I'm still confused about how to set the voxel grid origin. I have already calculated the center of mass in image coordinates, which is (64,64). According to what you said, the voxel grid origin defines the location of the origin corner of the volume. Why did you set the voxel grid origin to (-1.5, -1.5, 0.5)? It seems like this voxel grid origin works well when the input depth image is 640x480. You also told me last time that I should project the depth data of the hand into camera coordinates. I don't understand what that means. Can you explain it in some detail?

Thank you

Sincerely yours, Tony

andyzeng commented 6 years ago

Re camera coordinates: in order to create a 3D point cloud from a depth image (like this piece of code), you typically use the camera intrinsics to project the depth values into 3D coordinate space. This 3D coordinate space is called the camera coordinate space.
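Concretely, the standard pinhole back-projection looks like this (u and v are a pixel's column and row, d is its metric depth, and fx, fy, cx, cy come from your intrinsics matrix):

// Back-project pixel (u, v) with depth d (meters) into camera coordinates:
float pt_cam_x = (u - cx) * d / fx;
float pt_cam_y = (v - cy) * d / fy;
float pt_cam_z = d;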

Now back to the voxel grid origin: keep in mind that the voxel grid origin is not the center of the voxel volume. If your TSDF voxel grid is 60x60x60 voxels (where voxel coordinates range from (0,0,0) to (59,59,59)), the voxel grid origin is the 3D location of voxel (0,0,0) in camera coordinate space. In other words, it is a "corner" of the voxel grid, not the "middle" of it.

For your problem: yes, you will need to set the right voxel grid origin. After using the camera intrinsics to project the depth data of the hand into camera coordinates, you should have a 3D camera coordinate location for each pixel. Your voxel grid will be a 3D bounding box around the 3D locations of the pixels that represent the hand. Take the per-axis minimum over all of those 3D points, and that will be the location of your desired voxel grid origin.
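Here is a sketch of that last step (assuming the hand pixels have already been back-projected into a list of 3D points as above; the struct and function names are illustrative):

#include <algorithm>
#include <vector>

struct Point3f { float x, y, z; };

// Component-wise minimum over the hand's 3D points gives the grid's
// origin corner in camera coordinates.
Point3f ComputeVoxelGridOrigin(const std::vector<Point3f> & hand_points) {
  Point3f origin = hand_points[0];
  for (const Point3f & p : hand_points) {
    origin.x = std::min(origin.x, p.x);
    origin.y = std::min(origin.y, p.y);
    origin.z = std::min(origin.z, p.z);
  }
  return origin;
}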

With that said, I highly recommend searching online for some academic resources that can introduce you to the basic concepts of 3D vision (such as this or this). The computer vision course that I TA'd at Princeton also has some nice introductory content for 3D vision (see course slides here). You will need a good grasp of these topics to understand what the code in this repository does.