IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

How to calculate the world coordinates for a pixel point with a depth value of 0? #12304

Closed: OldLiu666 closed this issue 10 months ago

OldLiu666 commented 11 months ago

After aligning the depth stream to the color stream, we can obtain the depth value of each pixel and then use rs.rs2_deproject_pixel_to_point(depth_intrin, [i, j], depth_value) to calculate that pixel's world coordinates. However, the depth map always contains some pixels whose depth value is 0, for reasons such as reflection, shown as the black regions in the image below. For these pixels, the world coordinates returned by rs.rs2_deproject_pixel_to_point(depth_intrin, [i, j], depth_value) are (0, 0, 0).

I want to know whether there is any method to calculate the three-dimensional coordinates of all pixels. I have applied hole_filling = rs.hole_filling_filter() to the depth map, but the results are not ideal: some filled pixels end up with a depth value of less than one, while others suddenly jump to values greater than ten, which makes them unusable.

[Image: depth map with black regions where the depth value is 0]
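
For reference, below is a minimal Python sketch of the workflow described above (align depth to color, optionally apply the hole-filling filter, then deproject a pixel) using pyrealsense2. The stream resolutions and the example pixel (320, 240) are illustrative choices, not values from the original post.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)        # align the depth frame to the color frame
hole_filling = rs.hole_filling_filter()  # optional: attempt to fill zero-depth holes

try:
    frames = align.process(pipeline.wait_for_frames())
    depth_frame = frames.get_depth_frame()
    depth_frame = hole_filling.process(depth_frame).as_depth_frame()

    # Intrinsics of the aligned depth stream
    depth_intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()

    i, j = 320, 240                               # example pixel
    depth_value = depth_frame.get_distance(i, j)  # metres; 0.0 means no depth data
    if depth_value > 0:
        point = rs.rs2_deproject_pixel_to_point(depth_intrin, [i, j], depth_value)
        print("3D point (m):", point)
    else:
        print("No depth data at this pixel")
finally:
    pipeline.stop()
```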

MartyG-RealSense commented 10 months ago

Hi @OldLiu666 If your depth problem is primarily due to reflections, then applying a physical optical filter product called a linear polarization filter over the lenses on the outside of the camera can significantly help to negate glare from reflections.

Any thin-film polarizing filter should work as long as it is linear, so they can be obtained inexpensively. You can search stores such as Amazon for the term linear polarizing filter sheet to find example filter products.

Section 4.4, When to use polarizers and waveplates, in Intel's white-paper guide to using optical filters with RealSense 400 Series cameras has more information about this type of filter.

https://dev.intelrealsense.com/docs/optical-filters-for-intel-realsense-depth-cameras-d400#4-the-use-of-optical-filters

The image below from that section demonstrates the before and after effects of a linear polarization filter.

[Image: before-and-after comparison showing the effect of a linear polarization filter on the depth image]

The means of attaching the filter to the outside of the camera does not need to be sophisticated. It could be as simple as wrapping elastic bands around the camera to hold the filter firmly against the front.


Alternatively, you could try setting the Medium Density camera configuration preset to see if that improves your depth image results, as Medium Density provides a good balance between accuracy and the amount of detail on the depth image.

Python code for setting the Medium Density preset can be found at https://github.com/IntelRealSense/librealsense/issues/10695#issuecomment-1194400710 whilst the link below provides C++ code for setting a named preset.

https://www.intel.com/content/www/us/en/support/articles/000028416/emerging-technologies/intel-realsense-technology.html
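
As a rough illustration of the Python approach in the linked comment, the sketch below selects the preset by matching its description string rather than hard-coding a numeric index; the exact string "Medium Density" is assumed to match the option's description on D400 cameras.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()
depth_sensor = profile.get_device().first_depth_sensor()

# Walk the available visual presets and pick the one named "Medium Density".
preset_range = depth_sensor.get_option_range(rs.option.visual_preset)
for value in range(int(preset_range.min), int(preset_range.max) + 1):
    name = depth_sensor.get_option_value_description(rs.option.visual_preset, value)
    if name == "Medium Density":
        depth_sensor.set_option(rs.option.visual_preset, value)
        break

pipeline.stop()
```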

OldLiu666 commented 10 months ago

Hello and good day! First of all, I'd like to express my gratitude for your diligent answers. Perhaps I didn't clarify earlier: reflection is only one of the possible reasons why a point in the depth map has a depth value of 0; my main emphasis is on the missing depth data itself. I mainly use the depth data measured by the D435 and calculate the three-dimensional coordinates of each pixel with the rs.rs2_deproject_pixel_to_point function, which is then combined with a millimeter-wave back-projection (BP) algorithm for imaging. The depth and coordinates of each pixel are therefore crucial: missing depth information makes it impossible to calculate the three-dimensional world coordinates, which in turn degrades the imaging quality.

Additionally, I'd like to ask two more questions. First, have you come across any methods, blogs, or GitHub repositories where people have combined the D435 with millimeter-wave imaging? I've seen a method that uses TSDF for 3D reconstruction: the object is divided into a 3D grid and reconstructed, after which the grid coordinates are used for millimeter-wave imaging. The reason is that 3D reconstruction with TSDF can obtain complete coordinate information. However, it requires integrating many depth maps and the algorithm takes a long time to compute, although the imaging results are very good.

Second, I have a question about the joint calibration of radar and camera. Some published papers mention using a calibration board with six iron balls placed on it; the radar then performs volumetric imaging, and the positions of the iron balls are determined from the maxima of S21. However, the method for jointly calibrating the camera and radar isn't very clear to me. Ultimately, they were able to obtain the transformation from the camera coordinate system to the radar coordinate system: a rotation matrix and a translation matrix. The specific method wasn't elaborated in the papers. I've seen many people online using this method to combine cameras and radars, but I haven't found useful information on GitHub. If you have any useful information, please kindly share it.

Lastly, I sincerely appreciate your answers. Thank you very much!

MartyG-RealSense commented 10 months ago

You are very welcome!

Please let me know whether using the Medium Density Visual Preset suggested above improves the depth image.


There was a RealSense-equipped wheelchair at the link below that combined RealSense with a millimeter-wave radar sensor.

https://www.intelrealsense.com/smart-wheelchairs-with-luci/


A RealSense C++ tool that sounds similar to the map fusion involved in millimeter-wave is the rs-kinfu project. It uses a single RealSense camera to progressively build up a pointcloud over time via frame fusion as the camera is moved around.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/opencv/kinfu


I have not heard of the iron-ball calibration method and do not have RealSense-related information about it, unfortunately.

OldLiu666 commented 10 months ago

OK, thank you very much!

MartyG-RealSense commented 10 months ago

Hi @OldLiu666 Do you require further assistance with this case, please? Thanks!

OldLiu666 commented 10 months ago

Hello again, sorry to bother you. I'm looking to perform 3D reconstruction using RGB-D images captured with the D435 and combined with the TSDF (Truncated Signed Distance Function). I've found that there's limited code available on GitHub for this purpose. Do you know how to capture photos with the D435 and then proceed with TSDF-based 3D reconstruction? Or perhaps, capture a series of depth images first and then perform the reconstruction?
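
As one possible starting point for the "capture a series of depth images first" route, here is a hedged sketch that grabs a short sequence of aligned color/depth frames from a D435 and saves them for offline reconstruction. The frame count, resolutions, and .npy file naming are arbitrary illustration choices.

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.rgb8, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)

try:
    for idx in range(30):  # e.g. 30 frames
        frames = align.process(pipeline.wait_for_frames())
        depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16 depth units
        color = np.asanyarray(frames.get_color_frame().get_data())  # uint8 RGB
        np.save(f"depth_{idx:04d}.npy", depth)
        np.save(f"color_{idx:04d}.npy", color)
finally:
    pipeline.stop()
```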

MartyG-RealSense commented 10 months ago

It might be possible to use your D435 with the PCL-based Kinfu Large Scale project at the link below (not the RealSense example rs-kinfu) to achieve your goal by converting the TSDF data into a mesh.

https://github.com/PointCloudLibrary/pcl/blob/master/doc/tutorials/content/using_kinfu_large_scale.rst#part-2-running-pcl_kinfu_largescale_mesh_output-to-convert-the-tsdf-cloud-into-a-mesh

OldLiu666 commented 10 months ago

Hello, thanks again! The challenge I'm facing is not converting the TSDF data into a mesh. Instead, it's about generating a TSDF grid from RGB-D images captured with the D435 and integrating it using the Open3D library. Once that's accomplished, I plan to proceed with my tasks using the TSDF.

MartyG-RealSense commented 10 months ago

I am not personally familiar with TSDF grids, but my research into your question suggests that the research paper at the link below may be a useful reference. The paper is titled 3D Scene Reconstruction from RGB-Depth Images.

https://fkong7.github.io/data/CS284_Final_paper.pdf

Section 1.2 - Mesh Reconstruction - on page 2 of the paper discusses using a RealSense camera (which appears to be a D435 or D435i) to do the following.


We applied the TSDF algorithm to reconstruct a 3D voxel grid from multiple input frames. With the estimated camera poses, we can compute the world coordinates of each pixel in the RGB-D images and thus align RGB-D images from different frames into a consistent space ... We integrated the scalable implementation of TSDF fusion in Open3D into our framework. This implementation constructed a sparse representation of TSDF volume by using unordered map to build a hierarchical hashing table that only stores voxels with distance values d_t between 1 and -1.
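
As a rough sketch of the Open3D TSDF integration the quoted passage describes (not the paper's actual code), the snippet below assumes a recent Open3D version, example intrinsics, and that color images, depth images, and per-frame camera-to-world poses have already been prepared by the caller.

```python
import numpy as np
import open3d as o3d

# Example intrinsics; replace with the camera's real calibration values.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=615.0, fy=615.0, cx=320.0, cy=240.0)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.005,  # 5 mm voxels
    sdf_trunc=0.02,      # truncation distance in metres
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# Placeholders: fill with per-frame data (uint8 HxWx3 color, uint16 HxW depth,
# and 4x4 camera-to-world pose matrices from an odometry/SLAM step).
color_images, depth_images, poses = [], [], []

for color, depth, pose in zip(color_images, depth_images, poses):
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color), o3d.geometry.Image(depth),
        depth_scale=1000.0, depth_trunc=3.0, convert_rgb_to_intensity=False)
    # integrate() expects a world-to-camera extrinsic, hence the inverse of the pose.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
```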

OldLiu666 commented 10 months ago

I sincerely thank you once again for your assistance. I took a brief look at the article, and it involves TSDF 3D reconstruction using RGB-D images captured with an Intel RealSense camera, which aligns with what I'm looking for. I will read the article in detail and hope it proves beneficial. Lastly, I genuinely appreciate your tireless efforts in addressing my queries and providing solutions. Thank you. Wishing you success in your work and a joyful life.

MartyG-RealSense commented 10 months ago

You are very welcome. I'm pleased that I could be of help. Thanks for your kind words!

OldLiu666 commented 10 months ago

Hello, I'm sorry to bother you again. When I use the D435 camera to capture RGB-D images and perform TSDF 3D reconstruction with the Open3D library in Python, the function requires not only the RGB-D images but also the camera's pose. How can I obtain the camera's pose when taking pictures with the D435?

MartyG-RealSense commented 10 months ago

Hi @OldLiu666 The D435 camera is not equipped with an IMU component to detect the angle that the camera is oriented at.

RealSense 400 Series cameras also cannot provide 'pose stream' data (the position and angle of the camera) as that feature is only on the RealSense T265 Tracking Camera model. It may be possible to establish a pose for D435 by mounting both a D435 and a T265 on a bracket and moving the bracket, since the T265's orientation will also be the D435's orientation.

Another approach for RealSense cameras without an IMU is to establish the camera's angle using a plane-fit algorithm, as described at the link below.

https://support.intelrealsense.com/hc/en-us/community/posts/360050894154/comments/360013322694
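
For illustration only (this is not the code from the linked post), a plane-fit estimate of camera tilt might look like the sketch below: fit a plane to deprojected floor points with Open3D's RANSAC plane segmentation and compare its normal with the camera's vertical axis. It assumes the floor dominates the point set and that +Y is the camera's "down" axis.

```python
import numpy as np
import open3d as o3d

def camera_tilt_from_floor(points_xyz):
    """points_xyz: Nx3 array of deprojected points in the camera frame (metres)."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)
    # RANSAC plane fit; the dominant plane is assumed to be the floor.
    plane, _ = pcd.segment_plane(distance_threshold=0.02, ransac_n=3,
                                 num_iterations=500)
    normal = np.asarray(plane).ravel()[:3]
    normal /= np.linalg.norm(normal)
    # Angle between the floor normal and the camera's +Y (down) axis, in degrees.
    return float(np.degrees(np.arccos(abs(normal @ np.array([0.0, 1.0, 0.0])))))
```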

OldLiu666 commented 10 months ago

I apologize for not providing the complete information; my camera is the D435i. But as you mentioned, "RealSense 400 Series cameras also cannot provide 'pose stream' data (the position and angle of the camera)." Does this mean I have to find a T265 or use another algorithm, such as the plane-fit algorithm, to determine the camera's pose?

MartyG-RealSense commented 10 months ago

It is possible to obtain the camera tilt / roll angle from a D435i. The camera does not know its own position though. Only a T265 can provide that information with its pose stream.

https://dev.intelrealsense.com/docs/rs-pose

Below are examples of D435i-compatible IMU programs.

C++ https://github.com/IntelRealSense/librealsense/tree/master/examples/motion

Python https://github.com/IntelRealSense/librealsense/issues/4391
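
In the same spirit as those examples, a minimal pyrealsense2 sketch for estimating tilt (pitch/roll) from the D435i's accelerometer is shown below; the stream rate and the axis/sign conventions in the angle formulas are assumptions that may need adjusting for a particular mounting.

```python
import math
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.accel, rs.format.motion_xyz32f, 250)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    accel_frame = frames.first_or_default(rs.stream.accel)
    if accel_frame:
        a = accel_frame.as_motion_frame().get_motion_data()  # gravity-dominated when static
        pitch = math.degrees(math.atan2(a.z, math.sqrt(a.x ** 2 + a.y ** 2)))
        roll = math.degrees(math.atan2(a.x, math.sqrt(a.y ** 2 + a.z ** 2)))
        print(f"pitch: {pitch:.1f} deg, roll: {roll:.1f} deg")
finally:
    pipeline.stop()
```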

OldLiu666 commented 10 months ago

Thank you for your efforts in providing the solution. I wish you a pleasant life.

MartyG-RealSense commented 10 months ago

Case closed due to solution achieved and no further comments received.