IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Convert ROI from color coordinates to depth (for exposure) #6803

Closed TheophileBlard closed 4 years ago

TheophileBlard commented 4 years ago

Required Info
Camera Model D435
Operating System & Version Ubuntu 18.04
Platform PC
SDK Version 2
Language C++

Issue Description

I'm working on a project where accurate depth measurement is critical for small objects that can move around the screen. To achieve that, I'm trying to automatically update the depth exposure ROI based on the bounding boxes of the objects, which are detected on the color frame. When an object is close to the center of the screen, I update the depth exposure ROI with its bounding box. Both frames have the same size (848x480), and the depth frame is aligned to the color frame.
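For context, here is roughly how I update the ROI once I have a bounding box (a simplified sketch rather than my exact code; the bounding-box values are just illustrative):

// Simplified sketch: update the depth sensor's auto-exposure ROI from a
// bounding box detected on the color frame (bbox values are illustrative).
rs2::pipeline pipe;
rs2::pipeline_profile profile = pipe.start();
rs2::depth_sensor depth_sensor = profile.get_device().first<rs2::depth_sensor>();

int bbox_left = 300, bbox_top = 150, bbox_right = 500, bbox_bottom = 330; // from the detector

if (auto roi_sensor = depth_sensor.as<rs2::roi_sensor>())
{
    rs2::region_of_interest roi;
    roi.min_x = bbox_left;     // bounding-box corners, in pixels
    roi.min_y = bbox_top;
    roi.max_x = bbox_right;
    roi.max_y = bbox_bottom;
    roi_sensor.set_region_of_interest(roi);
}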

This idea came to me while experimenting in the realsense-viewer, where I get very good results when I manually set the ROI around the objects. However, when I try to do it programmatically (as explained above), I get bad results. I'm pretty sure it's because of the wider FOV of the depth sensor, hence my question: how do I adapt a color ROI to its corresponding depth frame?

I am aware of rs2_project_color_pixel_to_depth_pixel, but it is not very suitable for my use case, as I really don't care about the validity of the depth pixels. It's also difficult to use because I only have access to a decimated (2x) & aligned version of the depth frame.

MartyG-RealSense commented 4 years ago

Hi @TheophileBlard Before looking into your question about ROI, I thought that it would be useful to go through some possibilities for an alternative approach with you first.

The discussion linked below looks at using a D435 to track a small, fast-moving object (a bee). In that conversation, the Chief Technical Officer of the RealSense Group at Intel (agrunnet) offers advice on how to improve tracking results.

https://github.com/IntelRealSense/librealsense/issues/4175#issuecomment-507448389

agrunnet mentions using 90 FPS, but also references a much faster FPS mode. That high speed capture mode is now available for use with the D435 and offers 300 FPS without loss of performance. The trade-off to achieve this performance is a vertically smaller viewpoint centered on the middle of the screen.

[Image: illustration of the high-speed capture mode's vertically reduced, center-of-screen viewpoint]

If your application was going to be focused on the center of the screen anyway with the ROI approach, it may be worth considering. Intel has published a white-paper on the high-speed capture mode.

https://dev.intelrealsense.com/docs/high-speed-capture-mode-of-intel-realsense-depth-camera-d435
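For reference, my understanding is that the high-speed mode is selected simply by requesting the high-FPS depth profile described in the white-paper. A rough sketch (please verify the exact profile with rs-enumerate-devices):

// Rough sketch: requesting the D435 high-speed depth profile
// (848x100 @ 300 FPS, as described in the white-paper).
rs2::config cfg;
cfg.enable_stream(RS2_STREAM_DEPTH, 848, 100, RS2_FORMAT_Z16, 300);
rs2::pipeline pipe;
pipe.start(cfg);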

TheophileBlard commented 4 years ago

Thanks for your quick response! I wasn't aware of the high-speed capture mode, and it's definitely interesting. However, I'm afraid it doesn't fit my use case.

For my application, I need color frames, not only depth. And because of the real-time processing I do (on both color & depth frames), I cannot exceed 30 FPS. Is there a way to activate the high-speed capture mode for depth, with the color stream active, while effectively staying at 30 FPS?

Moreover, even if I focus on the center of the screen, I would like to have a depth value for all color pixels. My objects are not moving fast, but they have a large range of colors, and the background is black. The ROI system would be a way to avoid overexposure (and hence bad depth) when most of the ROI content is background, which is what happens with a fixed ROI.

Feel free to offer me a better idea!

MartyG-RealSense commented 4 years ago

Technically, the D435's RGB stream is capable of 60 FPS at 848x480 (it is 1280x720 that is limited to 30 FPS), so you could reach 60 FPS for both depth and RGB. Running RGB at 60 FPS should help reduce the blurring during motion capture that can happen when RGB is set to 30 FPS.
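As a rough sketch, the stream configuration for that could look like the below (the formats are illustrative; check the supported profiles on your unit):

// Sketch: enable depth and RGB at 848x480 / 60 FPS (formats are illustrative).
rs2::config cfg;
cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 60);
cfg.enable_stream(RS2_STREAM_COLOR, 848, 480, RS2_FORMAT_RGB8, 60);
rs2::pipeline pipe;
pipe.start(cfg);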

The reason that blur can occur at 30 FPS is that the RGB sensor on the D415 and D435 has a slow 'rolling shutter', whilst the depth sensor has a fast 'global shutter' capable of capturing a vehicle at full speed.

If you are performing alignment between depth and color and are concerned that 60 FPS may be too processing intensive (alignment is a processing-heavy task), it is possible to reduce the burden on the CPU by making use of CUDA support in Librealsense. On computers equipped with an Nvidia graphics chip, some of the CPU's processing work during alignment can be offloaded onto the GPU graphics chip, noticeably reducing the CPU's percentage usage.

On the new RealSense D455 model, improvements have been made that make depth-to-color alignment easier: both the RGB and depth sensors have a fast global shutter, both have the same size FOV, and both are mounted on the same stiffener.

https://github.com/IntelRealSense/librealsense/issues/6610

Even if alignment is not being done, the RGB image on its own should benefit from 60 FPS in terms of motion blur reduction. I would say though that if your object is moving at less than human walking pace, you may not experience blur on the image anyway, even with the RGB sensor's slower shutter on the D435. In that case 60 FPS may not change the image enough to be worth the extra processing, and 30 FPS may indeed be a suitable speed for this application.

MartyG-RealSense commented 4 years ago

If the camera is in a fixed position and the black background is unnecessary to the depth image, it could be excluded from both the depth and color detail.

Aligning depth to color - like in the align-advanced example program - should remove RGB background detail along with depth detail when the observable distance is reduced.

https://dev.intelrealsense.com/docs/rs-align-advanced
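Below is a very simplified sketch of that idea, based on rs-align-advanced (the clipping distance and stream profiles are only illustrative):

// Sketch: after aligning depth to color, blank out color pixels whose depth
// is missing or beyond a clipping distance (values are illustrative).
#include <librealsense2/rs.hpp>
#include <cstdint>

int main()
{
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 30);
    cfg.enable_stream(RS2_STREAM_COLOR, 848, 480, RS2_FORMAT_RGB8, 30);
    rs2::pipeline pipe;
    pipe.start(cfg);

    rs2::align align_to_color(RS2_STREAM_COLOR);
    const float clipping_dist = 1.0f; // meters, illustrative

    rs2::frameset frames = align_to_color.process(pipe.wait_for_frames());
    rs2::video_frame color = frames.get_color_frame();
    rs2::depth_frame depth = frames.get_depth_frame();

    auto* rgb = reinterpret_cast<uint8_t*>(const_cast<void*>(color.get_data()));
    for (int y = 0; y < color.get_height(); ++y)
        for (int x = 0; x < color.get_width(); ++x)
        {
            float dist = depth.get_distance(x, y);
            if (dist <= 0.0f || dist > clipping_dist)
            {
                uint8_t* px = rgb + y * color.get_stride_in_bytes() + x * 3; // RGB8
                px[0] = px[1] = px[2] = 0; // paint background black
            }
        }
    return 0;
}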

Regarding bad depth from a black background: a general physics principle for depth cameras (not just RealSense) is that dark grey and black colors absorb light. This makes it hard for the camera to read depth detail from such surfaces unless they have a strong light source on them. Black surfaces may therefore appear to be rendered on the image but are actually plain black empty areas with no depth detail. The darker the shade, the harder it is for light to be reflected.

An example is depth-sensing a black cable. It may appear on the image to be rendered as a black cable but is actually a cable-shaped empty area.

Another example that I recently saw in a case was a black office chair that was rendered on the image as a black chair-shaped object with some visible blue patches of depth on it. The areas that the blue patches were in corresponded to where a nearby strong light source was casting illumination upon the chair on the matching RGB image, lightening those areas of the chair.

TheophileBlard commented 4 years ago

Actually, we already tested 60 FPS, and we did observe somewhat better depth, but it wasn't a sufficient improvement.

I'm not at the office at the moment, or I would have uploaded some example images to help you understand the problem better. As I said, we have objects of various sizes which move around on a black background, and therefore the objects are often overexposed (with both the depth & color sensors) under the default exposure ROI. This is not an issue for the color frames, but on the depth frames a lot of depth points are missing, which is not satisfactory. The objects are placed on the background (as they would be on a conveyor), and we measure their thickness. They are not moving fast at all (maybe 5 cm/s max).

Regarding speed, we're already using CUDA or OpenMP for alignment, depending on the target device, and it's pretty fast! The "slowness" comes from the segmentation algorithms we use to selectively generate point clouds of the detected objects. Finally, the D455 is something we're looking into, but right now we're focusing on the D435 (we're using the FRAMOS industrial D435e). Thank you for all the pointers.

We do get good depth for the black background, which I think is thanks to the IR emitter. The issue really comes from the objects. Maybe automatically updating the ROI is not the right way to solve it? I think it is worth a try :)

MartyG-RealSense commented 4 years ago

Thank you very much for your patience. :)

Intel cannot provide specific technical guidance about the D435e as it is a FRAMOS product and not an Intel one, so FRAMOS are responsible for its tech support. The D435e also has hardware features that the official D435 lacks. FRAMOS may therefore be the best source of information for your project, as they will be most familiar with their product and with any differences in performance that may arise from its design.

I would say though that - on the official D435, at least - if you have a lot of missing depth points (i.e. a "sparse" depth image), then the sparseness can be reduced by increasing the value of the Laser Power setting. So if there are big round holes in the image, these may start to close up as Laser Power is progressively increased. A side-effect of increasing Laser Power is that the IR dot pattern becomes more visible to the camera.
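Programmatically, Laser Power is an option on the depth sensor. A small sketch (the target value is illustrative):

// Sketch: read the Laser Power range and raise it (value shown is illustrative).
rs2::pipeline pipe;
rs2::pipeline_profile profile = pipe.start();
rs2::depth_sensor sensor = profile.get_device().first<rs2::depth_sensor>();

if (sensor.supports(RS2_OPTION_LASER_POWER))
{
    rs2::option_range range = sensor.get_option_range(RS2_OPTION_LASER_POWER);
    sensor.set_option(RS2_OPTION_LASER_POWER, range.max); // step up from the default
}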

dorodnic commented 4 years ago

Hi @TheophileBlard I second @MartyG-RealSense's point, and would like to emphasise that rs2::align is the recommended way to transfer an ROI between color and depth frames. rs2_project_color_pixel_to_depth_pixel can be used in some corner cases, but it makes several assumptions.

TheophileBlard commented 4 years ago

@MartyG-RealSense My original question is not hardware related, but is about the RealSense SDK (this repo). We already played with the Laser Power, but didn't see any significant change in depth accuracy above the recommended 150 mW.

As I'm back at the office, I managed to take some screenshots to illustrate the problem. On the first one, the depth ROI is set to a "default" value, covering most of the screen. As you can see, we get very good depth for the table, but bad depth for the objects.

[Screenshot: default depth exposure ROI covering most of the screen]

On the second image, the ROI is reduced to the central object. The depth for the table is a little worse, but we get a better depth representation of the central object (in fact, for the other objects too).

[Screenshot: ROI reduced to the central object]

@dorodnic we are already aligning frames with rs2::align, depth to color. We therefore have ROI coordinates which are valid for the color sensor, but they can't be used as-is with the depth sensor, as it has a larger FOV than the color sensor. As you can see in the image below, the pixel coordinates clearly differ between color & depth. The set_region_of_interest method does not care whether frames are aligned; it only expects coordinates which are valid for the original depth sensor frame. Would you have some pointers for automatically converting pixels on the color frame to pixels on the depth frame (and therefore converting my ROI coordinates from color to depth)?

[Screenshot: color and depth views side by side, showing the mismatch in pixel coordinates / FOV]

dorodnic commented 4 years ago

It seems like you want to align color to depth. Then everything will be in depth sensor coordinates that you pass to set ROI.

TheophileBlard commented 4 years ago

It seems like you want to align color to depth. Then everything will be in depth sensor coordinates that you pass to set ROI.

When aligning color to depth, the color frames have black artifacts where no depth data is available (see the image below). This wouldn't work well with the image processing algorithms we're running on the color frames to actually find the ROI. That's why we align depth to color.

Is there a way to get the color image without the artifacts?

[Screenshot: color aligned to depth, with black artifacts where depth data is missing]

dorodnic commented 4 years ago

If you configure the sensor to output RGBA and then align, you should get an alpha mask for the valid pixels. This is not exactly what you want, but it can help. The reality is, if we don't know the depth, any value for that colour pixel would be speculation. Perhaps using a hole-filling filter on the depth prior to alignment can help reduce the number of black pixels.
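A small sketch of the hole-filling suggestion, assuming a pipeline ('pipe') that is already started and the usual frameset filtering chain:

// Sketch: fill depth holes before aligning color to depth, so fewer color
// pixels come out black (filter left at its default mode).
// 'pipe' is an rs2::pipeline that has already been started.
// Optionally request RGBA color, e.g. cfg.enable_stream(RS2_STREAM_COLOR, RS2_FORMAT_RGBA8),
// so invalid pixels carry an alpha mask after alignment.
rs2::hole_filling_filter hole_filling;
rs2::align align_to_depth(RS2_STREAM_DEPTH);

rs2::frameset frames = pipe.wait_for_frames();
frames = frames.apply_filter(hole_filling);   // fill missing depth pixels first
frames = align_to_depth.process(frames);      // then align color to depth
rs2::video_frame color = frames.get_color_frame();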

TheophileBlard commented 4 years ago

Thanks for the tips. I ended up using a modified version of rs2_project_color_pixel_to_depth_pixel which works with depth decimation to solve my original issue.
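For anyone who finds this later, the rough shape of the idea was to project the ROI corners from color coordinates onto the (decimated) depth frame and then rescale for the decimation factor. This sketch is not my exact code (I actually modified the function itself); the intrinsics/extrinsics handles, depth range and decimation factor below are placeholders:

// Sketch only: convert a color-frame ROI to depth-sensor coordinates by
// projecting its corners, then compensate for the decimation factor.
#include <librealsense2/rs.hpp>
#include <librealsense2/rsutil.h>
#include <algorithm>

rs2::region_of_interest color_roi_to_depth_roi(
    const rs2::depth_frame&        depth,          // decimated, non-aligned depth frame
    const rs2_intrinsics&          depth_intrin,   // intrinsics of the decimated depth stream
    const rs2_intrinsics&          color_intrin,
    const rs2_extrinsics&          color_to_depth,
    const rs2_extrinsics&          depth_to_color,
    float                          depth_scale,
    const rs2::region_of_interest& color_roi,
    int                            decimation)     // e.g. 2 (placeholder)
{
    auto project = [&](float cx, float cy, float out[2])
    {
        const float from_pixel[2] = { cx, cy };
        rs2_project_color_pixel_to_depth_pixel(
            out,
            static_cast<const uint16_t*>(depth.get_data()),
            depth_scale,
            0.1f, 10.0f,                            // depth search range in meters (placeholder)
            &depth_intrin, &color_intrin,
            &color_to_depth, &depth_to_color,
            from_pixel);
    };

    float tl[2], br[2];
    project((float)color_roi.min_x, (float)color_roi.min_y, tl);
    project((float)color_roi.max_x, (float)color_roi.max_y, br);

    // Rescale from decimated-depth coordinates back to the full-resolution
    // sensor coordinates that set_region_of_interest expects.
    rs2::region_of_interest depth_roi;
    depth_roi.min_x = std::max(0, (int)tl[0]) * decimation;
    depth_roi.min_y = std::max(0, (int)tl[1]) * decimation;
    depth_roi.max_x = (int)br[0] * decimation;
    depth_roi.max_y = (int)br[1] * decimation;
    return depth_roi;
}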

MartyG-RealSense commented 4 years ago

@TheophileBlard Thank you very much for the update! I'm pleased that you were able to find a solution that worked with your D435e.

AlfaLegion commented 3 years ago

To get a clean image and correct depth map values after alignment, you need to use RS2_STREAM_COLOR instead of RS2_STREAM_DEPTH. I did this:

// ....
auto profile = pipe.start(cfg);
rs2_stream align_to = RS2_STREAM_COLOR;
rs2::align align(align_to);
// other settings
// get intrinsics from the color stream (not the depth stream)
auto stream = profile.get_stream(RS2_STREAM_COLOR).as<rs2::video_stream_profile>();
auto intrin = stream.get_intrinsics();

while (true)
{
    rs2::frameset data = pipe.wait_for_frames();
    rs2::frameset aligned_frames = align.process(data);

    auto color = aligned_frames.first(align_to);
    auto depth = aligned_frames.get_depth_frame();
    // processing....
}

MartyG-RealSense commented 3 years ago

Thanks very much @AlfaLegion for sharing your solution with the RealSense community :)