IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

High CPU usage for Pointclouds on ARM #12660

Open cwitting opened 7 months ago

cwitting commented 7 months ago
Required Info
Camera Model: D455
Firmware Version: 05.15.01.00
Operating System & Version: Ubuntu 16 ARM
Kernel Version (Linux Only): 4.9.140
Platform: Jetson TX2 + realsense-ros
SDK Version: v2.50.0
Language: C++
Segment: Robot

Issue Description

I see 100% CPU usage on one core when running with the pointcloud enabled on a Jetson TX2. Profiling shows that almost all of the time is spent in "rs2_deproject_pixel_to_point". Looking at that function, it is very inefficient when the "BROWN_CONRADY" model is used: it has to loop 10 times for each pixel. Is that really necessary? I tried to disable the distortion handling by simply doing

    point[0] = depth * x;
    point[1] = depth * y;
    point[2] = depth;
    return;

inside rs2_deproject_pixel_to_point, and the CPU usage dropped to below 10%.

Doing this, I was not able to notice any difference in the quality of the pointcloud.

I noticed there was handling of this using SSE and CUDA, but I am not able to use either in my current system.

What is it even trying to undistort here? Is the depth information from the camera not already undistorted?
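
For context on the per-pixel cost being described: the Brown-Conrady model maps undistorted coordinates to distorted ones in closed form, so deprojection has to invert it numerically, which is where the repeated loop comes from. A rough illustrative sketch of such a fixed-point inversion is shown below (this is not the SDK's exact code; the coefficient ordering k1, k2, p1, p2, k3 is an assumption):

    // Illustrative sketch: invert the Brown-Conrady distortion for one pixel
    // by fixed-point iteration. xd, yd are the normalized (distorted) image
    // coordinates; coeffs is assumed to hold {k1, k2, p1, p2, k3}.
    static void undistort_brown_conrady(float xd, float yd, const float coeffs[5],
                                        float& x, float& y)
    {
        const float k1 = coeffs[0], k2 = coeffs[1];
        const float p1 = coeffs[2], p2 = coeffs[3];
        const float k3 = coeffs[4];

        x = xd;
        y = yd;
        for (int i = 0; i < 10; ++i)   // fixed iteration count, as in the report above
        {
            const float r2     = x * x + y * y;
            const float radial = 1.0f + r2 * (k1 + r2 * (k2 + r2 * k3));
            const float dx     = 2.0f * p1 * x * y + p2 * (r2 + 2.0f * x * x);
            const float dy     = p1 * (r2 + 2.0f * y * y) + 2.0f * p2 * x * y;
            x = (xd - dx) / radial;
            y = (yd - dy) / radial;
        }
    }

Note that when every coefficient is zero, each iteration leaves x and y unchanged, which is consistent with the observation that skipping the loop made no visible difference to the pointcloud.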

MartyG-RealSense commented 7 months ago

Hi @cwitting Thanks very much for sharing the details of your code edit that significantly reduced CPU usage on your Jetson TX2.

As you noted, RealSense applies a distortion model to its streams. Most of the streams, including depth, have distortion applied.

If the SDK's 'rs-enumerate-devices' tool is launched in calibration information mode with the command rs-enumerate-devices -c, you can view which form of Brown-Conrady distortion is applied to each stream profile.
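
The same information can also be queried programmatically from a stream profile. A minimal sketch, assuming a connected camera and the default pipeline configuration:

    #include <librealsense2/rs.hpp>
    #include <iostream>

    int main()
    {
        // Start a default pipeline and look up the depth stream's intrinsics,
        // which carry the distortion model and its five coefficients.
        rs2::pipeline pipe;
        rs2::pipeline_profile profile = pipe.start();

        auto depth = profile.get_stream(RS2_STREAM_DEPTH)
                            .as<rs2::video_stream_profile>();
        rs2_intrinsics intr = depth.get_intrinsics();

        std::cout << "Model:  " << rs2_distortion_to_string(intr.model) << "\nCoeffs:";
        for (float c : intr.coeffs)
            std::cout << ' ' << c;
        std::cout << '\n';
        return 0;
    }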

When using pointclouds in RealSense ROS on an Nvidia Jetson board, the ideal course of action is to build the SDK from source code with the flag -DBUILD_WITH_CUDA=ON included in order to enable the SDK's CUDA support. When CUDA is enabled in librealsense it is also automatically enabled in the ROS wrapper, and it accelerates pointcloud generation and depth-color alignment by offloading their processing from the CPU onto the Jetson's Nvidia GPU. I note, though, your mention that you are unable to use CUDA on your TX2.

I see that you are using firmware version 5.15.1.0 with SDK 2.50.0. Doing so can result in errors, as 5.15.1.0 is designed for use with SDK 2.54.2. The recommended firmware to use with 2.50.0 is 5.13.0.50.

I do not have information to provide about the internal operating principles of distortion, unfortunately.

cwitting commented 7 months ago

Hi Marty, thanks for your comments. Looking at the output from "rs-enumerate-devices -c", I can see that the coefficients of the Brown-Conrady model for the depth stream are all 0:

 Intrinsic of "Depth" / 1280x720 / {Z16}
  Width:        1280
  Height:       720
  PPX:          643.4638671875
  PPY:          365.228332519531
  Fx:           650.360412597656
  Fy:           650.360412597656
  Distortion:   Brown Conrady
  Coeffs:       0   0   0   0   0  
  FOV (deg):    89.08 x 57.93

In that case it makes even less sense to me to apply the distortion model, especially when there is such a big difference in performance. Why is it still applied?

I am aware that CUDA is recommended, and it is something we are working towards, but right now I am stuck without it. I will try to downgrade the firmware and see if there are any changes.

cwitting commented 7 months ago

Nothing changed after downgrading the firmware. Should the distortion model not have been set to "NONE" when the coefficients are 0? Can I change which distortion model is used for the camera?

I have checked multiple cameras; all have 0 coefficients for the depth (and infrared) streams.

MartyG-RealSense commented 7 months ago

The reasons for coefficients being artificially set to 0 are provided at https://github.com/IntelRealSense/librealsense/issues/1430#issuecomment-375945916

Whilst original RealSense 400 Series models such as D415 and D435 set all coefficients to 0 for all stream types, more recent 400 Series models such as D455 have non-zero coefficients on the RGB color stream.

The distortion model used by the streams cannot be changed. The SDK does not have an undistortion feature, but OpenCV has undistort capabilities if you wish to investigate that possibility.

https://docs.opencv.org/4.5.2/dc/dbb/tutorial_py_calibration.html

A RealSense user at https://github.com/IntelRealSense/librealsense/issues/3880 tried an OpenCV undistort of a RealSense image but found that it made little difference.
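
For reference, applying an OpenCV undistort to a RealSense frame could look roughly like the sketch below. This is an illustration rather than a recommended pipeline: it uses the color stream (whose coefficients are non-zero on D455) and assumes the coefficient order matches OpenCV's k1, k2, p1, p2, k3.

    #include <librealsense2/rs.hpp>
    #include <opencv2/opencv.hpp>

    // Sketch: undistort a RealSense color frame with OpenCV, using the
    // intrinsics reported by the SDK. Channel order (RGB vs BGR) does not
    // affect the undistortion itself.
    cv::Mat undistort_color(const rs2::video_frame& frame)
    {
        auto profile = frame.get_profile().as<rs2::video_stream_profile>();
        rs2_intrinsics i = profile.get_intrinsics();

        cv::Mat K = (cv::Mat_<double>(3, 3) << i.fx, 0, i.ppx,
                                               0, i.fy, i.ppy,
                                               0, 0, 1);
        cv::Mat dist = (cv::Mat_<double>(1, 5) << i.coeffs[0], i.coeffs[1],
                                                  i.coeffs[2], i.coeffs[3],
                                                  i.coeffs[4]);

        cv::Mat src(frame.get_height(), frame.get_width(), CV_8UC3,
                    const_cast<void*>(frame.get_data()));
        cv::Mat dst;
        cv::undistort(src, dst, K, dist);
        return dst;
    }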

cwitting commented 7 months ago

I am not interested in undistorting the images; I am interested in improving CPU performance. I am just wondering why the depth image is run through the undistortion code in "rs2_deproject_pixel_to_point" when there are no distortion coefficients, especially when skipping it improves performance by a factor of 10(!).

Changing the distortion model to "NONE" when all coefficients are 0 would achieve that, as the expensive inner loop for the Brown-Conrady model would then be skipped (and it does not do anything anyway when the coefficients are 0).
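
For reference, if the deprojection is done in application code (rather than inside the SDK's pointcloud processing block), a workaround along these lines can be applied without patching the SDK. A sketch, not an official fix:

    #include <librealsense2/rs.hpp>
    #include <librealsense2/rsutil.h>
    #include <algorithm>
    #include <iterator>

    // Sketch of an application-side workaround: when every distortion
    // coefficient is zero, deproject with RS2_DISTORTION_NONE so the
    // iterative Brown-Conrady inverse is skipped entirely.
    inline void deproject_fast(float point[3], const rs2_intrinsics& intrin,
                               const float pixel[2], float depth)
    {
        rs2_intrinsics local = intrin;
        const bool all_zero = std::all_of(std::begin(local.coeffs),
                                          std::end(local.coeffs),
                                          [](float c) { return c == 0.0f; });
        if (all_zero)
            local.model = RS2_DISTORTION_NONE;

        rs2_deproject_pixel_to_point(point, &local, pixel, depth);
    }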

MartyG-RealSense commented 7 months ago

The information that you requested is not available to provide, unfortunately. I do apologize.

cwitting commented 7 months ago

Why not? I believe this could simply be a bug. Even doing something like:

    if (intrin->coeffs[0] == 0 && 
        intrin->coeffs[1] == 0 &&
        intrin->coeffs[2] == 0 &&
        intrin->coeffs[3] == 0 &&
        intrin->coeffs[4] == 0)
    {
        point[0] = depth * x;
        point[1] = depth * y;
        point[2] = depth;
        return;
    }

Would be better.

MartyG-RealSense commented 7 months ago

I will refer your question to my Intel RealSense colleagues. Thanks very much for your patience.

cwitting commented 7 months ago

Thank you for your help!