IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Color conversion in CUDA re-allocates memory at every single frame #11692

Open attiladoor opened 1 year ago

attiladoor commented 1 year ago
Required Info
Operating System & Version: Linux (Ubuntu 20)
Kernel Version (Linux Only): 4.9.253
Platform: NVIDIA Jetson Nano
SDK Version: 2.53
Language: C++
Segment: Robot/Smartphone/VR/AR/others

Issue Description

I used NVIDIA Nsight Systems to look into my application's performance on a Jetson Nano (using CUDA) and noticed that quite a long time is spent re-allocating device memory at every single frame.

[screenshot: Nsight Systems timeline showing the per-frame device memory allocations]

It is clear from the code that device memory allocation is called several times around the kernel: https://github.com/IntelRealSense/librealsense/blob/master/src/cuda/cuda-conversion.cu#L237
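Roughly, the pattern I am seeing in the trace looks like this (my own simplified sketch, not the actual librealsense source):

```cpp
#include <cuda_runtime.h>
#include <cstdint>
#include <cstddef>

// Rough illustration only (not the actual cuda-conversion.cu code) of the
// per-frame pattern visible in the Nsight trace: both device buffers are
// allocated and then freed again on every single frame.
void convert_frame(const uint8_t* host_src, uint8_t* host_dst, size_t bytes)
{
    uint8_t* d_src = nullptr;
    uint8_t* d_dst = nullptr;

    cudaMalloc((void**)&d_src, bytes);                           // allocated every frame
    cudaMalloc((void**)&d_dst, bytes);                           // allocated every frame
    cudaMemcpy(d_src, host_src, bytes, cudaMemcpyHostToDevice);  // upload input
    // ... launch the color-conversion kernel reading d_src, writing d_dst ...
    cudaMemcpy(host_dst, d_dst, bytes, cudaMemcpyDeviceToHost);  // download result
    cudaFree(d_src);                                             // freed every frame
    cudaFree(d_dst);
}
```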

Would it be possible to re-use the device memory buffer between frames?

MartyG-RealSense commented 1 year ago

Hi @attiladoor https://github.com/IntelRealSense/librealsense/issues/7824 may have some relevance to your question.

The RealSense user in that case wondered whether they could use a feature called zero-copy to avoid the copies that consume memory. They were advised by a RealSense team member that this feature was no longer functional in the librealsense SDK. However, the RealSense ROS2 wrapper does support intra-process zero-copy.

attiladoor commented 1 year ago

I took a look at the ROS wrapper repo https://github.com/IntelRealSense/realsense-ros as you mentioned, but unfortunately I couldn't find any trace of this being implemented there to use for inspiration.

They mention the zero-copy strategy here: https://docs.ros.org/en/foxy/Tutorials/Demos/Intra-Process-Communication.html. I had the impression they mean it for internal ROS processes rather than anything related to the camera, but I am not familiar with ROS, so I might be wrong.
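For reference, what the tutorial appears to describe is a per-node option, roughly like this (a generic rclcpp sketch on my part, not anything from the RealSense wrapper):

```cpp
#include <rclcpp/rclcpp.hpp>

// Sketch: intra-process communication is enabled per node, so publishers and
// subscribers composed into the same process can exchange messages without
// serialization. The node name is just a placeholder.
int main(int argc, char** argv)
{
    rclcpp::init(argc, argv);
    auto node = std::make_shared<rclcpp::Node>(
        "camera_consumer",
        rclcpp::NodeOptions().use_intra_process_comms(true));
    rclcpp::spin(node);
    rclcpp::shutdown();
    return 0;
}
```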

A slightly tangential question: in the mentioned issue https://github.com/IntelRealSense/librealsense/issues/7824 the author manages to copy the frame data into a device memory buffer directly. Do you think that could work just as easily in C++ as well, if I copy the frame data pointer to a device buffer?
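What I mean is roughly the following (just a sketch of my own; it assumes an already-started rs2::pipeline and a sufficiently large, pre-allocated device buffer):

```cpp
#include <librealsense2/rs.hpp>
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical sketch: copy a RealSense color frame straight into a
// pre-allocated device buffer instead of having the SDK allocate one
// per frame. `d_buffer` must already be large enough for the frame.
void upload_color_frame(rs2::pipeline& pipe, void* d_buffer)
{
    rs2::frameset frames = pipe.wait_for_frames();
    rs2::video_frame color = frames.get_color_frame();

    const size_t bytes = static_cast<size_t>(color.get_width()) *
                         color.get_height() *
                         color.get_bytes_per_pixel();

    // Host -> device copy of the raw frame data.
    cudaMemcpy(d_buffer, color.get_data(), bytes, cudaMemcpyHostToDevice);
}
```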

MartyG-RealSense commented 1 year ago

I do not know the answer to your tangential question, unfortunately. The SDK documentation link below does provide C++ guidance about frame memory management, though.

https://dev.intelrealsense.com/docs/frame-management#frame-memory-management

attiladoor commented 1 year ago

Unfortunately I couldn't stop the memory from being re-allocated through the librealsense API, so I went to the code HERE, made the buffer pointers static, and stopped re-allocating them once they are set. That did the trick, and I did gain much speed from it.

Anyway, if you would like it, I can create a pull request that adds a minimal buffer class which is allowed to grow in size (so that if multiple streams with different sizes use it, it will not break); otherwise we can close this issue. Thanks for your help.
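Something along these lines is what I have in mind (only a sketch, and the class name is a placeholder):

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Sketch of a minimal grow-only device buffer: the allocation is kept
// between frames and is only replaced when a larger size is requested,
// so streams with different resolutions can share it without breaking.
class cuda_scratch_buffer
{
public:
    ~cuda_scratch_buffer()
    {
        if (_ptr) cudaFree(_ptr);
    }

    // Returns a device pointer with at least `bytes` of capacity.
    void* require(size_t bytes)
    {
        if (bytes > _capacity)
        {
            if (_ptr) cudaFree(_ptr);
            cudaMalloc(&_ptr, bytes);   // grow-only re-allocation
            _capacity = bytes;
        }
        return _ptr;
    }

private:
    void*  _ptr = nullptr;
    size_t _capacity = 0;
};
```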

MartyG-RealSense commented 1 year ago

You are very welcome. Please do create a pull request. I will add an Enhancement label to keep this issue open, as it will be associated with the pull request.

attiladoor commented 1 year ago

I am a bit busy with other things, so it will take a few days for me, but I will be back soon.

MartyG-RealSense commented 1 year ago

That's no problem at all. Thanks very much for the update!

dmipx commented 1 year ago

Hi @attiladoor. As far as I know, we do not use color conversion for generic formats. For RGB, the default color format in the SDK is RGB8. If you choose YUYV, the format the camera streams natively, it should not apply any color processing. Can you verify with Depth/RGB@YUYV streaming and confirm the GPU occupancy?
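For example, something like this should request the raw streams directly (just a sketch with placeholder resolution and frame rate):

```cpp
#include <librealsense2/rs.hpp>

// Sketch: request YUYV for color so the SDK should not run the RGB8
// conversion, and Z16 for depth. Adjust resolution/FPS to your setup.
int main()
{
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_YUYV, 30);
    cfg.enable_stream(RS2_STREAM_DEPTH, 640, 480, RS2_FORMAT_Z16, 30);

    rs2::pipeline pipe;
    pipe.start(cfg);

    rs2::frameset frames = pipe.wait_for_frames();
    return 0;
}
```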

attiladoor commented 1 year ago

Hi @dmipx

Do you mean that I might have enabled the wrong camera streams in the SDK? I re-used the code from THIS example and called it like:

device_with_streams({RS2_STREAM_COLOR, RS2_STREAM_DEPTH}, serial_);