IntelRealSense / realsense-ros

ROS Wrapper for Intel(R) RealSense(TM) Cameras
http://wiki.ros.org/RealSense
Apache License 2.0

Depth and Color Stream break when both enabled and I need it for rtabmap #2308

Closed Brac24 closed 2 years ago

Brac24 commented 2 years ago

I am on a Jetson Nano and am running ROS 2 Foxy inside a docker container. The image below shows the versions I'm running. [image: realsense_config]

I have a similar issue to https://github.com/IntelRealSense/realsense-ros/issues/2149 where the depth and color streams break. When I visualize in RViz, it struggles to render both video streams at the same time. Eventually one of the streams stops publishing while the other keeps working. This only occurs when both depth and color streams are enabled. I have tried lowering the resolution and the FPS as suggested in #2149, but it still eventually stops working after a few seconds.

The main reason I am diagnosing this issue is because these are two important topics needed to get rtabmap working since it is subscribing to these topics. I did a ros2 topic hz on each of the video stream topics (depth and color) and the same thing occurred where one topic would eventually stop publishing and the other topic would keep publishing fine.

The below image shows the depth and color streams displaying in RViz. [image: realsense_depth_color_streams]

MartyG-RealSense commented 2 years ago

Hi @Brac24 I note that you are performing depth to color alignment as Align Depth is set as On in the log. Depth to color alignment is a processing-intensive function. Do you know whether you have CUDA support enabled in the librealsense SDK, please?

If librealsense is built with CUDA support on your Jetson Nano then it can take advantage of the Nvidia graphics GPU on Jetson boards to significantly improve align performance by offloading processing from the CPU onto the GPU.

If you built librealsense and the ROS wrapper separately on your Jetson and librealsense was built with CUDA support, then the performance acceleration is applied automatically.

If you built librealsense with the Jetson packages then CUDA support is automatically included, whilst when building librealsense from source code with CMake it is enabled by adding the build flag -DBUILD_WITH_CUDA=true.

https://github.com/IntelRealSense/librealsense/blob/master/doc/installation_jetson.md
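As a sketch, a source build with CUDA enabled might look like the following (directory layout and job count are illustrative, not prescriptive):

```shell
# Illustrative CUDA-enabled source build of librealsense on a Jetson.
# BUILD_WITH_CUDA is the real CMake option; everything else is an example.
git clone https://github.com/IntelRealSense/librealsense.git
cd librealsense && mkdir build && cd build
cmake .. -DBUILD_WITH_CUDA=true -DCMAKE_BUILD_TYPE=Release
make -j2 && sudo make install
```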


You could test whether having alignment enabled is a factor in your problem by disabling it and seeing whether the slowdown of one stream still occurs.

Brac24 commented 2 years ago

Thanks @MartyG-RealSense. I tried your suggestion of disabling alignment, but the issue still occurs. Trying to visualize both streams has very poor performance and just stops working after a bit, but if I remove one of the streams the other comes back and starts working again. It seems I can only visualize one at a time, and I assume this issue will also occur once rtabmap ingests both video streams.

I also noticed, using the jtop utility to monitor CPU and GPU usage on the Jetson Nano, that when I ran the launch file it never used the GPU. But if I run, for example, rs-pointcloud, I do see some GPU usage. I installed librealsense in the Docker container by running the installLibrealsense.sh script from this repo: https://github.com/JetsonHacksNano/installLibrealsense. The realsense-ros package was already installed as part of the docker image I pulled from https://hub.docker.com/layers/ros/dustynv/ros/foxy-slam-l4t-r32.6.1/images/sha256-c0d61d653c464c7d08309fd7af57b13ad68c4622da463dd97b8bb2938ef7522e?context=explore

Another thing to note is that I am running RViz remotely on my PC while running the realsense node on the Jetson Nano; I am not sure if the lack of GPU usage has to do with rendering on a remote machine instead of on the Jetson Nano.

MartyG-RealSense commented 2 years ago

If the ROS wrapper was already installed when you installed librealsense from the Jetson packages with installLibrealsense.sh then you will likely need to rebuild the ROS wrapper. This is because each time that the librealsense version is updated, the ROS wrapper has to be built again afterwards.

CUDA support should be automatically included in the Jetson packages that installLibrealsense.sh installs.

May I confirm that you are only building librealsense with installLibrealsense.sh, and are not then continuing to the following section of the JetsonHacks instructions and running buildLibrealsense.sh (which would build librealsense a second time from source code and potentially set up a conflict in the udev device handling rules)?

You could use htop in Ubuntu to see whether the memory capacity of the Jetson is consumed rapidly after launch (a 'memory leak' that causes an application's performance to degrade to the point where the application may freeze).

You could also test whether RViz is the cause of the slowdown, as I have handled some cases where streams perform normally until RViz is opened and then slow down greatly. You could do a rostopic hz check to read the publishing rate of the depth and color topics after launch has completed but before you open RViz.

http://wiki.ros.org/rostopic#rostopic_hz
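A minimal version of that check might look like this (topic names assume the wrapper's default camera namespace):

```shell
# Read publish rates before opening RViz.
# ROS 1 syntax shown; on ROS 2 / Foxy the equivalent is `ros2 topic hz <topic>`.
rostopic hz /camera/color/image_raw
rostopic hz /camera/aligned_depth_to_color/image_raw
```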

How is your PC communicating with the Nano, please - over Wi-Fi, ethernet cable, or some other method? An ethernet cable's data transfer is faster than USB 2 but slower than USB 3 (5000 Mbps for USB 3, 1000 Mbps for GigE ethernet).

Only the Nano would be benefitting from CUDA acceleration, whilst the PC would be using its own memory, CPU and GPU resources to run RViz. This could be disadvantageous to performance if the PC only has a low-end hardware specification.

Brac24 commented 2 years ago

Yes, I only ran installLibrealsense.sh; I did not run buildLibrealsense.sh. How do I rebuild the ROS wrapper? Would I just call colcon build again? I rebuilt it and then sourced local_setup.bash, and now I get a mismatching version. [image: librealsenseDifferentFromRosWrapper]

And I am communicating over Wi-Fi. The memory usage seems to stay at 1.8 GB on the Jetson Nano, as if it were capped there. Opening RViz by itself doesn't degrade performance; it is only when I try to view both streams that it stops working.

MartyG-RealSense commented 2 years ago

The log's mismatch message suggests that ROS wrapper 3.2.3 was built for SDK 2.48 and then the SDK version was updated to 2.50.0 but the ROS wrapper was not rebuilt afterwards. The mismatch message should disappear once the ROS wrapper is built again.

Yes, build the wrapper again from source with colcon build
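A sketch of that rebuild in a colcon workspace (the workspace path ~/dev_ws comes from this thread; clearing the old build artifacts first is an assumption that helps avoid stale caches):

```shell
# Rebuild the ROS 2 wrapper against the currently installed librealsense.
cd ~/dev_ws
rm -rf build/realsense2_camera install/realsense2_camera  # drop stale build artifacts
colcon build --packages-select realsense2_camera
source install/local_setup.bash
```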

Assuming that auto-exposure is enabled, then as both depth and color are being published at the same time it would be worth testing whether disabling the RGB option auto_exposure_priority improves performance when both streams are active. Disabling auto_exposure_priority when auto-exposure is enabled forces a constant FPS rate instead of permitting FPS to vary.

My understanding is that the launch file instruction on a 400 Series camera for disabling auto_exposure_priority is the one below:

<rosparam> 
/camera/rgb_camera/auto_exposure_priority: false 
</rosparam>
Brac24 commented 2 years ago

I did rebuild with colcon build, but that is what caused the version issue. I think the problem is that the ROS wrapper came pre-installed with the docker image, and in trying to diagnose this issue I pulled the realsense-ros repository into my own ROS workspace. So I have a built ROS wrapper inside the default ROS root in /opt/ros/foxy and another realsense-ros in my ~/dev_ws/src, and I wanted to use the one that I pulled from this repository instead of the default wrapper that came with the container. Looking inside the /opt/ros/foxy directories, I have shared libraries belonging to librealsense 2.48.

How would I explicitly build with librealsense 2.50?

I tried deleting those shared libraries, but it doesn't seem to work; it still says I'm building the wrapper with 2.48.

MartyG-RealSense commented 2 years ago

If you built from packages using either the JetsonHacks installLibrealsense.sh script or Intel's official Jetson instructions then both methods should build the latest librealsense version, which is currently 2.50.0.

https://github.com/IntelRealSense/librealsense/blob/master/doc/installation_jetson.md#4-install-with-debian-packages https://github.com/JetsonHacksNano/installLibrealsense

If you have previously installed librealsense 2.48.0 from packages then it may be worth removing all librealsense related packages from the computer before installing 2.50.0 from packages. This removal can be performed with the command below.

dpkg -l | grep "realsense" | cut -d " " -f 3 | xargs sudo dpkg --purge

It sounds to me as though the procedure that may suit you is:

  1. Remove all librealsense related packages so you are doing a clean installation without an older librealsense version on your computer to conflict with.
  2. Install the latest librealsense (2.50.0) from packages with installLibrealsense.sh
  3. Use your Docker image to install the Foxy ROS 2 wrapper. Since the Docker image that you linked to was updated only a month ago at the time of writing, I assume that it installs ROS wrapper 3.2.3 (the correct match for librealsense 2.50.0)?
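The first two steps of that procedure might be sketched as follows (the purge pipeline is from earlier in this thread; the clone path is an assumption):

```shell
# Step 1: remove all previously installed librealsense packages.
dpkg -l | grep "realsense" | cut -d " " -f 3 | xargs sudo dpkg --purge

# Step 2: install the latest librealsense from the JetsonHacks packages.
git clone https://github.com/JetsonHacksNano/installLibrealsense.git
cd installLibrealsense && ./installLibrealsense.sh
```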
Brac24 commented 2 years ago

So the docker container already came with precompiled binaries somehow. Running the removal command only uninstalls 2.50, but the shared libraries for 2.48 still exist and the build will keep using those to compile the ROS wrapper. These binaries are located under /opt/ros/foxy/install/lib.

So I have 2 separate images: the original one from dustynv, and a second one that is a copy of the original which I use to add extra libraries I might want or need. I just ran a test using the original docker image, which contains the librealsense precompiled binaries, and it was about 5x faster when visualizing the streams. I went from about 5 or 6 fps when visualizing aligned_depth_to_color to about 25 to 30 fps when using the original image. I still got the issue where trying to visualize both color and depth streams at the same time would break until I removed one of the streams, at which point the other would come back and work fine. Maybe installing librealsense 2.50 in my other container broke some things.

Sure enough I removed 2.50 from my other container and the performance was way better. 2.50 must have conflicted somehow with 2.48.

So how would I remove 2.48? Would I have to go around looking for these binaries and remove them manually, since I don't think they were installed with a package manager?

MartyG-RealSense commented 2 years ago

When an Ubuntu system's files get in a tangle, it is sometimes easier to get it working correctly by just completely wiping the storage and installing everything fresh from the beginning again, including the OS. This strategy has often been successful for ROS wrapper problems where no other solutions tried had worked.

My knowledge of Docker is limited unfortunately, so I do not know the procedure for cleanly removing your 2.48.0 shared libraries.

If you were able to consider using an alternative Docker image for 2.50.0, Intel published a Docker tutorial at the same time that 2.50.0 was released that provides guidance on creating a Dockerfile.

https://github.com/IntelRealSense/librealsense/tree/master/scripts/Docker

It sounds as though the Docker image that you already have does everything that you want it to already though, except for the slowdown.

In regard to looking for file locations, the librealsense installation documentation for building from source code provides the following information:


The shared object will be installed in /usr/local/lib, header files in /usr/local/include.

The binary demos, tutorials and test files will be copied into /usr/local/bin


Thanks very much for your patience!

Brac24 commented 2 years ago

Hey @MartyG-RealSense thanks for the help. So I think I'll stick with 2.48 since that is already working. I guess the issue still persists though when ingesting both streams. When I run rtabmap_ros it subscribes to the aligned_depth_to_color and /color/image_raw and it seems to break when trying to read both topics.

I ran ros2 topic hz for /camera/aligned_depth_to_color/image_raw and /camera/color/image_raw. What happens is that whichever topic I check first properly prints out its frequency, but the second topic never prints out a frequency, meaning its messages are no longer arriving. This is essentially the same test as subscribing to these topics manually without the use of RViz, and we come to the same conclusion.

MartyG-RealSense commented 2 years ago

Does launching RTABMAP with compression enabled make a difference, if you are not doing so already? The approach is described at https://github.com/introlab/rtabmap_ros/issues/622#issuecomment-911964395

If not then I will be happy to continue this support discussion tomorrow (Sunday). Good luck!

Brac24 commented 2 years ago

Hey @MartyG-RealSense, I have not tried running it with compression enabled, but I still get the issue even when just inspecting the publish rates of the topics. Running the command ros2 topic hz /camera/aligned_depth_to_color/image_raw displays the rate of the topic. Then in a separate terminal I run ros2 topic hz /camera/color/image_raw and it displays nothing; it seems like it is just waiting to receive a message that never arrives.

I assume running ros2 topic hz subscribes to the topic and calculates the rate at which it receives these messages. So somehow whenever I try to subscribe to these 2 topics at once it runs into issues.

Brac24 commented 2 years ago

I just ran some more tests and it seems to be an issue when subscribing to these topics from my remote machine. Running the ros2 topic hz commands on the Jetson Nano works fine when reading both topic rates at the same time. As soon as I run the hz command on a second topic from my remote machine, the publish rate stops printing on my remote machine; on the Jetson Nano it still displays, but I can see the rate slowly decreasing. Once I kill one of the hz processes on my remote machine, the other rate begins displaying again and the rates go back up on the Jetson Nano side.
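A back-of-envelope bandwidth estimate is consistent with this Wi-Fi symptom. Uncompressed image topics are large, and the stream settings below are assumed examples rather than the confirmed configuration:

```python
# Rough bandwidth estimate for uncompressed ROS image topics.
# Resolutions and FPS are assumed examples, not the user's confirmed settings.

def stream_mbps(width, height, bytes_per_pixel, fps):
    """Return approximate topic bandwidth in megabits per second."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6

color = stream_mbps(640, 480, 3, 30)   # rgb8 color stream (3 bytes/pixel)
depth = stream_mbps(640, 480, 2, 30)   # 16UC1 depth stream (2 bytes/pixel)
print(f"color ~ {color:.0f} Mbps, depth ~ {depth:.0f} Mbps, "
      f"total ~ {color + depth:.0f} Mbps")
```

At these assumed settings the two raw topics together need roughly 370 Mbps, well beyond what a typical Wi-Fi link sustains, so one subscription starving the other over the wireless link is plausible even when both rates are healthy locally on the Nano.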

MartyG-RealSense commented 2 years ago

In Intel's tutorial for using three cameras across two machines (with one camera attached to the remote machine), they take the approach of ssh'ing to the remote computer and then preceding the remote machine's roslaunch instruction with a ROS_MASTER_URI instruction to connect it back to the ROS host IP address of the first computer.

https://github.com/IntelRealSense/realsense-ros/wiki/showcase-of-using-3-cameras-in-2-machines#terminal-4--used-to-operate-the-second-machine-perclnx217-in-my-case
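That setup might be sketched as below. Note this is ROS 1 networking (ROS 2 / Foxy uses DDS discovery instead of a master URI), and the IP addresses are placeholders:

```shell
# ROS 1 two-machine networking sketch; all IPs are placeholder examples.

# On the machine running roscore (the host, e.g. 192.168.1.10):
export ROS_IP=192.168.1.10
roscore

# After ssh'ing into the second machine, point it back at the host
# before launching the camera node:
export ROS_MASTER_URI=http://192.168.1.10:11311
export ROS_IP=192.168.1.20
roslaunch realsense2_camera rs_camera.launch
```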

MartyG-RealSense commented 2 years ago

Hi @Brac24 Do you require further assistance with this case, please? Thanks!

Brac24 commented 2 years ago

@MartyG-RealSense It's ok for now I haven't had time to mess with it. Thank you for the help.

MartyG-RealSense commented 2 years ago

Thanks very much @Brac24 for the update. As you are not currently working on the issue, I will close the case. Please feel free to re-open it at a future date or create a new case. Thanks again!

MartyG-RealSense commented 2 years ago

To the customer danialdunson, who posted a comment on here about CUDA that seems to have been deleted: if you installed librealsense from the Jetson debian packages but are experiencing very low FPS when align_depth is set to True, then this may not be related to CUDA support.

There is a known issue on the RealSense ROS wrapper with Jetson boards specifically where enabling a pointcloud or using align_depth can cause a severe performance reduction that does not occur when pointcloud or align_depth are disabled.

In regard to the pointcloud issue, the best current known solution is at https://github.com/IntelRealSense/realsense-ros/issues/1967#issuecomment-1029789663

The align_depth related issue on Jetson is discussed at https://github.com/IntelRealSense/librealsense/issues/9519

Whilst there is not a clear solution for align_depth FPS issues on Jetson, the problem started being reported around the same time that the pointcloud issue was reported, suggesting that the solution suggested at https://github.com/IntelRealSense/realsense-ros/issues/1967#issuecomment-1029789663 for the pointcloud might be applicable to align_depth too.

danialdunson commented 2 years ago

Hey @MartyG-RealSense, Thank you for guidance. ....I had another question but it was an RTFM error lol.

MartyG-RealSense commented 2 years ago

I just saw your question. As I have started, I will finish. :)

GLSL and CUDA are two separate graphics acceleration systems in the RealSense SDK. CUDA only works with Nvidia GPUs such as those on Nvidia Jetson boards. GLSL works with any brand of GPU, though there may not be a noticeable improvement in performance on low-end devices.

GLSL can be used in librealsense SDK applications but is not used in the RealSense ROS wrapper.

If CUDA support has been enabled in the build of librealsense then the ROS wrapper can automatically make use of it. However, if the ROS wrapper and librealsense were installed together from packages with the wrapper's Method 1 installation procedure then CUDA support is not included in the packages. For using ROS with Jetson and CUDA, it therefore would be recommendable to build the ROS wrapper from source code.
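A sketch of that source-build route for a ROS 1 workspace (the workspace path is an assumption, and wrapper dependencies such as ddynamic_reconfigure must already be installed):

```shell
# Build the ROS 1 wrapper from source so it links against the
# CUDA-enabled librealsense already installed on the system.
mkdir -p ~/catkin_ws/src && cd ~/catkin_ws/src
git clone https://github.com/IntelRealSense/realsense-ros.git
# (check out the branch matching your ROS distribution before building)
cd ~/catkin_ws && catkin_make
source devel/setup.bash
```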

danialdunson commented 2 years ago

thank you for supporting the stale branch :)

finally got the realsense-ros wrapper to build and run with NO errors. not a single uvc or watchdog warning. with align_depth:=true and enable_pointcloud:=true!

Built with:

  * SDK 2.40.0 from source (method 2 with patch)
  * librealsense 2.2.20 from source (make sure you uninstall any old ros-melodic-libreal*)
  * camera FW: 5.13.00.50
  * camera model: D415
  * ROS: Melodic
  * Device: Jetson Nano 4GB (B01), JetPack 4.6, connected to an Amazon Basics powered USB 3.2 hub as recommended in the white papers

hope this helps someone.

MartyG-RealSense commented 2 years ago

Hi @danialdunson It's great to hear that you were successful. Thanks so much for sharing with the RealSense ROS community the configuration that worked for you!

BhooshanDeshpande commented 2 years ago

@MartyG-RealSense : I am subscribing to the aligned depth topic and it appears that the topic is published only when subscribed to. I observed that as my subscriber node gets killed, the publishing stops (as expected), but when restarted, the realsense doesn't start republishing on the topic. Any help on what could be wrong?

MartyG-RealSense commented 2 years ago

Hi @BhooshanDeshpande Aligned topics should be published when align_depth is set to True. Are you setting align_depth to true in the roslaunch instruction or within the launch file, please?

I recall a case where a parameter was set in the roslaunch instruction instead of inside the launch file and when the node was reset, it was using the default roslaunch instruction - such as roslaunch realsense2_camera rs_camera.launch - without the additional parameters that had been in the original roslaunch instruction and so those commands were not being enabled upon reset. I believe that the solution in that case was to define the parameters inside the launch file to ensure that they were carried out after the node was reset.