IntelRealSense / realsense-ros

ROS Wrapper for Intel(R) RealSense(TM) Cameras
http://wiki.ros.org/RealSense
Apache License 2.0

D435i not working properly when connected through ssh #2052

Closed martinakos closed 3 years ago

martinakos commented 3 years ago

I have a launch file that starts the D435i realsense-ros driver in a stereo VIO configuration (infrared and imu topics only) with the IR projector disabled. The launch file works fine when run from the NUC console (connected to a screen and keyboard), and I can record a rosbag of the infrared and imu topics. However, when I connect to the NUC through an ssh session and launch with the same launch file, I can see all the topics being published, but if I record a rosbag of the infrared camera topics, 90% of the time it doesn't record anything. The other 10% of the time it records a few messages, but not the number it should have recorded in 10s with the camera configured at 30Hz. The imu topic seems unaffected and records the correct number of messages.

While running the realsense-ros driver there are a few error messages which may or may not be related to the problem.

When I'm connected to the NUC console there are a few error messages like this one:

26/08 13:40:31,376 WARNING [139969511266048] (messenger-libusb.cpp:42) control_transfer returned error, index: 768, error: No data available, number: 61

However, I can record the rosbag fine. I've seen in #1957 that as long as there aren't too many of these errors I can ignore them.

When I connect to the NUC through ssh there are still the previous errors (control_transfer) but now a lot more often and a couple of new errors:

26/08 13:40:32,864 ERROR [139969135765248] (uvc-streamer.cpp:106) uvc streamer watchdog triggered on endpoint: 131

and a bit less frequently:

26/08 13:42:15,381 ERROR [139969106491136] (ds5-options.cpp:88) Asic Temperature value is not valid!

I don't get these last two errors either when I connect from the NUC console or when I try the same launch on my laptop.

When I connect to the NUC through ssh I can't record the rosbag I need, but I can't connect a screen and keyboard for my experiment as it's in a mobile robot. What can I do?
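To compare how often each message appears on the console versus over ssh, the driver output can be tallied per severity and source file. The following is a minimal sketch; the regular expression is inferred only from the three log lines quoted above, so other librealsense versions may need a different pattern, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the three log lines quoted in this issue
# (assumption: other librealsense versions may format output differently).
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>\d+)\] "
    r"\((?P<source>[^:()]+):(?P<line>\d+)\) (?P<message>.*)$"
)


def tally_log_lines(lines):
    """Count log lines per (level, source file); lines that do not match are skipped."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match:
            counts[(match["level"], match["source"])] += 1
    return counts


sample = [
    "26/08 13:40:31,376 WARNING [139969511266048] (messenger-libusb.cpp:42) "
    "control_transfer returned error, index: 768, error: No data available, number: 61",
    "26/08 13:40:32,864 ERROR [139969135765248] (uvc-streamer.cpp:106) "
    "uvc streamer watchdog triggered on endpoint: 131",
    "26/08 13:42:15,381 ERROR [139969106491136] (ds5-options.cpp:88) "
    "Asic Temperature value is not valid!",
]

for (level, source), count in sorted(tally_log_lines(sample).items()):
    print(level, source, count)
```

Running the same tally on output captured in both situations would make "a lot more often" concrete.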

I'm using ros melodic, ubuntu 18.04, kernel 5.4.0-81-generic, Realsense ROS v2.3.0 built with LibRealSense v2.45.0 firmware version 05.12.08.200

MartyG-RealSense commented 3 years ago

Hi @martinakos As a starting point in investigating your ssh lag problem, you could try disabling X11 forwarding on the ssh connection.

https://www.simplified.guide/ssh/enable-x11-forwarding
https://www.ibm.com/support/pages/how-do-i-speed-ssh-disable-ssh-x11-forwarding

martinakos commented 3 years ago

Thanks for the reply. I've disabled ssh X11 forwarding on the NUC and tried again, with no change. I still can't record a rosbag properly. Only the imu messages are recorded, and no image messages.

Note that I've been testing this as described above, but also using tmux and detaching, to rule out wifi coverage problems during the experiment. I connect to the NUC with ssh and launch tmux. In one pane I run dmesg -w to monitor for any camera disconnection during the experiment, in another pane I launch the camera driver, and in another pane I record a rosbag with the camera and imu topics. Then I detach from tmux and exit the ssh session. I drive my robot through the experiment for a couple of minutes, then ssh back into the NUC and reattach to the tmux session. Finally, I stop the rosbag recording. When I check the contents of the rosbag I see it has recorded only the imu messages and no image messages.

MartyG-RealSense commented 3 years ago

Are images recorded if you don't record the IMU messages?

martinakos commented 3 years ago

I've just tried that and most of the time it didn't record anything, and a couple of times it recorded 17 and 7 messages for a 10s recording. There should have been 300 messages as the cameras are configured for 30Hz.
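The shortfall can be put into numbers with a short calculation. This is a minimal sketch using only the figures quoted above (30 Hz, a 10 s recording, 17 and 7 recorded messages); the function names are hypothetical.

```python
def expected_messages(rate_hz, duration_s):
    """Number of messages a topic should produce at rate_hz over duration_s seconds."""
    return round(rate_hz * duration_s)


def missing_fraction(recorded, rate_hz, duration_s):
    """Fraction of the expected messages that did not reach the bag (0.0 to 1.0)."""
    expected = expected_messages(rate_hz, duration_s)
    if expected <= 0:
        raise ValueError("rate and duration must be positive")
    return max(0.0, 1.0 - recorded / expected)


# The two partial recordings mentioned above: 17 and 7 messages in 10 s at 30 Hz.
for recorded in (17, 7):
    print(recorded, expected_messages(30, 10), missing_fraction(recorded, 30, 10))
```

For these two recordings roughly 94% and 98% of the expected image messages are missing.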

If I run:

rostopic hz /d435i/imu /d435i/infra1/image_rect_raw /d435i/infra2/image_rect_raw

[image]

Note I've run this just after restarting the ros realsense driver, and for a brief moment I get the correct hz for the image topics but it then dies out. If I repeat the rostopic hz after the ros realsense driver has been running for a while I only get hz in the imu topic and no messages in the camera topics, which is consistent with what I'm observing when recording rosbags.
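The "correct at first, then dies out" pattern can also be checked offline from message timestamps. The sketch below is an illustration only, not how rostopic hz computes its figure: it counts messages in consecutive fixed-length windows. The timestamps are synthetic; in practice they could come from the header stamps of messages read out of a bag, which is an assumption about the workflow rather than something shown in this issue.

```python
def rate_per_window(stamps, window_s=1.0):
    """Return messages per second for consecutive windows of window_s seconds.

    stamps is a sorted list of message timestamps in seconds.
    """
    if not stamps:
        return []
    start = stamps[0]
    n_windows = int((stamps[-1] - start) // window_s) + 1
    counts = [0] * n_windows
    for t in stamps:
        counts[int((t - start) // window_s)] += 1
    return [c / window_s for c in counts]


# Synthetic example: 30 Hz for two seconds, then a single late message at t = 5 s.
stamps = [i / 30 for i in range(60)] + [5.0]
print(rate_per_window(stamps))
```

A stream that starts at the configured rate and then stalls shows up as full windows followed by empty ones.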

MartyG-RealSense commented 3 years ago

If you are streaming the infrared channels at 30Hz / 30 frames per second (FPS) then have you tried a lower FPS speed for them such as '15' in order to test whether the SSH connection is being over-burdened by the amount of data being transmitted at 30 FPS over a non-cabled wireless connection?

MartyG-RealSense commented 3 years ago

As an example of the problems with communication speed that can be encountered with networking in ROS, there was a RealSense ROS case this week at https://github.com/IntelRealSense/realsense-ros/issues/2039 with a wireless LAN (WLAN) that was experiencing very high delay with communicating with a mobile scale-model car. The delay started at zero and grew over time.

That RealSense ROS user's particular problem was resolved by reinstalling the RealSense ROS camera package.

martinakos commented 3 years ago

As I explained in a previous comment, I've been testing this with tmux to rule out network problems with ssh. So I ssh to the NUC and then launch a tmux session inside the ssh session. From tmux I launch the ros camera driver and the recording, then I detach from tmux and close the ssh session. So there is no network involved. I've been testing this way for a while and I keep experiencing the problems I describe in the original message. I just didn't explain the tmux part in the original post because I thought the problem was related to the error messages I describe (given that when I connect to the NUC console things work fine and I don't see these error messages).

MartyG-RealSense commented 3 years ago

I do not have knowledge of tmux, so I cannot advise on that particular aspect of your setup.

Some people who have experienced very slow WLAN performance have disabled wi-fi power saving in Ubuntu using the command below.

iw dev wlan0 set power_save off

To restore wi-fi power saving, the following command can be used:

iw dev wlan0 set power_save on

martinakos commented 3 years ago

I use tmux to launch the realsense camera driver and the rosbag recording. When I detach from tmux the processes I launched keep running in the background. So I can close the terminal, and even disconnect the NUC from the network, and everything I launched keeps working (in the background) as if I had launched it from a terminal. Then after a while, when I want, I reconnect the network and reattach to tmux (through ssh, but notice I didn't need ssh while the data was being collected) and everything is running as I left it. However, the rosbag didn't record all the image messages and I observe the errors I mentioned in the realsense ros driver output. If I had to guess I would lean more towards a headless operation problem than a network problem.

I started using tmux precisely to rule out network problems, as the processes I launch run in the background, like daemons, and there is no need to keep a network connection while they run. Maybe my choice of title for this issue wasn't the best one.

MartyG-RealSense commented 3 years ago

There is only one previous case I know of in the history of the RealSense ROS forum that involves tmux. In that case ( https://github.com/IntelRealSense/realsense-ros/issues/1187 ), the RealSense ROS user's problem was that the camera seemed to always be busy. They found that resetting the nodes corrected the problem, as a simple reset of the camera did not work.

In https://github.com/IntelRealSense/realsense-ros/issues/1187#issuecomment-676825629 they first reference tmux and list in detail the performance analysis that they had carried out.

MartyG-RealSense commented 3 years ago

Hi @martinakos Do you require further assistance with this case, please? Thanks!

martinakos commented 3 years ago

Sorry, I was on holiday for the last two weeks and I only checked this issue a couple of times. Yes, this is still unsolved. Today I'll get back to doing some experiments on this and report back.

MartyG-RealSense commented 3 years ago

No problem at all, @martinakos - I look forward to your next update. Good luck!

MartyG-RealSense commented 3 years ago

Hi @martinakos Do you have an update about this case that you can provide, please? Thanks!

martinakos commented 3 years ago

Hi, I came back to look at this problem. I've done a few tests and things are getting weirder and weirder.

To avoid confusion I stopped using tmux. I first tested the case where things work, from the NUC console. With a screen and a USB keyboard and mouse attached to the NUC, I launch the realsense ros driver and record a 10s rosbag with the imu and ir image topics. It works: all the expected messages are in the rosbag. Then I tried an ssh session into the NUC. I launch the realsense ros driver with & (so that the process goes to the background) and then run rosbag record for 10s from the same ssh session. As explained above, it records the imu messages fine but very few or no ir image messages. I was trying a few variations of this, switching the keyboard and mouse from the NUC to my laptop, when I noticed that every time I switch the USB to my laptop lots of errors appear in the output of the realsense ros driver and the frequency of the ir image topics drops from 30 hz to 5 hz or less. When I switch the USB back to the NUC, the errors in the realsense ros driver stop and the frequency of the ir image messages goes back to 30 hz.

As it sounds too weird I took a couple of videos. In the first one you can see me launching the realsense ros driver and checking that the ir image messages arrive at 30 hz. Then I switch the USB switch to my laptop, the errors in the realsense ros driver start, and the hz for the ir image messages decreases. Finally I switch the USB back to the NUC, the errors stop and the hz goes back to 30 hz. See video.

Notice that the realsense D435i camera is connected to another USB port on the NUC, so it's not on the USB switch. As you can see, there are a few things connected to the USB switch: a keyboard, a mouse and a webcam. While the realsense driver is working without errors and with 30hz for the ir image topics, I disconnect the webcam. No change. I disconnect the keyboard. No change. I disconnect the mouse and the errors and the decreased hz come back. I can reproduce this just by connecting and disconnecting the mouse. I've tried wireless and wired mice with the same result. It only works when there is a mouse connected to the NUC! See video.

Another thing I've tried is recording the rosbag through the ssh session with the mouse connected to the NUC and without it. When the mouse is connected I can record the rosbag fine (through ssh). When the mouse is not connected the recorded rosbag has few or no ir image messages. I have also tried booting the NUC without a screen or keyboard attached, just a mouse, and connecting to the NUC through ssh. With the mouse connected I can record the rosbag fine; when I disconnect the mouse I get the problem of few ir image messages. So it wasn't an ssh problem after all.
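When repeating this kind of test, it can help to log whether the kernel currently sees a mouse at the moment a recording starts. The sketch below parses text in the format of /proc/bus/input/devices and lists devices that expose a mouseN handler. The function name and the sample device names are made up for illustration, and this only reports what is attached; it does not explain why the camera stream depends on it.

```python
def devices_with_mouse_handler(devices_text):
    """Return names of input devices whose handler list contains a mouseN entry."""
    names = []
    # Device entries in /proc/bus/input/devices are separated by blank lines.
    for block in devices_text.strip().split("\n\n"):
        name, has_mouse = None, False
        for line in block.splitlines():
            if line.startswith("N: Name="):
                name = line[len("N: Name="):].strip().strip('"')
            elif line.startswith("H: Handlers="):
                handlers = line[len("H: Handlers="):].split()
                has_mouse = any(h.startswith("mouse") for h in handlers)
        if has_mouse and name is not None:
            names.append(name)
    return names


# Illustrative sample only; the device names and IDs are made up.
sample = """I: Bus=0003 Vendor=0000 Product=0000 Version=0111
N: Name="Example USB Keyboard"
H: Handlers=sysrq kbd event3

I: Bus=0003 Vendor=0000 Product=0000 Version=0111
N: Name="Example USB Mouse"
H: Handlers=mouse0 event4
"""
print(devices_with_mouse_handler(sample))
```

On the NUC itself the input text would be read from /proc/bus/input/devices just before each recording, so every bag can be labelled as "mouse present" or "mouse absent".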

Any ideas?

MartyG-RealSense commented 3 years ago

Hi @martinakos I have never personally heard of the problem described above. I researched it thoroughly but could not find a similar case either.

It does not seem to be related to the mouse hardware specifically, since you have tried different wired and wireless mice. It is also strange that unplugging the mouse would have such a specific effect on the IR messages of ROS.

Since multiple mice can be attached to the same computer to control the same mouse cursor, I wonder if a workaround may be to have a wireless mouse's transceiver stick plugged into the NUC but not use the accompanying wireless mouse, and instead use the other mouse that is plugged into the switcher. This may satisfy the NUC's need for a mouse to be connected whilst still letting you use the mouse on the switcher for both NUC and laptop.

martinakos commented 3 years ago

Yes, that's what I'm using as a workaround for the moment, but the bug is still there, and it's a crazy bug! I can't imagine a product maker budgeting for a wireless mouse in their product just to get the realsense cameras working correctly. This bug is also going to cause a lot of grief for developers. I've spent more than 2 months struggling with this issue myself. Initially I was using an older NUC and experiencing the same problems (even though things worked well when connected to the console, as there was a mouse connected!), but I replaced it with a more powerful one thinking there might be throughput problems. The new NUC didn't solve the problems either. So it's not something specific to the model of NUC I'm using. Another issue about this problem should be opened, here or in the librealsense project or somewhere else. The NUCs are Intel products so I guess you have privileged access to information.

MartyG-RealSense commented 3 years ago

As Intel NUC mini-PCs are not part of the RealSense product range, I do not have access to privileged technical information about them. NUCs have been used with RealSense cameras and the RealSense ROS wrapper for years and this is the first time that I have heard of this particular issue with mice on NUC. I will highlight it to Doronhi, the RealSense ROS wrapper developer, though. Thanks very much for the report.

MartyG-RealSense commented 3 years ago

Hi @martinakos I received advice from Doronhi. Could you confirm for me please the method that you used to install the RealSense ROS wrapper (building librealsense and the ROS wrapper together from Debian packages with 'Method 1' or building the ROS wrapper separately from source code with 'Method 2').

If you built the ROS wrapper and the librealsense SDK together from ROS packages then the build will be based on the SDK's RSUSB backend. Instead, remove this build and install the librealsense SDK and ROS wrapper separately - the librealsense SDK from packages that are based on V4L backend and the ROS wrapper from source code, as described in https://github.com/IntelRealSense/realsense-ros/issues/2068#issuecomment-918948716

Building the librealsense SDK from packages https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md

Building the ROS wrapper from source code with Method 2 https://github.com/IntelRealSense/realsense-ros#method-2-the-realsense-distribution

martinakos commented 3 years ago

I think I installed librealsense2 with sudo apt-get install, following the instructions here, and then I installed the ROS wrapper with sudo apt-get install ros-melodic-realsense2-camera. So that looks like a mix of Method 1 and Method 2. I can't confirm it now as I don't have the NUC with me. On Wednesday I'll check the NUC, look at the history and confirm how I installed it. I'll also check whether I'm using the RSUSB library or the patched kernel modules for v4l as described in here and report back.

One thing I remember making sure of when I installed librealsense2 was that the metadata for all channels (color, depth, ir images, gyro and accel) was shown correctly in the realsense-viewer, and it was indeed shown correctly. I needed to check this because it is required by a data recorder from Slamcore (which I'm also using with the NUC) that interfaces directly with librealsense2 (so no ROS involved), and it also shows data rate issues when I don't have a mouse connected. So that shows it must be a librealsense2 issue and not a ros-melodic-realsense2-camera issue.

martinakos commented 3 years ago

Another thing I didn't mention, and maybe I should have, is that I also recorded data from a T265 (the odometry, fisheye images, and imu) as well as from the D435i (ir images and imu), both connected to the NUC. The data rate issues (that is, when there is no mouse connected) were only experienced for the D435i ir images and not for the T265 fisheye images, which is weird.

MartyG-RealSense commented 3 years ago

The sudo apt-get install method that you first used on the distribution_linux page is not Method 1 of the ROS wrapper (which uses different packages). So it looks as though you installed the librealsense SDK first, and then when you subsequently used sudo apt-get install ros-melodic-realsense2-camera (which IS Method 1 of the ROS wrapper) you installed librealsense and the ROS wrapper together at the same time from packages.

So it is possible that you installed the librealsense SDK twice from packages using packages with different backends (V4L backend on the first installation made with the distribution_linux.md page's instructions and RSUSB backend on the second installation).

The packages installed with the distribution_linux.md instructions should have support for metadata included in them, meaning that you do not need to add metadata support manually with kernel patching when installing these packages.

The appropriate approach, as suggested by Doronhi, may therefore be to remove the previous package installations and then install the librealsense SDK from packages with the distribution_linux.md instructions, and subsequently build the ROS wrapper on its own from source code with Method 2.
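Before removing anything, it may be worth listing which RealSense-related packages are actually installed, to see whether librealsense arrived through more than one route. The following is a minimal sketch that parses the text printed by dpkg -l. The function name is hypothetical, and the package names and versions in the sample are illustrative assumptions, apart from ros-melodic-realsense2-camera, which is named earlier in this thread.

```python
def realsense_packages(dpkg_output):
    """Split installed RealSense-related packages into ROS and non-ROS groups.

    dpkg_output is the text printed by `dpkg -l`.
    """
    groups = {"ros": [], "other": []}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages are listed with status "ii" in the first column.
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name = fields[1].split(":")[0]  # drop an ":amd64" style suffix
        if "realsense" not in name:
            continue
        groups["ros" if name.startswith("ros-") else "other"].append(name)
    return groups


# Illustrative sample; the versions are placeholders, not real release numbers.
sample = """ii  librealsense2:amd64            0.0.0  amd64  example description
ii  librealsense2-utils:amd64      0.0.0  amd64  example description
ii  ros-melodic-librealsense2      0.0.0  amd64  example description
ii  ros-melodic-realsense2-camera  0.0.0  amd64  example description
rc  some-removed-package           0.0.0  amd64  example description
"""
print(realsense_packages(sample))
```

Entries in both groups at the same time would be consistent with the double installation described above, although the listing alone does not say which backend each build uses.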

MartyG-RealSense commented 3 years ago

Hi @martinakos Do you require further assistance with this case, please? Thanks!

martinakos commented 3 years ago

Sorry, I haven't been able to test the other installation method as I need to finish some other work. I'll test this asap and report back.

martinakos commented 3 years ago

Sorry I'm finishing another project first. Once I finish this project I'll spend some time on this issue. If you want to close the issue in the meantime, I'll post my results when I get round to it. Would that re-open the issue?

MartyG-RealSense commented 3 years ago

You can reopen a closed issue by using the Reopen issue button under the comment writing box. For now I will close this issue and you are welcome to return to it at a later date. Good luck!