Field-Robotics-Lab / nps_uw_multibeam_sonar

Multibeam sonar plugin with NVIDIA Cuda library
Apache License 2.0

The output is buggy #50

Closed JaouadROS closed 1 year ago

JaouadROS commented 1 year ago

I'm trying your package on a GeForce GTX 950M (4 GB), but it only works for a couple of frames, stops, and then starts again for a couple of frames at random intervals. Here is a screenshot of the output:

Screenshot from 2022-11-16 16-34-15

As can be seen, I'm not even using 1/4 of the GPU memory:

GPU (id = 0) memory: used = 0.81 GB, free= 3.43 GB, total= 4.24 GB

In the scene, I only have one object which is the rod. I'm running sonar_tank_oculus_m1200d_nps_multibeam.launch.py launch file using the default parameters. I've only added a trajectory to the world file to move the sonar following key points. Ubuntu 20.04 is my OS.
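To log GPU memory alongside the frame drops, a minimal Python sketch could poll nvidia-smi. The helper names below (parse_gpu_memory, query_gpu_memory) are hypothetical and not part of this package; the sketch assumes nvidia-smi is on the PATH and relies on its CSV query output, which reports memory in MiB.

```python
import subprocess


def parse_gpu_memory(csv_line):
    """Parse one 'used, total' line (MiB) from nvidia-smi CSV output.

    Returns (used_mib, total_mib, used_fraction).
    """
    used_mib, total_mib = (int(field.strip()) for field in csv_line.split(","))
    return used_mib, total_mib, used_mib / total_mib


def query_gpu_memory(gpu_index=0):
    """Ask nvidia-smi for the memory usage of one GPU (assumes it is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per selected GPU; only one GPU is selected here.
    return parse_gpu_memory(result.stdout.strip().splitlines()[0])
```

Calling query_gpu_memory(0) once per second while the launch file runs would show whether memory usage changes when the output stalls.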

woensug-choi commented 1 year ago

Hi :) This is something I've never experienced... Is the terminal output continuously updating? I'm not sure how it's being delayed. Is the .csv file being written?

JaouadROS commented 1 year ago

Hello! What I've shown above is from my ROS2 version. The terminal output is not updating continuously, and the same is true for the .csv file. It generates a few frames (usually 10 or 15, sometimes fewer), and then it stops generating any image. Using the ROS1 version (the main branch), my CPU runs at nearly 100% on all threads and the GPU is at only 54MiB memory usage, but no image is generated.

Screenshot from 2022-11-21 14-50-32

What I have tried is commenting out the line where NpsGazeboSonar::sonar_calculation_wrapper is called and replacing it with a simple initialization of the P_Beams variable. The same behavior occurs with both ROS1 and ROS2. So it seems to be a Gazebo-plugin-related issue?

JaouadROS commented 1 year ago

Updates: Using ROS2 with no objects in the scene, I get continuous outputs (black sonar image, terminal, etc.). After adding one object, mud_anchor, the outputs start to become discontinuous. I've noticed that with the Coke object, the outputs are very often continuous. Below is the CPU usage with no object in the scene (black sonar image), with 54MiB GPU memory usage:

Screenshot from 2022-11-21 17-09-39

Using ROS1 with no objects in the scene, I get the same behavior as before: CPU at around 80% usage, 54MiB GPU memory usage, and no outputs.

It clearly has something to do with the plugin generating the images, image_raw and depth. Right?

woensug-choi commented 1 year ago

I still have no idea where to start with this issue... I have not tried ROS2. Hmm... so it works OK with Coke but not with other objects?

The image is generated after all calculations are done. So the image-generating part could not cause it to halt.

Let's get back to the original main branch and start thinking from there.

On ROS1, the main branch, run the tutorial launch file. It stops, right? It seems others were able to run the tutorial launch file fine (although some issues are listed on the issues page). If so, I would suspect that not all NVIDIA hardware is compatible with the code. Is this guess valid? What do you think?

woensug-choi commented 1 year ago

Could you try using the exact versions in the installation document? CUDA 11.1 and NVIDIA graphics driver 455.32.

woensug-choi commented 1 year ago

Hmm... it seems the GeForce GTX 950M isn't compatible with NVIDIA graphics driver 455.32. It is not listed. What could be the alternative for best compatibility with CUDA 11.1?

image

woensug-choi commented 1 year ago

What a headache. It seems it would be better to recode using newer CUDA, NVIDIA hardware/software settings.

The best bet I would try is the (515.48.07 driver) + (CUDA 11.7 Update 1) combination. At least this combination is listed as compatible according to Table 3 in https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Or (with much headache), forward compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#forward-compatibility-title

If none of the generic CUDA/NVIDIA versions work, I would like you to try the Docker option: https://field-robotics-lab.github.io/dave.doc/contents/dave_sensors/Multibeam-Forward-Looking-Sonar/#option-a-use-docker

JaouadROS commented 1 year ago

> I still have no idea where to start with this issue... I have not tried ROS2. Hmm... so it works OK with Coke but not with other objects?

Yes, it works with very simple objects like Coke, or with no object in the scene.

> The image is generated after all calculations are done. So the image-generating part could not cause it to halt.
>
> Let's get back to the original main branch and start thinking from there.
>
> On ROS1, the main branch, run the tutorial launch file. It stops, right? It seems others were able to run the tutorial launch file fine (although some issues are listed on the issues page). If so, I would suspect that not all NVIDIA hardware is compatible with the code. Is this guess valid? What do you think?

No, using ROS1, at first I didn't get any outputs. My CPU keeps running at 80% but nothing is generated. Today I waited for some time and got the first outputs after 4 minutes; after that it behaves just like ROS2 (discontinuous outputs).

JaouadROS commented 1 year ago

In terms of the driver and CUDA version, I think everything is set up correctly. I didn't get any CUDA-related errors and I'm using the latest driver. I still think that rendering the images with Gazebo takes some time and that this is the cause of the discontinuity. I should get image_raw right away at the beginning and continuously, right? The depth image might take some time to process? Can you tell me which CPU you are using? The rendering is not running on the GPU, right?

woensug-choi commented 1 year ago

Generating the depth image is done with generic depth_camera plugin code. It should put very little load on the CPU. The code has been tested on various CPUs and didn't cause any problems there. Could you try recompiling with the sonar calculation wrapper function commented out and check whether the depth image is generated continuously?

woensug-choi commented 1 year ago

The depth image and point clouds are the input for the sonar calculations. Those should be generated almost instantaneously on every system frame update.

woensug-choi commented 1 year ago

I'm making a clean installation myself to put us on the same page...

woensug-choi commented 1 year ago

With a clean install on WSL2 (NVIDIA driver 520.61.05, CUDA Version 12.0 in the nvidia-smi output, and Cuda_11.8.r11.8 in the nvcc --version output), running roslaunch nps_uw_multibeam_sonar sonar_tank_blueview_p900_nps_multibeam.launch with <debugFlag>true</debugFlag> in models/blueview_p900_nps_multibeam/model.sdf outputs the following. What do you get?

image
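The two CUDA versions mentioned above mean different things: nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the installed toolkit. A minimal Python sketch for comparing them follows. The function names are hypothetical, and the regular expressions assume the usual "CUDA Version: X.Y" header in nvidia-smi and a "release X.Y" line in the nvcc output.

```python
import re


def parse_smi_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' header of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' line of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_within_driver_limit(toolkit_version, driver_max_version):
    """True if the toolkit is not newer than what the driver reports supporting."""
    # Tuples compare element-wise, so (11, 8) <= (12, 0) is True.
    return toolkit_version <= driver_max_version
```

With the versions from this comment, the toolkit (11.8) is below the driver's limit (12.0), so that pairing is not suspicious by itself.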

JaouadROS commented 1 year ago

> With a clean install on WSL2 (NVIDIA driver 520.61.05, CUDA Version 12.0 in the nvidia-smi output, and Cuda_11.8.r11.8 in the nvcc --version output), running roslaunch nps_uw_multibeam_sonar sonar_tank_blueview_p900_nps_multibeam.launch with <debugFlag>true</debugFlag> in models/blueview_p900_nps_multibeam/model.sdf outputs the following. What do you get?
>
> image

It seems that you have three frames generated at the beginning, just like me. For me, it then stops for a while. Does that happen to you, or does it continue publishing images?

I started with a clean workspace myself too. I couldn't run sonar_tank_blueview_p900_nps_multibeam.launch; for some reason Gazebo was stuck at the logo. Instead, I successfully ran sonar_tank_oculus_m1200d_nps_multibeam.launch with debugFlag set to true.

Screenshot from 2022-11-23 12-06-47

In another terminal, I ran rostopic hz /oculus_m1200d/image_raw. As you can see, it doesn't start right away:

Screenshot from 2022-11-23 18-23-52

And after some frames are published, it stops again:

Screenshot from 2022-11-23 18-25-25
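To quantify stalls like these instead of reading them off rostopic hz, one could record the arrival time of each message and look for long intervals. The sketch below is plain Python with no ROS dependency. The function name find_gaps and the tolerance factor are assumptions for illustration, and the default rate is the roughly 10 Hz seen in this thread when publication is smooth.

```python
def find_gaps(stamps, expected_hz=10.0, tolerance=3.0):
    """Return (start_time, duration) for every interval between consecutive
    timestamps that exceeds `tolerance` expected periods.

    `stamps` is a list of message arrival times in seconds, in ascending order.
    """
    limit = tolerance / expected_hz  # longest interval still considered normal
    gaps = []
    for previous, current in zip(stamps, stamps[1:]):
        interval = current - previous
        if interval > limit:
            gaps.append((previous, interval))
    return gaps
```

Feeding it the timestamps collected by a subscriber callback would list when each stall began and how long it lasted, which makes it easier to correlate stalls with moving an object in the scene.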

After that, I commented out the sonar calculation wrapper function, and it seems to work smoothly. I waited for a couple of minutes and had a constant ~10Hz, even though it consumes around 40% of my CPU:

Screenshot from 2022-11-23 18-35-21

Now, when I try to move an object or the sonar itself manually in Gazebo, the publication stops for many frames (40+ in that experiment) and then comes back again (the sonar wrapper is still commented out):

Screenshot from 2022-11-23 18-39-20

In all my comments since the beginning, the sonar was moving the whole time; I didn't think that could be the problem. Could you move something in the scene, or the sonar, using Gazebo and see if you can reproduce the issue, please? Thank you

woensug-choi commented 1 year ago

I've screen captured only the first three. It continued to work.

I've found the problem! The Oculus launch file was mistakenly set up to run the customSDFTag version of the worlds. I've made a new PR to fix things. It also includes fixes for the orientation problems. If you don't want to use the new PR (https://github.com/Field-Robotics-Lab/nps_uw_multibeam_sonar/pull/53), try the following.

The difference in settings between the Blueview P900 and the Oculus M1200d is this line:

<constantReflectivity>true</constantReflectivity>

Change constantReflectivity to true in the Oculus M1200d's model.sdf. Then it will not load the customSDFTag feature.
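If you prefer to script that change rather than edit the file by hand, here is a minimal Python sketch using the standard library's ElementTree. The function name is hypothetical, and the sketch assumes nothing about where the tag is nested in model.sdf: it simply rewrites every element named constantReflectivity. Note that ElementTree discards XML comments when parsing, so a manual edit is safer if the file contains comments you want to keep.

```python
import xml.etree.ElementTree as ET


def set_constant_reflectivity(sdf_text, value=True):
    """Set every <constantReflectivity> element in an SDF string.

    Returns (new_sdf_text, number_of_elements_changed).
    """
    root = ET.fromstring(sdf_text)
    changed = 0
    # iter() walks the whole tree, so the nesting depth of the tag does not matter.
    for element in root.iter("constantReflectivity"):
        element.text = "true" if value else "false"
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

A return count of 0 would mean the tag is absent from the file, in which case it would need to be added by hand next to the other plugin settings.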

JaouadROS commented 1 year ago

Yes, it seems to be working now. However, my GPU is still using no more than 54MiB. It is not an issue; I'm just wondering why we need 4 GB of GPU memory. Thank you once again for your help.