Error on vGPU - Githubissues

mkomuro commented 2 years ago

I get the error below when I launch the vslam stack. My environment runs on a vGPU in VMware. I tested nvblox and it works well in the same VM. Any guess as to what's wrong? The VM has:

8x vCPUs
64G RAM
2x A10
256G SSD
OS: Ubuntu 20.04.4 LTS
dGPU driver: 470.129.06

admin@omn-u20-mkomuro:/workspaces/isaac_ros-dev/ros_ws$ ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_isaac_sim.launch.py
[INFO] [launch]: All log files can be found below /home/admin/.ros/log/2022-06-15-06-07-12-816997-omn-u20-mkomuro-10633
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [component_container-1]: process started with pid [10646]
[component_container-1] [INFO] [1655273233.262291079] [visual_slam_launch_container]: Load Library: /workspaces/isaac_ros-dev/ros_ws/install/isaac_ros_visual_slam/lib/libvisual_slam_node.so
[ERROR] [component_container-1]: process has died [pid 10646, exit code -4, cmd '/opt/ros/foxy/lib/rclcpp_components/component_container --ros-args -r __node:=visual_slam_launch_container -r __ns:=/'].

The code is not from the head branch, but it works on my local workstation (RTX 2080 Ti with the same OS and dGPU driver version).

git clone -b hemalshahNV-patch-1 --recurse-submodules https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git
git clone -b hotfix_1 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_image_pipeline
git clone -b release-ea3-hotfix1 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_visual_slam

hemalshahNV commented 1 year ago

It appears that the process dies with no log or error message, which unfortunately could indicate any number of things.
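
The exit code reported in the launch log may help narrow this down. As a minimal sketch, assuming `ros2 launch` reports exit codes the way Python's `subprocess` module does (a negative value `-N` means the process was terminated by signal `N`), the reported `-4` can be decoded with the standard `signal` module. The helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return a readable description of a subprocess-style exit code."""
    if code >= 0:
        return f"exited normally with status {code}"
    try:
        # Negative codes are assumed to mean "terminated by signal -code".
        name = signal.Signals(-code).name
    except ValueError:
        name = f"unknown signal {-code}"
    return f"terminated by {name}"


print(describe_exit_code(-4))
```

On Linux, signal 4 is `SIGILL` (illegal instruction), while the kernel's out-of-memory killer sends `SIGKILL` (signal 9).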

Are you using the Isaac ROS Common base Docker container to run within the VM instance? What version of CUDA is available in this VM instance?

mkomuro commented 1 year ago

> Are you using the Isaac ROS Common base Docker container to run within the VM instance?

Yes.

It's CUDA 11.4. I think it meets the requirements:
https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common/tree/hemalshahNV-patch-1#x86_64
https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_visual_slam/tree/release-ea3-hotfix1#x86_64

Ubuntu 20.04+
CUDA 11.4 supported discrete GPU
Nvidia driver version >= 470.103.01

$ nvidia-smi
Wed Jun 15 15:13:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:0B:00.0 Off |                    0 |
|  0%   59C    P0   153W / 150W |   5488MiB / 22731MiB |     50%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
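
The comparison against the listed requirements can be sketched from the `nvidia-smi` header line. This is a minimal example, assuming the header keeps the `Driver Version: X   CUDA Version: Y` layout shown above. The function names are illustrative, and treating 11.4 as a minimum CUDA version is an assumption about how the requirement should be read.

```python
import re

# Minimums taken from the requirements list quoted in this comment.
MIN_DRIVER = (470, 103, 1)
MIN_CUDA = (11, 4)  # assumption: "CUDA 11.4" read as a minimum


def parse_version(text: str) -> tuple:
    """Turn '470.129.06' into (470, 129, 6) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def parse_smi_header(line: str) -> tuple:
    """Extract (driver_version, cuda_version) tuples from the header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    if driver is None or cuda is None:
        raise ValueError("line does not look like an nvidia-smi header")
    return parse_version(driver.group(1)), parse_version(cuda.group(1))


header = "| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |"
driver, cuda = parse_smi_header(header)
print(driver >= MIN_DRIVER, cuda >= MIN_CUDA)
```

Note that the CUDA version printed by `nvidia-smi` is the highest version the installed driver supports, not necessarily the toolkit version installed inside the container.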

hemalshahNV commented 1 year ago

Could you try with the latest Isaac ROS release? The abrupt process death in your log is usually indicative of running out of memory.

mkomuro commented 1 year ago

Sure. Unfortunately, my VM server has a problem and our admin is working on restoring the system. Once he has fixed the problem, I'll try the new version of Isaac ROS. If I still have an issue with the latest Isaac ROS, I'll open a new issue. Thanks!

NVIDIA-ISAAC-ROS / isaac_ros_visual_slam

Error on vGPU #28