Closed TL-4319 closed 2 months ago
Another thing I noticed while experimenting: inside the lunarsim container, after running
source /opt/ros/humble/setup.bash
rviz2
RViz2 failed to open and showed the following error:
root@lagerprocessor:~/lunarsim# rviz2
QStandardPaths: wrong permissions on runtime directory /tmp, 0777 instead of 0700
Unable to create glx context
This is followed by many "Failed to create an OpenGL context" errors, then:
terminate called after throwing an instance of 'std::runtime_error'
what(): Unable to create the rendering window after 100 tries
Aborted (core dumped)
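The QStandardPaths line above is a Qt warning about the runtime directory and is probably not the cause of the GLX failure, but it can be silenced to reduce noise. A minimal sketch, assuming Qt is picking up the runtime directory from XDG_RUNTIME_DIR; the helper name setup_runtime_dir and the default path are hypothetical, not part of lunarsim:

```shell
#!/bin/bash
# Hypothetical helper (not part of lunarsim): give Qt a private runtime
# directory with mode 0700, which is what QStandardPaths expects.
setup_runtime_dir() {
    # Default path is an assumption; any directory owned by the user works.
    local dir="${1:-/tmp/runtime-$(id -un)}"
    mkdir -p "$dir" || return 1
    chmod 700 "$dir" || return 1
    export XDG_RUNTIME_DIR="$dir"
}

setup_runtime_dir
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR (mode $(stat -c '%a' "$XDG_RUNTIME_DIR"))"
```

Running this in the container shell before rviz2 should remove the permissions warning; the "Unable to create glx context" error would still need to be addressed separately.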
I modified the run_once.sh file as follows:
#!/bin/bash
xhost +local:root
XSOCK=/tmp/.X11-unix
docker run -it --rm \
    --gpus all \
    -e DISPLAY="$DISPLAY" \
    -v "$XSOCK:$XSOCK" \
    -v "$HOME/.Xauthority:/root/.Xauthority" \
    --privileged \
    --net=host \
    osrf/ros:humble-desktop bash
After running the script above and sourcing ROS 2 inside the container, I was able to run RViz2 as usual.
Hopefully this little experiment provides some insight into what the cause might be.
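For anyone repeating this experiment, the modified run_once.sh depends on three things on the host: DISPLAY being set, the X11 socket directory, and an .Xauthority file. A rough pre-flight check could look like the sketch below. The function name check_x11_prereqs is hypothetical, and the default paths are simply the ones used in the script above:

```shell
#!/bin/bash
# Hypothetical pre-flight check for the X11 forwarding used by run_once.sh.
# Prints each missing prerequisite and returns 1 if any is missing.
check_x11_prereqs() {
    local xsock="${1:-/tmp/.X11-unix}"
    local xauth="${2:-${HOME:-/root}/.Xauthority}"
    local missing=0

    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        missing=1
    fi
    if [ ! -d "$xsock" ]; then
        echo "X11 socket directory not found: $xsock"
        missing=1
    fi
    if [ ! -f "$xauth" ]; then
        echo "Xauthority file not found: $xauth"
        missing=1
    fi
    return "$missing"
}

if check_x11_prereqs; then
    echo "X11 prerequisites look fine"
else
    echo "Fix the items above before starting the container"
fi
```

This only checks the host side. It says nothing about whether the GPU driver inside the container can create a GLX context.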
Some more experimentation: I updated the host PC's NVIDIA driver to version 535.146.02. I then changed line 38 of the Dockerfile in the repo from "RUN apt install nvidia-driver-525 -y" to "RUN apt install nvidia-driver-535 -y", and the image still built successfully.
When I ran the built image and ran "nvidia-smi" inside the container, I saw the following:
root@lagerprocessor:~/lunarsim# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.183
I also tried it with the original 525 driver but still got the same error.
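The mismatch can be made explicit by comparing the host kernel module version with the NVML library version from the error. The sketch below uses the two versions reported in this thread as defaults and, if /proc/driver/nvidia/version is readable, takes the kernel module version from there instead. The function names are hypothetical, and comparing only the first two version components is an assumption made because nvidia-smi prints the library version as 535.183:

```shell
#!/bin/bash
# Extract the first dotted version number (e.g. 535.146.02) from a string.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' <<< "$1" | head -n 1
}

# Reduce a version to its first two components (e.g. 535.146).
major_minor() {
    cut -d. -f1,2 <<< "$1"
}

# Succeeds when both versions agree on their first two components.
versions_match() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Values reported in this thread; the first is overridden below when the
# NVIDIA kernel module is loaded on the machine running the script.
host_driver="535.146.02"
nvml_library="535.183"

if [ -r /proc/driver/nvidia/version ]; then
    host_driver="$(extract_version "$(grep NVRM /proc/driver/nvidia/version)")"
fi

if versions_match "$host_driver" "$nvml_library"; then
    echo "match: kernel module $host_driver, NVML library $nvml_library"
else
    echo "mismatch: kernel module $host_driver vs NVML library $nvml_library"
fi
```

With the values from this thread the script reports a mismatch, which is consistent with the nvidia-smi error.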
I finally got the sim running. I needed to install the NVIDIA driver matching the NVML version the container expects, 535.183 in this case. I did so by following the instructions from here to unload nvidia-drm, which is required to run the NVIDIA installer on my particular system. I also reverted the NVIDIA driver installed by the Dockerfile to 525, since it does not seem to matter.
It is still odd that the ROS 2 image from OSRF has no issue with the NVIDIA driver while this image does. Regardless, this might just be an NVIDIA driver issue, so I'll close the issue. Thank you for the great work.
Hi,
Thank you for the hard work.
I recently tried this project and have been running into a problem.
This is my setup:
I have set up Docker and successfully run the ROS 2 container following this tutorial.
I have also installed nvidia-docker2 and nvidia-container-toolkit using
To double-check that the NVIDIA driver is working, I ran nvidia-smi
The lunarsim:latest Docker image also built successfully with the following output
Now, to run the simulation, I run
Then, within the container, I start the sim and receive the following text
It seems the GUI failed to open? I also received a "Segmentation Fault: Core Dumped" error, but I have yet to reproduce it.
I would appreciate any pointers on how to resolve this, or on how to enable more logging to help with debugging.
Thanks