run_dev.sh: using `--gpus` instead of `--runtime nvidia`

NVIDIA-ISAAC-ROS / isaac_ros_common

Common utilities, packages, scripts, Dockerfiles, and testing infrastructure for Isaac ROS packages.

https://developer.nvidia.com/isaac-ros-gems

run_dev.sh: using `--gpus` instead of `--runtime nvidia` #101

Open Interpause opened 1 year ago

Interpause commented 1 year ago

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common/blob/6d3c5c00e0e2b3fc1d75eb4286848d23b05d6dca/scripts/run_dev.sh#L195-L203

I noticed that the Docker container started by `run_dev.sh` works (`torch.cuda.is_available()` returns `True`) if I replace `--runtime nvidia` with `--gpus all`. I also noticed in the dev environment setup guide (https://nvidia-isaac-ros.github.io/getting_started/dev_env_setup.html) that `nvidia-container-runtime` is deprecated. Is `--gpus all` more suitable on newer versions of Docker?
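As a rough illustration of the swap described above, the two GPU options could be selected behind a small helper. This is only a sketch: `gpu_docker_args` and the `GPU_FLAG_STYLE` variable are hypothetical names and do not appear in `run_dev.sh`; only the `--gpus all` and `--runtime nvidia` flags themselves come from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: gpu_docker_args and GPU_FLAG_STYLE are hypothetical names,
# not taken from run_dev.sh.

# Print the docker-run GPU flags for the chosen style.
#   GPU_FLAG_STYLE=gpus     -> --gpus all        (default in this sketch)
#   GPU_FLAG_STYLE=runtime  -> --runtime nvidia
gpu_docker_args() {
    case "${GPU_FLAG_STYLE:-gpus}" in
        gpus)    echo "--gpus all" ;;
        runtime) echo "--runtime nvidia" ;;
        *)       echo "unknown GPU_FLAG_STYLE: ${GPU_FLAG_STYLE}" >&2
                 return 1 ;;
    esac
}

# Possible use (the image name is a placeholder you would supply):
#   docker run --rm $(gpu_docker_args) <image> \
#       python3 -c "import torch; print(torch.cuda.is_available())"
```

Keeping the choice in one place makes it easy to try both variants on the same machine and compare whether CUDA is visible inside the container.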

hemalshahNV commented 1 year ago

`--gpus all` should enable the same runtime behavior, but we need to confirm with the nvidia-container-runtime engineers. Thanks for the heads-up.

Buddies-as-you-know commented 1 year ago

Has this been changed? If I use `--gpus` instead of `--runtime` in `run_dev.sh`, I get an error like this:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

jaiveersinghNV commented 1 year ago

@Buddies-as-you-know, could you confirm which version of the CUDA drivers you have installed? The missing `libnvidia-ml.so.1` library should be included as part of a proper CUDA installation.
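One way to gather the information requested here is to check on the host whether the dynamic linker can see `libnvidia-ml.so.1` and to query the driver version. The snippet below is a minimal sketch, assuming a Linux host with `ldconfig` and `nvidia-smi` available; `has_nvml` is a hypothetical helper name introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: has_nvml is a hypothetical helper, not part of any NVIDIA tool.

# Reads `ldconfig -p`-style output on stdin and succeeds (exit 0) if
# libnvidia-ml.so.1 appears in the linker cache listing.
has_nvml() {
    grep -q 'libnvidia-ml\.so\.1'
}

# Possible use on the host:
#   ldconfig -p | has_nvml && echo "NVML found" || echo "NVML missing"
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

If the library is not listed on the host, the container hook has nothing to mount, which would be consistent with the `load library failed` message quoted above.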