agirault opened this issue 1 year ago
cc @AndreasHeumann @jjomier
@agirault thanks for reporting this. Looking at the list of files, I think adding the following is relatively straightforward:

`/usr/share/nvidia/nvoptix.bin`

The following (for aarch64) is also not really a problem:

`/usr/share/vulkan/icd.d/nvidia_layers.json`
but it would be good to confirm that there is no conflicting file at this location on x86_64 systems. Handling such a conflict is possible; we just need an indication as to whether the additional effort is required there.

With regards to the `libnvidia-egl-gbm.so` file: since the file actually included in the driver installation is `libnvidia-egl-gbm.so.1.1.0`, it would be good to understand which symlinks on the host (in either case) point to this file. The same is required for `libnvidia-api.so.1`. Here it's key to know what this points to on the host, since it's expected to be a symbolic link.
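If it helps, the symlink chains can be inspected on the host with something like the following (the library directory shown is the usual x86_64 location and is only illustrative; it differs on aarch64):

```bash
# List every libnvidia-egl-gbm.so* entry and show what each symlink points to
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so*

# Resolve libnvidia-api.so.1 to its final target (it is expected to be a symlink)
readlink -f /usr/lib/x86_64-linux-gnu/libnvidia-api.so.1
```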
I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/501 to add the processing of these files. If we can settle on a final list of missing ones that should be included we can get that in to an upcoming release candidate.
We have just released https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.15.0-rc.1 that includes the injection of the `nvoptix.bin` file. The packages are available from our public experimental repositories.
Assuming these have been configured, running:

```bash
sudo apt-get install -y \
    nvidia-container-toolkit=1.15.0~rc.1-1 \
    nvidia-container-toolkit-base=1.15.0~rc.1-1 \
    libnvidia-container-tools=1.15.0~rc.1-1 \
    libnvidia-container1=1.15.0~rc.1-1
```

should install the required packages.
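If the experimental repository is not yet enabled, the toolkit installation docs do so by uncommenting the experimental entries in the apt list file, roughly:

```bash
# Uncomment the experimental entries in the NVIDIA Container Toolkit repo list
# (path as used in the official installation instructions for apt-based distros)
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```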
Note that we have backported these changes to the `release-0.14` branch and they are included in the `v1.14.4` release.
@agirault if you get a chance to validate what is still missing that would be great.
I think the issue is back with the latest release:

```
apt list --installed | grep nvidia-container-toolkit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

nvidia-container-toolkit-base/unknown,now 1.15.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.15.0-1 amd64 [installed,automatic]
```

I recently started getting this in the Omniverse docker container for Isaac Sim:

```
Could not open optix denoiser weights file "/usr/share/nvidia/nvoptix.bin"
```
@turowicz first, could you confirm that the file exists on your host?

Then, which docker command are you running? Could you confirm that you are using the `nvidia` runtime and that the image has `NVIDIA_DRIVER_CAPABILITIES=all` set (alternatively, add `-e NVIDIA_DRIVER_CAPABILITIES=all` to your docker command line)?

The `nvoptix.bin` file is only injected if `NVIDIA_DRIVER_CAPABILITIES` includes `graphics` or `display`.
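As a quick sanity check (the image name below is just an example), something like this should show the file both on the host and inside a container that requests the right capabilities:

```bash
# 1. Confirm the file exists on the host
ls -l /usr/share/nvidia/nvoptix.bin

# 2. Confirm it is injected into a container when graphics/display capabilities
#    are requested via NVIDIA_DRIVER_CAPABILITIES (example image shown)
docker run --rm --runtime=nvidia --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  ubuntu:22.04 ls -l /usr/share/nvidia/nvoptix.bin
```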
Yes, to fix the error I had to add `-v /usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin`.

I am using the nvidia runtime through `--gpus all`.

I don't use `-e NVIDIA_DRIVER_CAPABILITIES=all` and I have never used it. It used to work fine without it.

I've updated my answer above.

Addendum: I am using `nvcr.io/nvidia/isaac-sim:2023.1.1` and it used to work fine.
I confirm the container `nvcr.io/nvidia/isaac-sim:2023.1.1` has `NVIDIA_DRIVER_CAPABILITIES=all`.
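For what it's worth, that can be checked directly against the image metadata, e.g.:

```bash
# Print the environment baked into the image and filter for the capabilities flag
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
  nvcr.io/nvidia/isaac-sim:2023.1.1 | grep NVIDIA_DRIVER_CAPABILITIES
```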
@turowicz could you provide the full docker command you run?
Here's the .devcontainer file:
```jsonc
// See https://aka.ms/vscode-remote/containers for the
// documentation about the devcontainer.json format
{
  "name": "surveily.omniverse",
  "build": {
    "dockerfile": "dockerfile"
  },
  "runArgs": [
    "--name",
    "surveily.omniverse",
    "-v",
    "${env:HOME}${env:USERPROFILE}/.ssh:/root/.ssh-localhost:ro",
    "-v",
    "/var/run/docker.sock:/var/run/docker.sock",
    "-v",
    "/usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin",
    "--network",
    "host",
    "--gpus",
    "all",
    "-e",
    "ACCEPT_EULA=Y",
    "-e",
    "PRIVACY_CONSENT=N"
  ],
  "postCreateCommand": "mkdir -p ~/.ssh && cp -r ~/.ssh-localhost/* ~/.ssh && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*",
  "appPort": [
    "5003:5003"
  ],
  "extensions": [
    "kosunix.guid",
    "redhat.vscode-yaml",
    "rogalmic.bash-debug",
    "mikeburgh.xml-format",
    "donjayamanne.githistory",
    "ms-azuretools.vscode-docker",
    "ms-azure-devops.azure-pipelines",
  ],
  "settings": {
    "extensions.autoUpdate": false,
    "files.exclude": {
      "**/CVS": true,
      "**/bin": true,
      "**/obj": true,
      "**/.hg": true,
      "**/.svn": true,
      "**/.git": true,
      "**/.DS_Store": true,
      "**/BenchmarkDotNet.Artifacts": true
    }
  },
  "shutdownAction": "stopContainer",
}
```
and the dockerfile:
```dockerfile
FROM nvcr.io/nvidia/isaac-sim:2023.1.1

# Install tools
RUN apt update && apt install git vim -y

# Remove ROS/2 Bridge
RUN sed -i 's/ros_bridge_extension = "omni.isaac.ros2_bridge"/ros_bridge_extension = ""/g' /isaac-sim/apps/omni.isaac.sim.base.kit

# Toggle Grid Off
RUN sed -i '17i import omni.kit.viewport' /isaac-sim/extscache/omni.replicator.replicator_yaml-2.0.4+lx64/omni/replicator/replicator_yaml/scripts/replicator_yaml_extension.py
RUN sed -i '100i \ \ \ \ \ \ \ \ omni.kit.viewport.actions.actions.toggle_global_visibility(visible=False)' /isaac-sim/extscache/omni.replicator.replicator_yaml-2.0.4+lx64/omni/replicator/replicator_yaml/scripts/replicator_yaml_extension.py
```
My workaround works, but you guys may want to fix the underlying problem.
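For anyone hitting the same thing outside a devcontainer, the equivalent workaround expressed as plain docker flags (image, env vars, and mount path taken from the config above) would be roughly:

```bash
docker run --rm --gpus all \
  -e ACCEPT_EULA=Y -e PRIVACY_CONSENT=N \
  -v /usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin \
  nvcr.io/nvidia/isaac-sim:2023.1.1
```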
Enabling OptiX denoise requires the `/usr/share/nvidia/nvoptix.bin` file, which is installed as part of the `libnvidia-gl-<ver>` package but is not present in containers with the nvidia ctk runtime.

Workaround for Holoscan: https://github.com/nvidia-holoscan/holohub/pull/112/files

**Content of `libnvidia-gl-535`**

**Files not mounted with nvidia runtime**

Run this command to test: (see the sketch after the observations below)

**Observations**

- `dll` files on x86_64? `/wine/nvngx.dll`. Interestingly, there is no `libnvidia-ngx.so.1` on x86_64 (vs aarch64).
- `nvidia-ngx-updater`, `libnvidia-api.so.1` and `libnvidia-vulkan-producer.so.535` only exist on x86_64. Expected? Need mounting?
- `libnvidia-egl-gbm.so` exists for both x86_64 and aarch64, but is missing only in aarch64 containers.
- `nvidia_layers.json` is in `icd.d` on aarch64, instead of `implicit_layer.d` on x86_64. The former isn't mounted, while the latter is.
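The exact test command isn't preserved above; a hypothetical spot-check of the files in question (paths and image are assumptions) could look like:

```bash
# Hypothetical spot-check: compare a couple of the files discussed here
# on the host vs. inside a container started with the nvidia runtime.
FILES="/usr/share/nvidia/nvoptix.bin /usr/share/vulkan/icd.d/nvidia_layers.json"

echo "--- host ---"
ls -l $FILES

echo "--- container ---"
docker run --rm --runtime=nvidia --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all \
  ubuntu:22.04 ls -l $FILES
```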