NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

How to use the runtime hook for rootless RunC containers? #49

Open kutschkem opened 5 years ago

kutschkem commented 5 years ago

I would like to run rootless RunC containers based on nvidia-docker, but when I use the runtime hook I get the following error:

container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --no-cgroups --device=all --compute --utility --require=cuda>=10.1 brand=tesla,driver>=418,driver<419 --pid=18266 /home/testCuda/build/runc/rootfs]\\\\nnvidia-container-cli: permission error: capability change failed: operation not permitted\\\\n\\\"\""

I tried the solution from https://github.com/moby/moby/issues/38729 of setting no-cgroups = true (as the --no-cgroups flag in the command line above shows), but it made no difference. I cannot tell whether I need additional capabilities in my runc config or something else.

My RunC configuration looks like this:


{
    "ociVersion": "1.0.0-rc5-dev",
    "root": {
        "path": "rootfs",
        "readonly": false
    },
    "process": {
        "args": [
            "bash", "./startup.sh", "matrix1_testCuda_gtest"
        ],
        "cwd": "/app",
        "env": [
            "PATH=/opt/cmake/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "CUDA_VERSION=10.1.105",
            "CUDA_PKG_VERSION=10-1=10.1.105-1",
            "LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64",
            "NVIDIA_VISIBLE_DEVICES=all",
            "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
            "NVIDIA_REQUIRE_CUDA=cuda>=10.1 brand=tesla,driver>=418,driver<419",
            "NCCL_VERSION=2.4.2",
            "LIBRARY_PATH=/usr/local/cuda/lib64/stubs",
            "CCACHE_SLOPPINESS=include_file_ctime,include_file_mtime",
            "CONAN_SYSREQUIRES_SUDO=0",
            "TERM=xterm"
        ],
        "oomScoreAdj": 0,
        "terminal": false,
        "user": {
            "gid": 0,
            "uid": 0
        },
        "noNewPrivileges": true,
        "capabilities": {
            "bounding": [
                "CAP_MKNOD",
                "CAP_NET_RAW",
                "CAP_KILL",
                "CAP_AUDIT_WRITE"
            ],
            "effective": [
                "CAP_MKNOD",
                "CAP_NET_RAW",
                "CAP_KILL",
                "CAP_AUDIT_WRITE"
            ],
            "inheritable": [
                "CAP_MKNOD",
                "CAP_NET_RAW",
                "CAP_KILL",
                "CAP_AUDIT_WRITE"
            ],
            "permitted": [
                "CAP_MKNOD",
                "CAP_NET_RAW",
                "CAP_KILL",
                "CAP_AUDIT_WRITE"
            ]
        },
        "rlimits": [
        ]
    },

    "linux": {
        "uidMappings": [
            {
                "hostID": 500101175,
                "containerID": 0,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "hostID": 513,
                "containerID": 0,
                "size": 1
            }
        ],
        "maskedPaths": [
            "/proc/asound",
            "/proc/acpi",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/proc/scsi",
            "/sys/firmware"
        ],
        "namespaces": [
            {
                "type": "mount"
            },
            {
                "type": "uts"
            },
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "user"
            }
        ],
        "readonlyPaths": [
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    },
    "mounts": [
        {
            "destination": "/proc",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ],
            "source": "proc",
            "type": "proc"
        },
        {
            "destination": "/dev",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ],
            "source": "tmpfs",
            "type": "tmpfs"
        },
        {
            "destination": "/dev/pts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ],
            "source": "devpts",
            "type": "devpts"
        },
        {
            "destination": "/sys",
            "source": "/sys",
            "options": [
                "rbind",
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ],
            "type": "none"
        },
        {
            "destination": "/sys/fs/cgroup",
            "options": [
                "ro",
                "nosuid",
                "noexec",
                "nodev"
            ],
            "source": "cgroup",
            "type": "cgroup"
        },
        {
            "destination": "/dev/mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ],
            "source": "mqueue",
            "type": "mqueue"
        }
    ],
    "hooks": {
        "prestart": [
            {
                "path": "/usr/bin/nvidia-container-runtime-hook",
                "args": ["nvidia-container-runtime-hook", "-config", "/home/testCuda/build/runc/nvidia_hook.conf", "prestart"],
                "env": [
                    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
                ]
            }
        ]
    }
}
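The capability lists in the config above can be compared with what a process actually holds by decoding the CapEff line of /proc/self/status. Below is a minimal Python sketch, assuming a Linux host; the helper names (decode_caps, encode_caps, current_effective_caps) are hypothetical and not part of runc or libnvidia-container, and the bit numbers follow capabilities(7). As I read the OCI runtime spec, prestart hooks are executed in the runtime namespace, so these lists may not govern the hook process at all; running a check like this from the hook's environment would show what nvidia-container-cli actually starts with.

```python
# Linux capability names indexed by bit number (0-40), per capabilities(7).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Return capability names for the bits set in a Cap* mask from /proc."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Fall back to a numeric label for bits newer than this table.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
    return names


def encode_caps(names: list[str]) -> str:
    """Build the 16-hex-digit mask the kernel would show for these names."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return f"{mask:016x}"


def current_effective_caps(status_path: str = "/proc/self/status") -> list[str]:
    """Read and decode the CapEff line for the calling process (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    raise RuntimeError(f"CapEff not found in {status_path}")
```

For example, the four capabilities granted in the config above (CAP_MKNOD, CAP_NET_RAW, CAP_KILL, CAP_AUDIT_WRITE) encode to 0000000028002020, and calling print(current_effective_caps()) from inside the container, or from a hypothetical wrapper around the hook, lists the names the kernel reports for that process.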
garyedwards commented 5 years ago

I think I am having the same issue, except that I am using podman with the nvidia-container-runtime-hook. I can run a rootless container fine as long as I do not use any uidmaps; as soon as I do, I get the following error:

E0619 08:28:25.428976 1 nvc_ldcache.c:375] could not start /sbin/ldconfig: mount operation failed: /proc: operation not permitted

There is clearly a permissions issue somewhere that is triggered by using a uidmap. The configuration in the original report also uses a uidmap.

Using a uidmap without the nvidia hook works as expected. Any thoughts would be much appreciated.
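For reference, the uidMappings in the original runc config map exactly one ID: container UID 0 to host UID 500101175 (size 1). The minimal Python sketch below translates IDs through a mapping written in the three-column "inside outside count" format of /proc/<pid>/uid_map, which carries the same information as the OCI containerID, hostID and size fields. The names IdMapping, parse_id_map and to_host are hypothetical helpers for illustration only.

```python
from typing import NamedTuple, Optional


class IdMapping(NamedTuple):
    container_id: int  # first ID inside the user namespace
    host_id: int       # first ID on the host it maps to
    size: int          # number of consecutive IDs covered


def parse_id_map(text: str) -> list[IdMapping]:
    """Parse uid_map/gid_map style text: one 'inside outside count' per line."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            mappings.append(IdMapping(*map(int, fields)))
    return mappings


def to_host(container_id: int, mappings: list[IdMapping]) -> Optional[int]:
    """Translate a container ID to a host ID; None means the ID is unmapped."""
    for m in mappings:
        if m.container_id <= container_id < m.container_id + m.size:
            return m.host_id + (container_id - m.container_id)
    return None
```

With the single-entry mapping from the original config, to_host(0, parse_id_map("0 500101175 1")) returns 500101175, and every other container UID returns None, i.e. it has no identity on the host at all.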

kutschkem commented 4 years ago

@garyedwards How do you make your rootless container work without a uidmap? This is a little old and I haven't worked on it for some time, but I think I remember the issue being permissions. Do you just change the permissions on the whole file system to allow arbitrary user IDs?

garyedwards commented 4 years ago

I think I set no-cgroups = true in the config.toml file, as described in this comment:

https://github.com/moby/moby/issues/38729#issuecomment-463493866

animesh-bhadouria commented 3 years ago

@kutschkem I am getting the same error while trying to run nvidia-container-runtime in rootless mode. Were you able to resolve this issue?

kutschkem commented 3 years ago

@animesh-bhadouria No, sorry.