containers / youki

A container runtime written in Rust
https://containers.github.io/youki/
Apache License 2.0

Creating container without network namespace leads to mount error #2745

Status: Closed (jeromegn closed this issue 5 months ago)

jeromegn commented 5 months ago

I've looked through the code and couldn't figure out why not using a network namespace causes an error mounting sysfs on /sys:

DEBUG this container does NOT create a new user namespace
DEBUG container directory will be "/containers/tail"
DEBUG Save container status: Container { state: State { oci_version: "v1.0.2", id: "tail", status: Creating, pid: None, bundle: "/bundles/tail", annotations: Some({}), created: None, creator: None, use_systemd: false, clean_up_intel_rdt_subdirectory: None }, root: "/containers/tail" } in "/containers/tail"
DEBUG this container does NOT create a new user namespace
DEBUG create notify listener socket_path="/containers/tail/notify.sock"
DEBUG the cwd to create the notify socket cwd="/containers/tail"
DEBUG unshare or setns: LinuxNamespace { typ: Pid, path: None }
DEBUG sending init pid (Pid(235))
DEBUG unshare or setns: LinuxNamespace { typ: Uts, path: None }
DEBUG unshare or setns: LinuxNamespace { typ: Ipc, path: None }
DEBUG unshare or setns: LinuxNamespace { typ: Cgroup, path: None }
DEBUG unshare or setns: LinuxNamespace { typ: Mount, path: None }
DEBUG prepare rootfs rootfs="/bundles/tail/rootfs"
DEBUG mount root fs "/bundles/tail/rootfs"
DEBUG mounting Mount { destination: "/proc", typ: Some("proc"), source: Some("proc"), options: None }
DEBUG mounting Mount { destination: "/dev", typ: Some("tmpfs"), source: Some("tmpfs"), options: Some(["nosuid", "strictatime", "mode=755", "size=65536k"]) }
DEBUG mounting Mount { destination: "/dev/pts", typ: Some("devpts"), source: Some("devpts"), options: Some(["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]) }
DEBUG mounting Mount { destination: "/dev/shm", typ: Some("tmpfs"), source: Some("shm"), options: Some(["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]) }
DEBUG mounting Mount { destination: "/dev/mqueue", typ: Some("mqueue"), source: Some("mqueue"), options: Some(["nosuid", "noexec", "nodev"]) }
DEBUG mounting Mount { destination: "/sys", typ: Some("sysfs"), source: Some("sysfs"), options: Some(["nosuid", "noexec", "nodev", "ro"]) }
ERROR mount of "/sys" failed. EBUSY: Device or resource busy
ERROR failed to mount Mount { destination: "/sys", typ: Some("sysfs"), source: Some("sysfs"), options: Some(["nosuid", "noexec", "nodev", "ro"]) }: syscall
ERROR failed to prepare rootfs err=Mount(Syscall(Nix(EBUSY)))
ERROR failed to initialize container process: failed to prepare rootfs
ERROR failed to wait for init ready: exec process failed with error error in executing process : failed to prepare rootfs
ERROR failed to run container process err=Channel(ExecError("error in executing process : failed to prepare rootfs"))
ERROR could not build container: exec process failed with error error in executing process : failed to prepare rootfs
ERROR Error: exec process failed with error error in executing process : failed to prepare rootfs

If I add the network namespace, this works. The rest of the logs are identical, except for the additional line unshare or setns: LinuxNamespace { typ: Network, path: None }.

I believe this is reproducible with a spec produced like:

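// Builder types come from the oci-spec crate.
use oci_spec::runtime::{LinuxBuilder, LinuxNamespaceBuilder, LinuxNamespaceType, SpecBuilder};

let linux = LinuxBuilder::default()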
    .namespaces(vec![
        LinuxNamespaceBuilder::default()
            .typ(LinuxNamespaceType::Pid)
            .build()
            .unwrap(),
        // LinuxNamespaceBuilder::default()
        //     .typ(LinuxNamespaceType::Network)
        //     .build()
        //     .unwrap(),
        LinuxNamespaceBuilder::default()
            .typ(LinuxNamespaceType::Ipc)
            .build()
            .unwrap(),
        LinuxNamespaceBuilder::default()
            .typ(LinuxNamespaceType::Uts)
            .build()
            .unwrap(),
        LinuxNamespaceBuilder::default()
            .typ(LinuxNamespaceType::Mount)
            .build()
            .unwrap(),
        LinuxNamespaceBuilder::default()
            .typ(LinuxNamespaceType::Cgroup)
            .build()
            .unwrap(),
    ])
    .build()
    .unwrap();

let spec = SpecBuilder::default()
    .linux(linux)
    .build()
    .unwrap();

I assume something is happening when using a network namespace that leads to /sys being mountable.

What I'm trying to achieve: starting a container that has access to the host networking.

JCKeep commented 5 months ago

Without a new network namespace, sysfs should be bind-mounted from the host's /sys:

{
    "destination": "/sys",
    "type": "none",
    "source": "/sys",
    "options": [
        "rbind",
        "nosuid",
        "noexec",
        "nodev",
        "ro"
    ]
},

With a new network namespace, sysfs can be mounted directly:

{
    "destination": "/sys",
    "type": "sysfs",
    "source": "sysfs",
    "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
    ]
},

I think this should help.
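As a rough illustration of the rule above, the choice between the two /sys entries could be derived from the namespace list instead of being hard-coded. This is only a sketch: `NamespaceKind`, `MountEntry`, and `sys_mount_for` are hypothetical names made up for this example and are not part of youki or the oci-spec crate. The field values are copied from the two JSON entries above.

```rust
// Hypothetical stand-ins for the OCI runtime-spec types. These are NOT the
// oci-spec crate's API, just plain types with the same field meanings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NamespaceKind {
    Pid,
    Network,
    Ipc,
    Uts,
    Mount,
    Cgroup,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MountEntry {
    pub destination: String,
    pub typ: String,
    pub source: String,
    pub options: Vec<String>,
}

/// Pick the `/sys` mount entry depending on whether the container is given
/// its own network namespace (assumed rule, taken from the JSON above).
pub fn sys_mount_for(namespaces: &[NamespaceKind]) -> MountEntry {
    // Options shared by both variants.
    let hardening = ["nosuid", "noexec", "nodev", "ro"];

    if namespaces.contains(&NamespaceKind::Network) {
        // New network namespace: mount a fresh sysfs instance.
        MountEntry {
            destination: "/sys".to_string(),
            typ: "sysfs".to_string(),
            source: "sysfs".to_string(),
            options: hardening.iter().map(|s| s.to_string()).collect(),
        }
    } else {
        // Host networking: recursively bind-mount the host's /sys.
        let mut options = vec!["rbind".to_string()];
        options.extend(hardening.iter().map(|s| s.to_string()));
        MountEntry {
            destination: "/sys".to_string(),
            typ: "none".to_string(),
            source: "/sys".to_string(),
            options,
        }
    }
}
```

Calling `sys_mount_for` with the namespace list from the original report (no `Network` entry) yields the bind-mount variant. Adding `NamespaceKind::Network` yields the plain sysfs mount.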

jeromegn commented 5 months ago

@JCKeep thank you! That fixed it.

utam0k commented 5 months ago

@JCKeep Thanks!