containers / crun

A fast and lightweight fully featured OCI runtime and C library for running containers
GNU General Public License v2.0

Help debugging with cri-o #35

Closed dmolik closed 5 years ago

dmolik commented 5 years ago

I keep getting this when I try to create a container:

level=error msg="Container creation error: writing file 'cpu.shares': Bad file descriptor

ls /sys/fs/cgroup/cpu/kubepods/burstable/pod7f8667b0aa2fc59394329cc63d147fc3/
cgroup.clone_children  crio-conmon-018a9e0c4d02ed0ac3acadcb240df7d7e718a6264af811930048f75b55d16a58  crio-conmon-845ae88564fc18e50064223a1cefd85536d0a105fd6a50d14ca48a55be936114
cgroup.procs           crio-conmon-031f3ac376b34d2eecec24f263fcfd800091ad001013852ba42ecd4a5a2595e4  crio-conmon-a537c8308319eb1ab7710b9c4c4f1a590ae47c013dc38876908c8e3a7e070dbb
cpu.cfs_period_us      crio-conmon-3816120e55090b077cbdf75b62696b1e58b2655b8ee5165f28662cb9c165e3e3  crio-conmon-b4d592875062642b8627445dc26a9b80556442a8879f8deeb7be43a0d3f51c33
cpu.cfs_quota_us       crio-conmon-3aab6d526c5d97b401b287b6ecd28de911919940892b9a7a68e5adfdb969e57e  crio-conmon-c2295e785211b185f5726c647a24841cc3e444d4ca7bd0c7e29be87794f007c3
cpu.rt_period_us       crio-conmon-41c02b86cf760effc235e0b6498b45723102d23ce1daffa7cbd926ce0bd55da6  crio-conmon-d21d06f567283e6de85e51f0b87ad796fbca5f4dc397ab4748e2ae66bde5956e
cpu.rt_runtime_us      crio-conmon-468e517c34b9c0c9a4b466cbd00c89f859e00ee6b01fc89db54cd4bfa5c44499  crio-conmon-d8996193794ec44cde3dc14125f0481b5f6d4ec998dc1e6ac00d09ad4f002792
cpu.shares             crio-conmon-4cc0b934f3393dd33a40310ba09d6e3c9c0c2a498cfd1ceee8ac45d8d2201ba7  notify_on_release
cpu.stat               crio-conmon-7fcf8b268ab7050a1d4b2ee330aa4397169b60a431174ce463dff2a2d1096a21  tasks

Notice that the crio-UUID directory is missing.
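
The observation above (conmon cgroups present, container cgroups absent) could be checked mechanically. The following is only a sketch: it assumes that each `crio-conmon-<id>` entry should have a sibling `crio-<id>` entry carrying the same 64-character hex ID, which is inferred from the directory names in this thread rather than confirmed anywhere. The function name `missing_container_cgroups` is hypothetical.

```python
import re

# Assumption: IDs are 64 lowercase hex characters, as in the listing above.
CONMON_RE = re.compile(r"crio-conmon-([0-9a-f]{64})")
CONTAINER_RE = re.compile(r"crio-([0-9a-f]{64})")


def missing_container_cgroups(names):
    """Return IDs that have a crio-conmon-<id> entry but no crio-<id> entry.

    `names` is an iterable of entry names from one pod cgroup directory,
    for example the result of os.listdir() on that directory.
    """
    conmon_ids = set()
    container_ids = set()
    for name in names:
        match = CONMON_RE.fullmatch(name)
        if match:
            conmon_ids.add(match.group(1))
            continue
        match = CONTAINER_RE.fullmatch(name)
        if match:
            container_ids.add(match.group(1))
    return sorted(conmon_ids - container_ids)
```

Calling it with `os.listdir()` of the pod directory shown above would, under the stated assumption, return every ID in the listing, since no `crio-<id>` entry appears there.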

cat /sys/fs/cgroup/cpu/kubepods/burstable/pod7f8667b0aa2fc59394329cc63d147fc3/cpu.shares 
256

giuseppe commented 5 years ago

thanks for the report! I am currently working on support for CRI-O. My WIP is here: https://github.com/cri-o/cri-o/pull/2239.

I am still seeing some issues in crun that prevent the integration and e2e tests from passing, but I am working on it.

Is there a way to store the config.json file for the container that is failing? That would make it much easier to debug the issue. If not, I'll try to reproduce it locally.

/cc @mrunalp

dmolik commented 5 years ago

lesser01 /var/lib/containers/storage # cat $(echo $PWD)/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/config.json
{
    "ociVersion": "1.0.1-dev",
    "process": {
        "user": {
            "uid": 0,
            "gid": 0
        },
        "args": [
            "/pause"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_CHOWN",
                "CAP_DAC_OVERRIDE",
                "CAP_FSETID",
                "CAP_FOWNER",
                "CAP_NET_RAW",
                "CAP_SETGID",
                "CAP_SETUID",
                "CAP_SETPCAP",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_CHROOT",
                "CAP_KILL"
            ],
            "effective": [
                "CAP_CHOWN",
                "CAP_DAC_OVERRIDE",
                "CAP_FSETID",
                "CAP_FOWNER",
                "CAP_NET_RAW",
                "CAP_SETGID",
                "CAP_SETUID",
                "CAP_SETPCAP",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_CHROOT",
                "CAP_KILL"
            ],
            "inheritable": [
                "CAP_CHOWN",
                "CAP_DAC_OVERRIDE",
                "CAP_FSETID",
                "CAP_FOWNER",
                "CAP_NET_RAW",
                "CAP_SETGID",
                "CAP_SETUID",
                "CAP_SETPCAP",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_CHROOT",
                "CAP_KILL"
            ],
            "permitted": [
                "CAP_CHOWN",
                "CAP_DAC_OVERRIDE",
                "CAP_FSETID",
                "CAP_FOWNER",
                "CAP_NET_RAW",
                "CAP_SETGID",
                "CAP_SETUID",
                "CAP_SETPCAP",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_CHROOT",
                "CAP_KILL"
            ]
        },
        "oomScoreAdj": -998
    },
    "root": {
        "path": "/var/lib/containers/storage/overlay/34fa000f19d5b4ec46cdb9caabdfa8663115effba172effa6a4876c8ae872e69/merged",
        "readonly": true
    },
    "hostname": "lesser01",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620",
                "gid=5"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/etc/resolv.conf",
            "type": "bind",
            "source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/resolv.conf",
            "options": [
                "ro",
                "bind",
                "nodev",
                "nosuid",
                "noexec"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "bind",
            "source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/shm",
            "options": [
                "rw",
                "bind"
            ]
        },
        {
            "destination": "/etc/hostname",
            "type": "bind",
            "source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/hostname",
            "options": [
                "ro",
                "bind",
                "nodev",
                "nosuid",
                "noexec"
            ]
        }
    ],
    "annotations": {
        "component": "kube-scheduler",
        "io.kubernetes.container.name": "POD",
        "io.kubernetes.cri-o.Annotations": "{\"kubernetes.io/config.hash\":\"f44110a0ca540009109bfc32a7eb0baa\",\"kubernetes.io/config.seen\":\"2019-04-13T10:38:32.699683165-04:00\",\"kubernetes.io/config.source\":\"file\"}",
        "io.kubernetes.cri-o.CgroupParent": "/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa",
        "io.kubernetes.cri-o.ContainerID": "5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
        "io.kubernetes.cri-o.ContainerName": "k8s_POD_kube-scheduler-lesser01_kube-system_f44110a0ca540009109bfc32a7eb0baa_0",
        "io.kubernetes.cri-o.ContainerType": "sandbox",
        "io.kubernetes.cri-o.Created": "2019-04-13T13:02:13.735873498-04:00",
        "io.kubernetes.cri-o.HostName": "lesser01",
        "io.kubernetes.cri-o.HostNetwork": "true",
        "io.kubernetes.cri-o.HostnamePath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/hostname",
        "io.kubernetes.cri-o.IP": "",
        "io.kubernetes.cri-o.KubeName": "kube-scheduler-lesser01",
        "io.kubernetes.cri-o.Labels": "{\"component\":\"kube-scheduler\",\"io.kubernetes.container.name\":\"POD\",\"io.kubernetes.pod.name\":\"kube-scheduler-lesser01\",\"io.kubernetes.pod.namespace\":\"kube-system\",\"io.kubernetes.pod.uid\":\"f44110a0ca540009109bfc32a7eb0baa\",\"tier\":\"control-plane\"}",
        "io.kubernetes.cri-o.LogPath": "/var/log/pods/kube-system_kube-scheduler-lesser01_f44110a0ca540009109bfc32a7eb0baa/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2.log",
        "io.kubernetes.cri-o.Metadata": "{\"name\":\"kube-scheduler-lesser01\",\"uid\":\"f44110a0ca540009109bfc32a7eb0baa\",\"namespace\":\"kube-system\"}",
        "io.kubernetes.cri-o.MountPoint": "/var/lib/containers/storage/overlay/34fa000f19d5b4ec46cdb9caabdfa8663115effba172effa6a4876c8ae872e69/merged",
        "io.kubernetes.cri-o.Name": "k8s_kube-scheduler-lesser01_kube-system_f44110a0ca540009109bfc32a7eb0baa_0",
        "io.kubernetes.cri-o.Namespace": "kube-system",
        "io.kubernetes.cri-o.NamespaceOptions": "{\"network\":2,\"pid\":1}",
        "io.kubernetes.cri-o.PortMappings": "[]",
        "io.kubernetes.cri-o.PrivilegedRuntime": "true",
        "io.kubernetes.cri-o.ResolvPath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/resolv.conf",
        "io.kubernetes.cri-o.RuntimeHandler": "",
        "io.kubernetes.cri-o.SandboxID": "5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
        "io.kubernetes.cri-o.SeccompProfilePath": "",
        "io.kubernetes.cri-o.ShmPath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/shm",
        "io.kubernetes.pod.name": "kube-scheduler-lesser01",
        "io.kubernetes.pod.namespace": "kube-system",
        "io.kubernetes.pod.uid": "f44110a0ca540009109bfc32a7eb0baa",
        "kubernetes.io/config.hash": "f44110a0ca540009109bfc32a7eb0baa",
        "kubernetes.io/config.seen": "2019-04-13T10:38:32.699683165-04:00",
        "kubernetes.io/config.source": "file",
        "tier": "control-plane"
    },
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ],
            "cpu": {
                "shares": 2
            }
        },
        "cgroupsPath": "/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa/crio-5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ],
        "seccomp": {
            "defaultAction": "SCMP_ACT_ERRNO",
            "architectures": [
                "SCMP_ARCH_X86_64",
                "SCMP_ARCH_X86",
                "SCMP_ARCH_X32"
            ],
            "syscalls": [
                {
                    "names": [
                        "accept",
                        "accept4",
                        "access",
                        "alarm",
                        "bind",
                        "brk",
                        "capget",
                        "capset",
                        "chdir",
                        "chmod",
                        "chown",
                        "chown32",
                        "clock_getres",
                        "clock_gettime",
                        "clock_nanosleep",
                        "close",
                        "connect",
                        "copy_file_range",
                        "creat",
                        "dup",
                        "dup2",
                        "dup3",
                        "epoll_create",
                        "epoll_create1",
                        "epoll_ctl",
                        "epoll_ctl_old",
                        "epoll_pwait",
                        "epoll_wait",
                        "epoll_wait_old",
                        "eventfd",
                        "eventfd2",
                        "execve",
                        "execveat",
                        "exit",
                        "exit_group",
                        "faccessat",
                        "fadvise64",
                        "fadvise64_64",
                        "fallocate",
                        "fanotify_mark",
                        "fchdir",
                        "fchmod",
                        "fchmodat",
                        "fchown",
                        "fchown32",
                        "fchownat",
                        "fcntl",
                        "fcntl64",
                        "fdatasync",
                        "fgetxattr",
                        "flistxattr",
                        "flock",
                        "fork",
                        "fremovexattr",
                        "fsetxattr",
                        "fstat",
                        "fstat64",
                        "fstatat64",
                        "fstatfs",
                        "fstatfs64",
                        "fsync",
                        "ftruncate",
                        "ftruncate64",
                        "futex",
                        "futimesat",
                        "getcpu",
                        "getcwd",
                        "getdents",
                        "getdents64",
                        "getegid",
                        "getegid32",
                        "geteuid",
                        "geteuid32",
                        "getgid",
                        "getgid32",
                        "getgroups",
                        "getgroups32",
                        "getitimer",
                        "getpeername",
                        "getpgid",
                        "getpgrp",
                        "getpid",
                        "getppid",
                        "getpriority",
                        "getrandom",
                        "getresgid",
                        "getresgid32",
                        "getresuid",
                        "getresuid32",
                        "getrlimit",
                        "get_robust_list",
                        "getrusage",
                        "getsid",
                        "getsockname",
                        "getsockopt",
                        "get_thread_area",
                        "gettid",
                        "gettimeofday",
                        "getuid",
                        "getuid32",
                        "getxattr",
                        "inotify_add_watch",
                        "inotify_init",
                        "inotify_init1",
                        "inotify_rm_watch",
                        "io_cancel",
                        "ioctl",
                        "io_destroy",
                        "io_getevents",
                        "ioprio_get",
                        "ioprio_set",
                        "io_setup",
                        "io_submit",
                        "ipc",
                        "kill",
                        "lchown",
                        "lchown32",
                        "lgetxattr",
                        "link",
                        "linkat",
                        "listen",
                        "listxattr",
                        "llistxattr",
                        "_llseek",
                        "lremovexattr",
                        "lseek",
                        "lsetxattr",
                        "lstat",
                        "lstat64",
                        "madvise",
                        "memfd_create",
                        "mincore",
                        "mkdir",
                        "mkdirat",
                        "mknod",
                        "mknodat",
                        "mlock",
                        "mlock2",
                        "mlockall",
                        "mmap",
                        "mmap2",
                        "mprotect",
                        "mq_getsetattr",
                        "mq_notify",
                        "mq_open",
                        "mq_timedreceive",
                        "mq_timedsend",
                        "mq_unlink",
                        "mremap",
                        "msgctl",
                        "msgget",
                        "msgrcv",
                        "msgsnd",
                        "msync",
                        "munlock",
                        "munlockall",
                        "munmap",
                        "nanosleep",
                        "newfstatat",
                        "_newselect",
                        "open",
                        "openat",
                        "pause",
                        "pipe",
                        "pipe2",
                        "poll",
                        "ppoll",
                        "prctl",
                        "pread64",
                        "preadv",
                        "prlimit64",
                        "pselect6",
                        "pwrite64",
                        "pwritev",
                        "read",
                        "readahead",
                        "readlink",
                        "readlinkat",
                        "readv",
                        "recv",
                        "recvfrom",
                        "recvmmsg",
                        "recvmsg",
                        "remap_file_pages",
                        "removexattr",
                        "rename",
                        "renameat",
                        "renameat2",
                        "restart_syscall",
                        "rmdir",
                        "rt_sigaction",
                        "rt_sigpending",
                        "rt_sigprocmask",
                        "rt_sigqueueinfo",
                        "rt_sigreturn",
                        "rt_sigsuspend",
                        "rt_sigtimedwait",
                        "rt_tgsigqueueinfo",
                        "sched_getaffinity",
                        "sched_getattr",
                        "sched_getparam",
                        "sched_get_priority_max",
                        "sched_get_priority_min",
                        "sched_getscheduler",
                        "sched_rr_get_interval",
                        "sched_setaffinity",
                        "sched_setattr",
                        "sched_setparam",
                        "sched_setscheduler",
                        "sched_yield",
                        "seccomp",
                        "select",
                        "semctl",
                        "semget",
                        "semop",
                        "semtimedop",
                        "send",
                        "sendfile",
                        "sendfile64",
                        "sendmmsg",
                        "sendmsg",
                        "sendto",
                        "setfsgid",
                        "setfsgid32",
                        "setfsuid",
                        "setfsuid32",
                        "setgid",
                        "setgid32",
                        "setgroups",
                        "setgroups32",
                        "setitimer",
                        "setpgid",
                        "setpriority",
                        "setregid",
                        "setregid32",
                        "setresgid",
                        "setresgid32",
                        "setresuid",
                        "setresuid32",
                        "setreuid",
                        "setreuid32",
                        "setrlimit",
                        "set_robust_list",
                        "setsid",
                        "setsockopt",
                        "set_thread_area",
                        "set_tid_address",
                        "setuid",
                        "setuid32",
                        "setxattr",
                        "shmat",
                        "shmctl",
                        "shmdt",
                        "shmget",
                        "shutdown",
                        "sigaltstack",
                        "signalfd",
                        "signalfd4",
                        "sigreturn",
                        "socket",
                        "socketcall",
                        "socketpair",
                        "splice",
                        "stat",
                        "stat64",
                        "statfs",
                        "statfs64",
                        "symlink",
                        "symlinkat",
                        "sync",
                        "sync_file_range",
                        "syncfs",
                        "sysinfo",
                        "syslog",
                        "tee",
                        "tgkill",
                        "time",
                        "timer_create",
                        "timer_delete",
                        "timerfd_create",
                        "timerfd_gettime",
                        "timerfd_settime",
                        "timer_getoverrun",
                        "timer_gettime",
                        "timer_settime",
                        "times",
                        "tkill",
                        "truncate",
                        "truncate64",
                        "ugetrlimit",
                        "umask",
                        "uname",
                        "unlink",
                        "unlinkat",
                        "utime",
                        "utimensat",
                        "utimes",
                        "vfork",
                        "vmsplice",
                        "wait4",
                        "waitid",
                        "waitpid",
                        "write",
                        "writev"
                    ],
                    "action": "SCMP_ACT_ALLOW"
                },
                {
                    "names": [
                        "personality"
                    ],
                    "action": "SCMP_ACT_ALLOW",
                    "args": [
                        {
                            "index": 0,
                            "value": 0,
                            "op": "SCMP_CMP_EQ"
                        },
                        {
                            "index": 0,
                            "value": 8,
                            "op": "SCMP_CMP_EQ"
                        },
                        {
                            "index": 0,
                            "value": 4294967295,
                            "op": "SCMP_CMP_EQ"
                        }
                    ]
                },
                {
                    "names": [
                        "chroot"
                    ],
                    "action": "SCMP_ACT_ALLOW"
                },
                {
                    "names": [
                        "clone"
                    ],
                    "action": "SCMP_ACT_ALLOW",
                    "args": [
                        {
                            "index": 0,
                            "value": 2080505856,
                            "op": "SCMP_CMP_MASKED_EQ"
                        }
                    ]
                },
                {
                    "names": [
                        "arch_prctl"
                    ],
                    "action": "SCMP_ACT_ALLOW"
                },
                {
                    "names": [
                        "modify_ldt"
                    ],
                    "action": "SCMP_ACT_ALLOW"
                }
            ]
        }
    }
}
lesser01 /var/lib/containers/storage # /sys/fs/cgroup/cpu$( cat ./overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/config.json | jq -Mr ".linux.cgroupsPath" )
-su: /sys/fs/cgroup/cpu/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa/crio-5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2: No such file or directory

dmolik commented 5 years ago

hopefully that's useful...
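
The check done above with jq (read `.linux.cgroupsPath` from config.json and look for it under `/sys/fs/cgroup/cpu`) might be sketched as follows. The helper names are hypothetical, and the controller root is taken from the paths in this thread, so it assumes a cgroup v1 layout with the cpu controller mounted there.

```python
import json
import os


def cgroup_dir_from_config(config, controller_root="/sys/fs/cgroup/cpu"):
    """Build the expected cgroup directory for a parsed OCI config.

    `config` is the dict loaded from config.json. The leading slash of
    linux.cgroupsPath is stripped so that os.path.join keeps the root.
    """
    cgroups_path = config["linux"]["cgroupsPath"]
    return os.path.join(controller_root, cgroups_path.lstrip("/"))


def cgroup_dir_exists(config_file, controller_root="/sys/fs/cgroup/cpu"):
    """Return (path, exists) for the cgroup directory named in config_file."""
    with open(config_file) as handle:
        config = json.load(handle)
    path = cgroup_dir_from_config(config, controller_root)
    return path, os.path.isdir(path)
```

For the config.json above, `cgroup_dir_from_config` yields the same path that the shell reported as missing.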

giuseppe commented 5 years ago

thanks, that is helpful. I've added a patch that catches the error in opening the cgroup directory earlier, but that is not yet enough to address the issue. What distro and kernel are you using?

dmolik commented 5 years ago

Gentoo, kernel 5.0.7, openrc

dmolik commented 5 years ago

this probably isn't helpful, but when I set the runtime to runc, the directory at .linux.cgroupsPath is created

giuseppe commented 5 years ago

I've just merged some patches that let the CRI-O integration tests pass successfully (except for three tests that depend on runc behaviour). No changes were needed in the cgroup part, though, so I'll need to look at this separately.

giuseppe commented 5 years ago

any hint on the quickest way to get access to an environment like yours? I've tried a Vagrant machine for Gentoo, but it seems to get stuck. If there is nothing easier, I'll go through the full installation.

dmolik commented 5 years ago

MacOS as the base machine?

Another option is a VPS, I use Gentoo on https://linode.com

dmolik commented 5 years ago

I would say Alpine, but I don't think they have a cri-o package yet.

dmolik commented 5 years ago

I compiled patch #41, and I'm getting this in the cri-o logs:

time="2019-04-18 11:09:30.353132930-04:00" level=info msg="Attempting to run pod sandbox with infra container: kube-system/kube-controller-manager-lesser01/POD" 
time="2019-04-18 11:09:30.353179614-04:00" level=debug msg="parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]k8s.gcr.io/pause:3.1"" 
time="2019-04-18 11:09:30.353494208-04:00" level=debug msg="exporting opaque data as blob "sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e"" 
time="2019-04-18 11:09:30.375666983-04:00" level=debug msg="created pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f"" 
time="2019-04-18 11:09:30.381432401-04:00" level=debug msg="pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" has work directory "/var/lib/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata"" 
time="2019-04-18 11:09:30.381475661-04:00" level=debug msg="pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" has run directory "/var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata"" 
time="2019-04-18 11:09:30.390420579-04:00" level=debug msg="overlay: mount_data=lowerdir=/var/lib/containers/storage/overlay/l/27QPZN3VEWJOPGI5AP7BP23QCW,upperdir=/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/diff,workdir=/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/work" 
time="2019-04-18 11:09:30.390720062-04:00" level=debug msg="mounted container "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" at "/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/merged"" 
time="2019-04-18 11:09:30.391612056-04:00" level=debug msg="running conmon: /usr/libexec/crio/conmon" args=[--syslog -c a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f -u a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f -r /usr/bin/crun -b /var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata -p /var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata/pidfile -l /var/log/pods/kube-system_kube-controller-manager-lesser01_8dac7afa85d5212e4fa0be5103f31601/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level debug] 
time="2019-04-18 11:09:30.578157451-04:00" level=debug msg="Received container pid: 4629" 
error opening file '/run/crun/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/status': No such file or directory

giuseppe commented 5 years ago

do you have anything under /var/run/user/0/crun?

I've seen that issue in the past. It depends on XDG_RUNTIME_DIR, which is not always set. I'll need to find a better way to address that. I don't much like the way runc does it, since it detects whether runc is running in a user namespace, but there are probably no better alternatives.

giuseppe commented 5 years ago

if you have anything under /var/run/user/0/crun then the best workaround for now is to ensure XDG_RUNTIME_DIR is not set at all for root
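
To illustrate why XDG_RUNTIME_DIR matters here, the sketch below shows how two invocations with different environments could look for the same container's status file in different places. This is not crun's actual lookup code. The function names, the `crun` subdirectory under XDG_RUNTIME_DIR, and the choice to treat an empty value like an unset one are all assumptions made for the example, based only on the two paths mentioned in this thread.

```python
import os


def state_root(env, default_root="/run/crun"):
    """Pick a state directory from an environment mapping.

    Assumption: a non-empty XDG_RUNTIME_DIR redirects state to
    <XDG_RUNTIME_DIR>/crun; otherwise the default root is used.
    """
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return os.path.join(runtime_dir, "crun")
    return default_root


def status_path(env, container_id):
    """Return where the status file would be looked up for container_id."""
    return os.path.join(state_root(env), container_id, "status")
```

If the process that creates the container sees XDG_RUNTIME_DIR and a later process does not, the two would disagree about where the status file lives, which would match the "No such file or directory" error above.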

dmolik commented 5 years ago

Okay I'll double check

giuseppe commented 5 years ago

might be caused by 4966bb652adc350c9a43cb5931f2a5dc135b8580 that was recently merged

dmolik commented 5 years ago

so on this machine I don't have a /run/user dir

giuseppe commented 5 years ago

is XDG_RUNTIME_DIR set?

dmolik commented 5 years ago

doesn't look like it

xargs --null --max-args=1 echo < /proc/2882/environ
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_OCI_SYNCPIPE=3
_OCI_STARTPIPE=4
XDG_RUNTIME_DIR=
_LIBCONTAINER_CLONED_BINARY=1
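
The environment dump above contains an `XDG_RUNTIME_DIR=` entry, so the variable is present with an empty value rather than absent. A small sketch for telling those cases apart when reading `/proc/<pid>/environ` follows; the helper names are hypothetical.

```python
def parse_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the final NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def describe(env, name):
    """Classify a variable as 'unset', 'set but empty', or 'set'."""
    if name not in env:
        return "unset"
    return "set but empty" if env[name] == "" else "set"


def read_environ(pid):
    """Read and parse the environment of a running process (Linux only)."""
    with open("/proc/%d/environ" % pid, "rb") as handle:
        return parse_environ(handle.read())
```

Whether a runtime treats an empty value the same as an unset one depends on its implementation, so the distinction may or may not matter for this bug.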

giuseppe commented 5 years ago

thanks for checking this. What about the CRI-O process?

giuseppe commented 5 years ago

on my Linode Gentoo VM I don't see XDG_RUNTIME_DIR= set.

Could you revert 4966bb6 and see if that is the issue?

dmolik commented 5 years ago

okay the crio process has no XDG_RUNTIME_DIR var, and reverting 4966bb652adc350c9a43cb5931f2a5dc135b8580 didn't seem to help

dmolik commented 5 years ago

just pulled down master, still unable to find /run/crun//status. I ran the test suite:

Making check in libocispec
make[1]: Entering directory '/root/crun/libocispec'
make[1]: Circular /root/crun/libocispec/tests/data <- /root/crun/libocispec/tests/data dependency dropped.
  GEN      public-submodule-commit
make  check-am
make[2]: Entering directory '/root/crun/libocispec'
make  check-TESTS
make[3]: Entering directory '/root/crun/libocispec'
make[4]: Entering directory '/root/crun/libocispec'
PASS: tests/test-1
PASS: tests/test-2
PASS: tests/test-3
PASS: tests/test-4
PASS: tests/test-5
PASS: tests/test-6
PASS: tests/test-7
PASS: tests/test-8
============================================================================
Testsuite summary for libocispec 0.1.1
============================================================================
# TOTAL: 8
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
make[4]: Leaving directory '/root/crun/libocispec'
make[3]: Leaving directory '/root/crun/libocispec'
make[2]: Leaving directory '/root/crun/libocispec'
make[1]: Leaving directory '/root/crun/libocispec'
make[1]: Entering directory '/root/crun'
make  check-TESTS
make[2]: Entering directory '/root/crun'
make[3]: Entering directory '/root/crun'
PASS: tests/test_capabilities.py 1 - no-caps
PASS: tests/test_capabilities.py 2 - new-privs
PASS: tests/test_capabilities.py 3 - some-caps-bounding
PASS: tests/test_capabilities.py 4 - some-caps-inheritable
PASS: tests/test_capabilities.py 5 - some-caps-ambient
PASS: tests/test_capabilities.py 6 - some-caps-permitted
PASS: tests/test_capabilities.py 7 - some-caps-effective-non-root
PASS: tests/test_capabilities.py 8 - some-caps-bounding-non-root
PASS: tests/test_capabilities.py 9 - some-caps-inheritable-non-root
PASS: tests/test_capabilities.py 10 - some-caps-ambient-non-root
PASS: tests/test_capabilities.py 11 - some-caps-permitted-non-root
PASS: tests/test_cwd.py 1 - cwd
PASS: tests/test_devices.py 1 - deny-devices
PASS: tests/test_devices.py 2 - allow-device
PASS: tests/test_hostname.py 1 - hostname
PASS: tests/test_limits.py 1 - limit-pid-0
PASS: tests/test_limits.py 2 - limit-pid-n
SKIP: tests/test_mounts.py
PASS: tests/test_paths.py 1 - readonly-paths
PASS: tests/test_paths.py 2 - masked-paths
PASS: tests/test_pid.py 1 - pid
PASS: tests/test_pid.py 2 - pid-user
PASS: tests/test_pid_file.py 1 - test_pid_file
PASS: tests/test_preserve_fds.py 1 - preserve-fds-0
PASS: tests/test_preserve_fds.py 2 - preserve-fds-some
PASS: tests/test_uid_gid.py 1 - uid
PASS: tests/test_uid_gid.py 2 - gid
PASS: tests/test_rlimits.py 1 - rlimits
PASS: tests/test_tty.py 1 - test-stdin-tty
PASS: tests/test_tty.py 2 - test-stdout-tty
PASS: tests/test_tty.py 3 - test-stderr-tty
PASS: tests/test_tty.py 4 - test-detach-tty
PASS: tests/test_hooks.py 1 - test-fail-prestart
PASS: tests/test_hooks.py 2 - test-success-prestart
SKIP: tests/test_update.py 1 - test-update # SKIP
PASS: tests/test_detach.py 1 - test-detach
PASS: tests/test_resources.py 1 - resources-pid-limit
FAIL: tests/test_start.py 1 - start
PASS: tests/test_exec.py 1 - exec
PASS: tests/test_exec.py 2 - exec-not-exists
PASS: tests/test_exec.py 3 - exec-detach-not-exists
PASS: tests/tests_libcrun_utils 1 - test_crun_path_exists
PASS: tests/tests_libcrun_utils 2 - test_write_read_file
PASS: tests/tests_libcrun_utils 3 - test_run_process
PASS: tests/tests_libcrun_utils 4 - test_dir_p
PASS: tests/tests_libcrun_utils 5 - test_socket_pair
PASS: tests/tests_libcrun_utils 6 - test_send_receive_fd
PASS: tests/tests_libcrun_errors 1 - test_crun_make_error
PASS: tests/tests_libcrun_errors 2 - test_crun_write_warning_and_release
============================================================================
Testsuite summary for crun 0.4
============================================================================
# TOTAL: 49
# PASS:  46
# SKIP:  2
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to giuseppe@scrivano.org
============================================================================
make[3]: *** [Makefile:1577: test-suite.log] Error 1
make[3]: Leaving directory '/root/crun'
make[2]: *** [Makefile:1685: check-TESTS] Error 2
make[2]: Leaving directory '/root/crun'
make[1]: *** [Makefile:1922: check-am] Error 2
make[1]: Leaving directory '/root/crun'
make: *** [Makefile:1462: check-recursive] Error 1
================================
   crun 0.4: ./test-suite.log
================================

# TOTAL: 49
# PASS:  46
# SKIP:  2
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

SKIP: tests/test_mounts
=======================

1..0
SKIP: tests/test_mounts.py

SKIP: tests/test_update
=======================

1..1
ok 1 - test-update #SKIP
SKIP: tests/test_update.py 1 - test-update # SKIP

FAIL: tests/test_start
======================

error opening file '/run/crun/test-tmp_w1kw99i/status': No such file or directory
a bytes-like object is required, not 'str'
1..1
not ok 1 - start
FAIL: tests/test_start.py 1 - start

giuseppe commented 5 years ago

I'll give it another attempt in the next few days (I am quite sure it is some weird interaction with XDG_RUNTIME_DIR). In the meantime, I am working on something related, and over the last few days I've addressed some issues that bring CRI-O + crun closer to passing the Kubernetes e2e tests. Progress is tracked here:

https://github.com/cri-o/cri-o/pull/2239

(just Fedora for now, as the RHEL failures are expected because of a missing package)

giuseppe commented 5 years ago

it seems it gets confused on Gentoo as there is no pids cgroup controller?

dmolik commented 5 years ago

mount | grep pids
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

dmolik commented 5 years ago

test suite passes now

dmolik commented 5 years ago

The crio conmon process does have the XDG_RUNTIME_DIR variable in its environment, but it has no value.
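
For anyone repeating this check: on Linux, /proc/<pid>/environ holds a process's initial environment as NUL-separated NAME=value entries, so "absent" and "present but empty" can be told apart. The following is a minimal C sketch of that distinction; the `classify_env` helper and the `env_state` enum are hypothetical names made up for illustration, not part of crun or CRI-O.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the three cases that matter in this thread. */
enum env_state { ENV_UNSET, ENV_EMPTY, ENV_SET };

/* Scan a buffer of NUL-separated "NAME=value" entries (the layout of
   /proc/<pid>/environ) and report how NAME appears in it.  */
static enum env_state
classify_env (const char *buf, size_t len, const char *name)
{
  size_t name_len;
  size_t pos = 0;

  assert (buf != NULL && name != NULL);
  name_len = strlen (name);

  while (pos < len)
    {
      const char *entry = buf + pos;
      const char *end = memchr (entry, '\0', len - pos);
      size_t entry_len = end ? (size_t) (end - entry) : len - pos;

      /* Match "NAME=" exactly, so "XDG" does not match "XDG_RUNTIME_DIR=".  */
      if (entry_len > name_len
          && memcmp (entry, name, name_len) == 0
          && entry[name_len] == '=')
        return entry_len == name_len + 1 ? ENV_EMPTY : ENV_SET;

      /* Skip this entry and its terminating NUL.  */
      pos += entry_len + 1;
    }
  return ENV_UNSET;
}
```

A caller would read /proc/<pid>/environ into a buffer and pass the number of bytes read as `len`. An `ENV_EMPTY` result corresponds to the "defined but empty" case.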

giuseppe commented 5 years ago

I finally managed to pass the Kubernetes e2e tests with CRI-O and crun

dmolik commented 5 years ago

Coolio!

But, getting back to business. I compiled master this morning (eastern US) and I was still getting the same "status file not found" error. So I searched for "status" in the repo and took a closer look at this function:

https://github.com/giuseppe/crun/blob/77836e488fb847e8d76264d8ad192698c4d5f482/src/libcrun/status.c#L32:L49

So on a whim I ran ls on / and there it was: the /crun folder.

giuseppe commented 5 years ago

good hint!

So I guess the issue is that XDG_RUNTIME_DIR is defined but empty. Does this patch make any difference?

https://github.com/giuseppe/crun/pull/44
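
To illustrate the suspected failure mode, here is a minimal C sketch. It is not the code in status.c and not the change in the linked pull request; the helpers `naive_state_root` and `state_root` are hypothetical. It assumes the state directory is built as the value of XDG_RUNTIME_DIR followed by "/crun", with "/run/crun" as the fallback. Under that assumption, a check that only tests for NULL turns an empty value into "/crun", which would match the folder found under /.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper showing the suspected bug: only NULL is treated as
   "unset", so an empty value yields "" + "/crun" = "/crun".  */
static int
naive_state_root (const char *xdg_runtime_dir, char *out, size_t out_len)
{
  int n;

  assert (out != NULL && out_len > 0);
  if (xdg_runtime_dir != NULL)
    n = snprintf (out, out_len, "%s/crun", xdg_runtime_dir);
  else
    n = snprintf (out, out_len, "/run/crun");
  return (n >= 0 && (size_t) n < out_len) ? 0 : -1;
}

/* Hypothetical fix: treat an empty value the same as an unset one.  */
static int
state_root (const char *xdg_runtime_dir, char *out, size_t out_len)
{
  int n;

  assert (out != NULL && out_len > 0);
  if (xdg_runtime_dir != NULL && strlen (xdg_runtime_dir) > 0)
    n = snprintf (out, out_len, "%s/crun", xdg_runtime_dir);
  else
    n = snprintf (out, out_len, "/run/crun");
  return (n >= 0 && (size_t) n < out_len) ? 0 : -1;
}
```

A caller would pass `getenv ("XDG_RUNTIME_DIR")` as the first argument. Both helpers return -1 if the result does not fit in the output buffer.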

dmolik commented 5 years ago

I need a couple of minutes to finish up another task, I'll check as soon as I can