containers / virtcontainers

A Go package for building hardware virtualized container runtimes
Apache License 2.0

error not propagating back to crio and container left in unknown state #623

Closed: egernst closed this issue 6 years ago

egernst commented 6 years ago

With CRI-O + cc-runtime, if you start a container workload with a garbage command to execute, the exact error is not passed back to crictl and the container is left in an unknown state. The error surfaces at container start time rather than at creation time (as it does with runc); that in itself is not the issue. The issue is that the container is then reported in an unknown state and, from CRI-O's perspective, cannot be removed. On top of this, the error itself isn't propagated back up to crictl, just a generic "error - status 1".

To easily reproduce, grab sample pod/container json:

wget https://gist.githubusercontent.com/egernst/1e116abd0da65b3b090c09c81b70d4c6/raw/42af4a7595292c5be38e8e9c7ec6643e79adba5f/container_bad.json
wget  https://gist.githubusercontent.com/egernst/1e116abd0da65b3b090c09c81b70d4c6/raw/42af4a7595292c5be38e8e9c7ec6643e79adba5f/sandbox_config.json

Expected Behavior, per runc:

When running with runc, the failure occurs at container create time, and cleanup proceeds without issue:

$ sudo crictl runs ./sandbox_config.json
793db53cfaf13496e29f6eabed58108e045296bdd75222c78a3e74ac2b8b03a2
$ sudo crictl create 793db53cfaf13496e29f6eabed58108e045296bdd75222c78a3e74ac2b8b03a2 container_bad.json sandbox_config.json
FATA[0000] Creating container failed: rpc error: code = Unknown desc = container create failed: container_linux.go:274: starting container process caused "exec: \"blablab\": executable file not found in $PATH"

Issue observed when using cc-runtime:

$ sudo crictl runs ./sandbox_config.json
be4082cc91d12d41640358557ee5f0668fda08cb6fe6bfe63c3cf6d58500a995

$ sudo crictl create be4082cc91d12d41640358557ee5f0668fda08cb6fe6bfe63c3cf6d58500a995 container_bad.json sandbox_config.json
a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4336fa1d

$ sudo crictl start a81f2b93b5465
FATA[0000] Starting the container failed: rpc error: code = Unknown desc = failed to start container a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4336fa1d: `/usr/local/bin/cc-runtime start a81f2b93b54656a76d598ddb8

Looking at the journal, I see some of the error messages from the proxy that would have been reported to crictl in the runc case:

Feb 14 11:39:51 s2rf6n2 cc-runtime[23028]: time="2018-02-14T11:39:51-08:00" level=info arguments="\"start a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.25119435-08:00" level=info msg="client connected" client=11 name=cc-proxy pid=22850 source=proxy
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.251288373-08:00" level=info msg="AttachVM(containerId=be4082cc91d12d41640358557ee5f0668fda08cb6f
Feb 14 11:39:51 s2rf6n2 cc-runtime[23028]: time="2018-02-14T11:39:51-08:00" level=info msg="device details" container-id=a81f2b93b54656a76d598ddb83baa56a73102640a6
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.252497298-08:00" level=info msg="hyper(cmd=\\\"newcontainer\\\", data=\\\"{\\\\\\\"id\\\\\\\":\\
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.252827191-08:00" level=info msg="waiting for shim to register itself with token SzjGXNcQMgJquF-D
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.253311115-08:00" level=debug msg="{\\\"channel\\\":\\\"ctl\\\",\\\"command\\\":\\\"newcontainer\
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.253419137-08:00" level=debug msg="{\\\"level\\\":\\\"info\\\",\\\"msg\\\":\\\"newcontainer_start
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.259793636-08:00" level=debug msg="[   71.524799] pci 0000:00:03.0: PCI bridge to [bus 01]" name=
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.259892346-08:00" level=debug msg="[   71.524899] pci 0000:00:03.0:   bridge window [io  0xc000-0
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.262447405-08:00" level=debug msg="[   71.527423] pci 0000:00:03.0:   bridge window [mem 0xfe8000
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.264375692-08:00" level=debug msg="[   71.529155] pci 0000:00:03.0:   bridge window [mem 0xfe0000
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.328231816-08:00" level=debug msg="{\\\"level\\\":\\\"warning\\\",\\\"msg\\\":\\\"os: process alr
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.328340899-08:00" level=debug msg="{\\\"container-pid\\\":\\\"5\\\",\\\"error\\\":\\\"container_l
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.328436888-08:00" level=debug msg="{\\\"container\\\":\\\"a81f2b93b54656a76d598ddb83baa56a7310264
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.328492834-08:00" level=debug msg="{\\\"container\\\":\\\"a81f2b93b54656a76d598ddb83baa56a7310264
Feb 14 11:39:51 s2rf6n2 cc-proxy[22850]: time="2018-02-14T11:39:51.328545297-08:00" level=debug msg="{\\\"level\\\":\\\"info\\\",\\\"msg\\\":\\\"newcontainer_end\\
Feb 14 11:39:51 s2rf6n2 crio[22515]: ERROR received from VM agent, control msg received 

At this point, crictl ps shows that the container is in an unknown state:

$ sudo crictl ps
CONTAINER ID        IMAGE                       CREATED             STATE               NAME                      ATTEMPT
a81f2b93b5465       docker.io/library/busybox   3 minutes ago       CONTAINER_UNKNOWN   my-busybox-container-cc   0

And now we cannot remove the container, and thus cannot remove the sandbox either:

$ sudo crictl stop a81f2b93b5465
a81f2b93b5465
$ sudo crictl rm a81f2b93b5465
FATA[0000] Removing the container failed: rpc error: code = Unknown desc = failed to delete container a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4336fa1d: `/usr/local/bin/cc-runtime delete --force a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4336fa1d` failed: open /var/lib/virtcontainers/pods/be4082cc91d12d41640358557ee5f0668fda08cb6fe6bfe63c3cf6d58500a995/a81f2b93b54656a76d598ddb83baa56a73102640a641714fe2b949be4336fa1d/config.json: no such file or directory
  (exit status 1)
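The delete failure compounds the start failure: the on-disk state under /var/lib/virtcontainers is already partially gone, so the forced delete aborts on the missing config.json and the container can never be cleaned up. A sketch of a more tolerant force path, with the path handling and behavior as assumptions rather than the actual virtcontainers implementation:

```go
package main

import (
	"fmt"
	"os"
)

// deleteContainer sketches a delete that tolerates missing on-disk state
// when forced, instead of aborting. Hypothetical helper for illustration.
func deleteContainer(configPath string, force bool) error {
	if _, err := os.Stat(configPath); os.IsNotExist(err) {
		if force {
			// State is already gone: treat the delete as a success
			// so crio can finish its own cleanup and remove the pod.
			return nil
		}
		return fmt.Errorf("open %s: no such file or directory", configPath)
	}
	// ... normal teardown would continue here ...
	return nil
}

func main() {
	// A nonexistent path stands in for the partially cleaned-up pod dir.
	err := deleteContainer("/nonexistent/config.json", true)
	fmt.Println("force delete error:", err)
}
```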
jodh-intel commented 6 years ago

This looks like a bug in crio. If you force an error running the runtime standalone...

... you get the expected output:

$ sudo cc-runtime run --bundle "$bundle" baz >stdout 2>stderr
$ cat stdout
$ cat stderr
ERROR received from VM agent, control msg received : Process could not be started: container_linux.go:296: starting container process caused "exec: \"blablab\": executable file not found in $PATH"

The problem appears to be here....

... and here:

meaning the runtime's error message goes to the system journal. That does appear to be happening, but the journal entries make it look as if the error is coming from the crio daemon itself, which is clearly incorrect.