Closed: egernst closed this issue 5 years ago
@devimc @amshinde @sboeuf @sameo @mcastelino
Hi @egernst - could you run the command requested (cc-collect-data.sh) and just paste the output as a comment? That would show us more details, like your full config ;)
A few questions:

- busybox image?
- /proc/$cc_runtime_pid/stack?
- /proc/$cc_runtime_pid/fd/?

I did hand-build it for 2.9 and 2.10, as well as use qemu-cc from our packages. Same behavior with each of these three. Will work through the rest of your queries in a bit.
Could you also check on the state of the qemu process too (strace and/or those /proc/ bits above)?
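For reference, the /proc inspection being requested can be sketched like this. It uses the current shell's own PID ($$) purely as a placeholder; on the affected host you would substitute the hung cc-runtime or qemu PID instead.

```shell
# Placeholder PID: substitute the hung cc-runtime/qemu PID on the real host.
pid=$$
ls -l /proc/"$pid"/fd/          # open fds: sockets, pipes, lock files
cat /proc/"$pid"/wchan; echo    # kernel wait channel (empty if runnable)
# /proc/$pid/stack needs root; `strace -fp $pid` attaches to every thread.
```

Reading `fd/` is usually enough to spot a held lock file or a stuck socket; `stack` and `strace` then show what the process is blocked on.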
Also, can you enable debug for maximum firehosage [*]?
[*] - warning! Potential new word detected! ;)
strace output for cc-runtime:

/proc/22971# strace -fp 22971
strace: Process 22971 attached with 5 threads
[pid 22972] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 22975] epoll_wait(6, <unfinished ...>
[pid 22974] futex(0xc42004cd10, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22973] futex(0xc42004c910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22971] futex(0x10dc110, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22972] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}
cc-runtime's stack at time of failure:
/proc/22971# cat stack
[
Yes, the same error is observed across different workloads; checked with busybox.
/proc/22971# ls -al fd/
total 0
dr-x------ 2 root root  0 Jan 10 09:19 .
dr-xr-xr-x 9 root root  0 Jan 10 09:16 ..
lr-x------ 1 root root 64 Jan 10 09:20 0 -> /dev/null
l-wx------ 1 root root 64 Jan 10 09:20 1 -> pipe:[1597394]
lrwx------ 1 root root 64 Jan 10 09:20 12 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 13 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 14 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 15 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 16 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 17 -> anon_inode:[eventfd]
l-wx------ 1 root root 64 Jan 10 09:20 2 -> pipe:[1597394]
lrwx------ 1 root root 64 Jan 10 09:20 3 -> socket:[45577]
l-wx------ 1 root root 64 Jan 10 09:20 4 -> /dev/null
lrwx------ 1 root root 64 Jan 10 09:20 5 -> socket:[1119945]
lrwx------ 1 root root 64 Jan 10 09:20 6 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Jan 10 09:20 7 -> /run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/lock
lrwx------ 1 root root 64 Jan 10 09:20 8 -> socket:[1609842]
lrwx------ 1 root root 64 Jan 10 09:20 9 -> socket:[1611834]
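As an aside, the anonymous `socket:[inode]` entries in a listing like the one above can be resolved by searching the inode number in `ss` output. This is a generic technique, not something from the thread; the inode below is taken from the listing above and will not exist on another host.

```shell
# Map a /proc fd "socket:[inode]" back to a concrete socket via ss.
# 1609842 is the inode from the fd listing above (placeholder elsewhere).
inode=1609842
ss -anp 2>/dev/null | grep -F "$inode" || echo "inode $inode not found here"
```

`lsof -p <pid>` gives a similar view per process, if installed.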
QEMU FD seems sane/stable. Here's the stack:
/proc/22920# cat stack
[
Looking at those fds, my gut tells me this may be a 'pod lock' issue like the one we had last time: one container holds the lock open (for whatever reason), and then all the others get stuck behind it as they try to take the lock to do a (reverse?) pod<->container lookup.
The trick then is to figure out why one has the lock held (last time it was a race...)
If we need it, I can try to dig up the issue where we diagnosed the lock problem last time. /cc @sboeuf for his memory on this.
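The suspected contention pattern can be reproduced in miniature with `flock`: one process holds an exclusive lock on a pod-style lock file while a second tries to take it and times out. The file path and timings here are illustrative only, not taken from the issue.

```shell
# Toy reproduction of the suspected 'pod lock' contention (util-linux flock).
lock=$(mktemp)
( flock -x 9; sleep 2 ) 9>"$lock" &   # holder: keeps the lock for ~2s
sleep 0.2                             # let the holder win the race
# A second locker with a 1s timeout should fail while the lock is held:
flock -w 1 -x "$lock" true && echo "acquired" || echo "blocked"
wait
rm -f "$lock"
```

In the real failure the "holder" would be whichever container path never releases the pod lock, and every other cc-runtime invocation parks behind it exactly as in the strace above.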
Running cc-collect-data.sh version 3.0.10 (commit 3d402d1) at 2018-01-10.09:39:46.588538942.

Runtime is /usr/bin/cc-runtime.

cc-env

Output of "/usr/bin/cc-runtime cc-env":
[Meta]
Version = "1.0.6"
[Runtime]
Debug = false
[Runtime.Version]
Semver = "3.0.10"
Commit = "3d402d1"
OCI = "1.0.0-dev"
[Runtime.Config]
Path = "/usr/share/defaults/clear-containers/configuration.toml"
[Hypervisor]
MachineType = "pc"
Version = "QEMU emulator version 2.9.1 (v2.9.1-dirty)\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
Path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
Debug = false
[Image]
Path = "/usr/share/clear-containers/clear-19790-containers.img"
[Kernel]
Path = "/usr/share/clear-containers/osbuilder-vmlinuz"
Parameters = ""
[Proxy]
Type = "ccProxy"
Version = "Version: 3.0.12-bb85f104f886239198ece88552eb604c05e2c6dd"
Path = "/usr/libexec/clear-containers/cc-proxy"
Debug = true
[Shim]
Type = "ccShim"
Version = "shim version: 3.0.12 (commit: d01f9a71056fc005a98fbaeba733c7b3b07874f6)"
Path = "/usr/libexec/clear-containers/cc-shim"
Debug = true
[Agent]
Type = "hyperstart"
Version = "<<unknown>>"
[Host]
Kernel = "4.8.0-36-generic"
CCCapable = true
[Host.Distro]
Name = "Ubuntu"
Version = "16.04"
[Host.CPU]
Vendor = "GenuineIntel"
Model = "Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz"
/usr/share/defaults/clear-containers/configuration.toml
/usr/share/defaults/clear-containers/configuration.toml
Output of "cat "/etc/clear-containers/configuration.toml"":
# XXX: WARNING: this config was generated by osbuilder on Tue Jan 2 13:47:06 PST 2018 from /usr/share/defaults/clear-containers/configuration.toml
# XXX: Warning: this file is auto-generated from file "config/configuration.toml.in".
[hypervisor.qemu]
#path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
#path = "/usr/bin/qemu-lite-system-x86_64"
path = "/usr/bin/qemu-cc-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
# kernel= "/etc/clear-containers/osbuilder-vmlinuz"
image = "/usr/share/clear-containers/clear-containers.img"
#image = "/home/eernst/go/src/github.com/clearcontainers/agent/clear-19350-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= 255 --> will be set to the specified number
# > 255 --> will be set to 255
default_vcpus = -1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
disable_block_device_use = false
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# Debug changes the default hypervisor and kernel parameters to
# enable debug output where available.
# Default false
# these logs can be obtained in the cc-proxy logs when the
# proxy is set to run in debug mode
# /usr/libexec/clear-containers/cc-proxy -log debug
# or by stopping the cc-proxy service and running the cc-proxy
# explicitly using the same command line
#
enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
enable_debug = true
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":
# XXX: Warning: this file is auto-generated from file "config/configuration.toml.in".
[hypervisor.qemu]
#path = "/usr/bin/qemu-lite-system-x86_64"
#path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native/x86_64-softmmu/qemu-system-x86_64"
path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
#kernel = "/usr/share/clear-containers/vmlinuz.container"
kernel = "/usr/share/clear-containers/osbuilder-vmlinuz"
image = "/usr/share/clear-containers/clear-containers.img"
#image = "/home/eernst/go/src/github.com/clearcontainers/agent/clear-19350-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= 255 --> will be set to the specified number
# > 255 --> will be set to 255
default_vcpus = -1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
disable_block_device_use = false
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# Debug changes the default hypervisor and kernel parameters to
# enable debug output where available.
# Default false
# these logs can be obtained in the cc-proxy logs when the
# proxy is set to run in debug mode
# /usr/libexec/clear-containers/cc-proxy -log debug
# or by stopping the cc-proxy service and running the cc-proxy
# explicitly using the same command line
#
enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
enable_debug = true
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
Recent runtime problems found in system journal:
time="2018-01-10T09:16:19-08:00" level=info msg="launching qemu with: [-name pod-941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2 -uuid dc46372a-5c56-4b01-8737-235dd51cf346 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/dc46372a-5c56-4b0,server,nowait -qmp unix:/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/dc46372a-5c56-4b0,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:20-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2-31ba72c61176a37d-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:20-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:20-08:00" level=error msg="Container ID (941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2) does not exist" source=runtime
time="2018-01-10T09:16:24-08:00" level=info msg="launching qemu with: [-name pod-9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d -uuid 2a8db3ba-9ba7-4d4f-bf5c-7af66fa64adf -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/2a8db3ba-9ba7-4d4,server,nowait -qmp unix:/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/2a8db3ba-9ba7-4d4,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:25-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d-7fb5288ac9103574-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:25-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:25-08:00" level=error msg="Container ID (9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d) does not exist" source=runtime
time="2018-01-10T09:16:28-08:00" level=info msg="launching qemu with: [-name pod-b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50 -uuid b50e2b33-a7b4-4d38-9c49-a6f597f7c919 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b50e2b33-a7b4-4d3,server,nowait -qmp unix:/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b50e2b33-a7b4-4d3,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:29-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50-d18a2b0bd3bdd2bb-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:29-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:29-08:00" level=error msg="Container ID (b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50) does not exist" source=runtime
time="2018-01-10T09:16:32-08:00" level=info msg="launching qemu with: [-name pod-8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e -uuid 63ba32a6-5d3c-41ed-9541-f89acab8dc0e -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/63ba32a6-5d3c-41e,server,nowait -qmp unix:/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/63ba32a6-5d3c-41e,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:38:58-08:00" level=error msg="No multicast available for CTL channel" source=runtime
time="2018-01-10T09:38:58-08:00" level=error msg="No multicast available for CTL channel" source=runtime
No recent proxy problems found in system journal.
Recent shim problems found in system journal:
time="2018-01-10T09:16:20.016506094-0800" level="debug" pid=22153 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:16:24.905300340-0800" level="debug" pid=22423 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:16:29.281754608-0800" level="debug" pid=22694 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:38:50.871602250-0800" level="warning" pid=22969 function="reconnect_to_proxy" line=1020 source="shim" name="cc-shim" msg="Reconnecting to cc-proxy (timeout 10 s)"
time="2018-01-10T09:38:50.972527238-0800" level="error" pid=22969 function="handle_proxy_response" line=616 source="shim" name="cc-shim" msg="Error response received from proxy at /run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/proxy.sock: {\"msg\":\"vm: unknown token y3SBtpsgVuGWp-Fw6oBFUGyVy3Ybqz8JLFxXGVJCdX8=\"}"
Have docker

Output of "docker info":
Containers: 106
Running: 0
Paused: 0
Stopped: 106
Images: 2
Server Version: 17.09.0-ce
Storage Driver: devicemapper
Pool Name: docker-8:1-18612488-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 592MB
Data Space Total: 107.4GB
Data Space Available: 106.8GB
Metadata Space Used: 2.503MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.145GB
Thin Pool Minimum Free Space: 10.74GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime mine runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.8.0-36-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 94.35GiB
Name: eernstworkstation
ID: AMN3:EAHW:OBI4:6JX4:B2BI:I3IE:EL75:C3NC:PYTH:IEGB:33VZ:AUBG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 19
Goroutines: 27
System Time: 2018-01-10T09:39:46.739064797-08:00
EventsListeners: 0
Http Proxy: http://proxy-chain.intel.com:911/
Https Proxy: http://proxy-chain.intel.com:911/
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: No swap limit support
Have kubectl

Output of "kubectl config view":
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []
Have dpkg

Output of "dpkg -l|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"":
ii cc-proxy 3.0.10+git.513b073-15 amd64
ii cc-runtime 3.0.10+git.3d402d1-15 amd64
ii cc-runtime-bin 3.0.10+git.3d402d1-15 amd64
ii cc-runtime-config 3.0.10+git.3d402d1-15 amd64
ii cc-shim 3.0.10+git.0952966-15 amd64
ii clear-containers-image 19790-43 amd64 Clear containers image
ii linux-container 4.9.60-82 amd64 linux kernel optimised for container-like workloads.
ii qemu-lite 2.7.1+git.d4a337fe91-9 amd64 linux kernel optimised for container-like workloads.
ii qemu-system-x86 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (x86)
Have rpm

Output of "rpm -qa|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"":
Yes, good call @grahamwhaley!
@egernst - one other thing - can you get a ps listing of what is running when docker hangs?
@jodh-intel 👍
UID PID PPID C STIME TTY TIME CMD
root 2977 1 0 Jan09 ? 00:02:20 /usr/bin/dockerd -D --add-runtime cc-runtime=/usr/bin/cc-runtime --add-runtime mine=/home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime --default-runtime=runc
root 3187 2977 0 Jan09 ? 00:01:20 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --debug
root 24337 5913 0 09:46 pts/11 00:00:00 sudo docker run --runtime=mine busybox sh
root 24338 24337 0 09:46 pts/11 00:00:00 docker run --runtime=mine busybox sh
root 24403 2 0 09:46 ? 00:00:00 [kdmflush]
root 24410 2 0 09:46 ? 00:00:00 [bioset]
root 24413 2 0 09:46 ? 00:00:00 [xfs-buf/dm-1]
root 24414 2 0 09:46 ? 00:00:00 [xfs-data/dm-1]
root 24415 2 0 09:46 ? 00:00:00 [xfs-conv/dm-1]
root 24416 2 0 09:46 ? 00:00:00 [xfs-cil/dm-1]
root 24417 2 0 09:46 ? 00:00:00 [xfs-reclaim/dm-]
root 24418 2 0 09:46 ? 00:00:00 [xfs-log/dm-1]
root 24419 2 0 09:46 ? 00:00:00 [xfs-eofblocks/d]
root 24420 2 0 09:46 ? 00:00:00 [xfsaild/dm-1]
root 24450 3187 0 09:46 ? 00:00:00 docker-containerd-shim 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f /var/run/docker/libcontainerd/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f /home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime
root 24497 24450 8 09:46 ? 00:00:04 /usr/bin/qemu-cc-system-x86_64 -name pod-859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -uuid a85ea297-2f2d-43a9-97e5-f36c9cc24c8a -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/a85ea297-2f2d-43a,server,nowait -qmp unix:/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/a85ea297-2f2d-43a,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f::off:: -smp 32,cores=32,threads=1,sockets=1
root 24499 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24500 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24501 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24502 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24503 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24504 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24505 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24506 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24539 2 0 09:46 ? 00:00:00 [kvm-pit/24497]
root 24540 24450 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-proxy -uri unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -log debug
root 24545 24450 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-shim -c 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -t cEKcTOMZfCMRkenhaSOwWapca2aHGhwXbRhcRbjui60= -u unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -d
root 24546 24545 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-shim -c 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -t cEKcTOMZfCMRkenhaSOwWapca2aHGhwXbRhcRbjui60= -u unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -d
root 24547 3187 0 09:46 ? 00:00:00 /home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime start 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f
eernst 24567 5896 0 09:46 pts/10 00:00:00 ps -aef
That tells us it's blocked calling vc.StartContainer()
as far as I can see, but it's unclear whereabouts in that function it's blocking. Could you add some virtlog.Info("...")
debug calls to ./vendor/github.com/containers/virtcontainers/api.go
in the runtime, rebuild and install it, and re-run to see what that gives?
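For reference, the bracketing being suggested amounts to temporary log lines around each step, so the last line that appears in the journal identifies the blocking call. A hypothetical sketch (stand-in logger and function names, not the actual virtcontainers code):

```go
package main

import "log"

// startContainer stands in for vc.StartContainer(); the temporary log lines
// bracket each step so the log shows exactly where execution stops.
func startContainer(podID, containerID string) error {
	log.Printf("StartContainer: enter (pod %s, container %s)", podID, containerID)
	log.Printf("StartContainer: sending proxy command")
	// ... real work elided; if "done" never prints, the hang is between
	// the last printed line and the next one.
	log.Printf("StartContainer: done")
	return nil
}

func main() {
	_ = startContainer("pod0", "container0")
}
```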
@jodh-intel : correct, it is hanging during StartContainer. More specifically, the runtime is blocked while sending this proxy command:
{newcontainer {f36c8b2c825eecaabce7688dc41
4d7c1f3ef7c1c1e42427c9bc746f80d3db479 rootfs xfs vda [] [0xc420286f90 0xc420286fc0 0xc420286ff0] map[] 0xc42027af20 false {false 0}} qwefGyKiC8zubVXxEY7_M56VBclLdFPIby7m7ec3ri8=}
The agent, in turn, is hung in the mount syscall, reached via newContainerCb -> mountContainerRootFs -> mount (https://github.com/clearcontainers/agent/blob/master/syscall.go#L63). It is called with the following parameters:
source /dev/vda, dest /tmp/hyper/00c5bd0b5f507cdd4db1c8847e9ddbb5eae131f4c1b0d26aed9b74414079def2/root, type: xfs, flag: 0%!(EXTRA string=nouuid)
The parameters look the same (except for the unique container ID) when comparing passing and failing cases.
FYI, testing with qemu-lite, I don't see any failures after 100 iterations: for c in {1..100}; do sudo docker run --runtime=mine alpine sh; echo $c; done
Is dockerd running in debug mode, and if so, has it logged any warnings/errors in the journal? Any thoughts @sboeuf? Once we've worked out what's going on here, it does sound like we should change the agent to not block indefinitely.
Hey @jodh-intel
If I switch to an alternative (AUFS), I still see hangs - see later comment for details, as the signature appears slightly different (I stop seeing agent output at the time the PCI rescan is issued -- for device mapper it would happen later when trying to carry out the actual mount).
When using device mapper, I see nothing particularly useful in the guest kernel; I see the device got hot plugged, but I do not see the XFS mount success message that I would otherwise. Here is the output of the mount command at the time of failure (from the guest):
root@clrcont /tmp # mount
/dev/pmem0p1 on / type ext4 (rw,relatime,dax,errors=remount-ro,stripe=256,data=ordered)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=6486)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
tmpfs on /dev type tmpfs (ro,nosuid,noexec,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
mqueue on /dev/mqueue type mqueue (rw,relatime)
devtmpfs on /dev/hugepages type devtmpfs (rw,relatime,size=1023692k,nr_inodes=255923,mode=755)
hyperShared on /tmp/shareDir type 9p (rw,nodev,relatime,sync,dirsync,trans=virtio)
Docker is running in debug mode -- nothing useful here in journal
I have console working with the agent, which is how I was able to check out dmesg. I do see the device via lsblk (that is, vda), and I see it in sysfs. The device node, /dev/vda, isn't present, presumably because it was moved into the mount namespace associated with cc-agent. I see it there when doing an nsenter (with mount provided in /proc/
root@clrcont /proc/190/ns # ps -ae | grep cc-agent
190 ? 00:00:01 cc-agent
root@clrcont /proc/190/ns # nsenter --mount=mnt
root@clrcont / # stat /dev/vda
File: /dev/vda
Size: 0 Blocks: 0 IO Block: 4096 block special file
Device: 6h/6d Inode: 4260 Links: 1 Device type: fe,0
Access: (0600/brw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-01-12 22:28:19.215776443 +0000
Modify: 2018-01-12 22:28:19.212776443 +0000
Change: 2018-01-12 22:28:19.212776443 +0000
Birth: -
Naively, I tried to execute the mount command I expect was intended inside the agent, but receive a permission error:
root@clrcont / # mount -t xfs /dev/vda /tmp/hyper/3bf6a330d8e85133beb970d3878a82be345c718af2e1b122dfe0e9cf9aee4ca4/root
mount: /tmp/hyper/3bf6a330d8e85133beb970d3878a82be345c718af2e1b122dfe0e9cf9aee4ca4/root: permission denied.
Created a timeout of ~8 seconds for the mount command, after which we check to make sure /dev/vda and the target destination are still stat-able, and then return an error.
See: https://github.com/egernst/agent/blob/qemu-cc-debug/syscall.go#L67
As a next step, I will try alternatives to virtio-blk to mount the device, since we are seeing a different failure signature when using 9p...
Noticing that in the failure scenario udev is struggling with the particular device:
Jan 19 18:23:57 clrcont systemd[1]: Started D-Bus System Message Bus.
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: [1af4:1001] type 00 class 0x010000
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x10: [io 0x0000-0x003f]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 4: assigned [mem 0xfe800000-0xfe803fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 1: assigned [mem 0xfe400000-0xfe400fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 0: assigned [io 0xc000-0xc03f]
Jan 19 18:23:57 clrcont kernel: virtio-pci 0000:01:01.0: enabling device (0000 -> 0003)
Jan 19 18:23:58 clrcont kernel: tsc: Refined TSC clocksource calibration: 2893.032 MHz
Jan 19 18:23:58 clrcont kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29b38cd2c56, max_idle_ns: 440795342158 ns
Jan 19 18:24:37 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:24:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:25:20 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:27 clrcont kernel: random: fast init done
Jan 19 18:26:34 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
root@clrcont / # {"level":"info","msg":"Received udev event","name":"cc-agent","pid":199,"subsystem":"udevlistener","time":"2018-01-19T18:26:58.397072561Z","udev-event":"add","udev-path":"/sys/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda"}
As seen above, the udev event is received much later. After another couple of minutes, it appears udev "gives up":
Jan 19 18:23:57 clrcont systemd[1]: Started D-Bus System Message Bus.
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: [1af4:1001] type 00 class 0x010000
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x10: [io 0x0000-0x003f]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 4: assigned [mem 0xfe800000-0xfe803fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 1: assigned [mem 0xfe400000-0xfe400fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 0: assigned [io 0xc000-0xc03f]
Jan 19 18:23:57 clrcont kernel: virtio-pci 0000:01:01.0: enabling device (0000 -> 0003)
Jan 19 18:23:58 clrcont kernel: tsc: Refined TSC clocksource calibration: 2893.032 MHz
Jan 19 18:23:58 clrcont kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29b38cd2c56, max_idle_ns: 440795342158 ns
Jan 19 18:24:37 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:24:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:25:20 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:27 clrcont kernel: random: fast init done
Jan 19 18:26:34 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' killed
Jan 19 18:26:58 clrcont systemd-udevd[158]: worker [225] terminated by signal 9 (KILL)
Jan 19 18:26:58 clrcont systemd-udevd[158]: worker [225] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda'
All of the udevd output from the failure:
# journalctl | grep udev
Jan 19 18:38:40 clrcont systemd-udevd[158]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Jan 19 18:38:40 clrcont systemd[1]: Started udev Coldplug all Devices.
Jan 19 18:38:40 clrcont systemd-udevd[178]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:38:40 clrcont systemd-udevd[186]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:38:40 clrcont systemd-udevd[178]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:39:41 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:41:41 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' killed
Jan 19 18:41:41 clrcont systemd-udevd[158]: worker [225] terminated by signal 9 (KILL)
Jan 19 18:41:41 clrcont systemd-udevd[158]: worker [225] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda'
Output from udevadm info --export-db after the failure:
--
E: PCI_SLOT_NAME=0000:01:01.0
E: PCI_SUBSYS_ID=1AF4:0002
E: SUBSYSTEM=pci
P: /devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3
E: DRIVER=virtio_blk
E: MODALIAS=virtio:d00000002v00001AF4
E: SUBSYSTEM=virtio
P: /devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda
N: vda
E: DEVNAME=/dev/vda
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=0
E: SUBSYSTEM=block
Using strace and ftrace, we see that mount is called but never returns (confirming what we already knew from the prints in the log). You can see this in the trace file, which contains no entry for sys_exit_mount.
root@clrcont / # cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cc-agent-207 [001] .... 5.418645: sys_mount(dev_name: c42010a180, dir_name: c42015e0c0, type: c42010a18c, flags: 0, data: c42010a098)
root@clrcont / # cat strace* | grep mount
mount("hyperShared", "/tmp/shareDir", "9p", MS_MGC_VAL|MS_NODEV, "trans=virtio") = 0
mount("/dev/vda", "/tmp/hyper/435835da0d249ad87c4e01683848962d1d70b63bf9c1f0fd2a365f906c7cb6f2/root", "xfs", 0, "nouuid"set_robust_list(0x7fac437fe9e0, 24) = 0
ftrace was enabled in the kernel for syscalls (specifically mount), and strace was enabled in the clear containers service by copying strace to the container image and modifying the ExecStart to be: /usr/bin/strace -o strace.log -vv -ff /usr/bin/cc-agent
Maybe this is relevant: https://github.com/systemd/systemd/issues/3446
An additional data point: when moving from virtio-blk to virtio-scsi, this issue no longer manifests.
@egernst that's good news at least. Don't you think we should stop spending time on this issue, since virtio-scsi will cover all the cases handled by virtio-blk?
@sboeuf -- no this still concerns me, especially if there are performance tradeoffs between -SCSI and -blk. I'd rather know the exact issue.
Perhaps the severity of the issue is slightly reduced, though...
@egernst ok, makes sense. Do you know if there are actual performance trade-offs, or is that just a potential outcome?
This will likely be an issue with Kata runtime -- will test and open new issue there.
@rarindam are you still investigating this issue? I'd like to make sure we have a clear description of the issue on the qemu repository so that the Qemu folks can really look into it. I don't think the clearcontainers repo is a good place to continue discussing this. Please let me know how we can make this more visible in order to get more help and try to fix it ;)
Hi Sebastien, I am not actively debugging this issue. We can have it in our qemu JIRA to track this as a bug. I will look into it again tomorrow and update this issue. Arindam
Thanks @rarindam, I think your Qemu JIRA is fine if that is how you usually track your issues. It's important to get this issue properly described and tracked, as we (the Clear/Kata Containers teams) expect to get a fix eventually.
Hi @egernst, I tried reproducing the issue today on Qemu 2.10 with cc-runtime 3.0.18 and could see it happening, but I also see other issues in my setup. So instead of trying to reproduce it in a container environment, I am going to do so in a simpler host/guest environment. I just wanted to confirm whether the following matches your scenario:
Being tracked internally in VIRTUALIZB-153
@rarindam Thanks for tracking this internally. Is it possible to get access to that issue?
@rarindam -- 1 and 3, yes. For 2: it is actually the rootfs for a container which is mounted using virtio-blk (passed in via QMP).
Hi @rarindam, just a reminder: you never gave me access to the internal JIRA for this issue so that I can follow the progress ;)
@egernst Please check the kernel stack trace of the thread stuck in mount(2):
# cat /proc/$AGENT_PID/stack
where $AGENT_PID is the thread in question.
@stefanha
root@clrcont / # ps -ae | grep agent
153 ? 00:00:00 cc-agent
root@clrcont / # cat /proc/153/stack
[
@egernst That looks like the agent's main thread, not the thread that is stuck in mount(2). Use ps -eL -o lwp,stat,wchan,args
to list all threads. The status of the thread stuck in mount probably starts with "S", and its /proc/$PID/stack output should contain SyS_mount.
If udevd hasn't timed out yet, it would also be interesting to see where it is stuck. You can use the same approach with /proc/$PID/stack. Systemd udevd forks a worker process to either execute programs or run built-in actions (see https://github.com/systemd/systemd/blob/master/src/udev/udevd.c#L328).
Hi @sboeuf , are you able to access this link? -link- You need to have access to this JIRA repo....
@rarindam I have removed the link since we're not supposed to share internal links on external repos. Please send me an email with the details including the link and the way to apply to get access to it. Thanks
I tried to run git bisect between qemu tags 2.7.0 and 2.9.0. I hit a snag when using versions 2.8.0 and above: it seems there were some changes to QMP, and the command for adding the block device was failing in those versions. Continuing to debug this.
Thanks for the update @rarindam
@rarindam There was a change introduced in qemu 2.9 for the blockdev-add command. I don't know if you are hitting that:
qemu 2.9 and above:
{"execute":"blockdev-add","arguments":{"driver":"raw","node-name":"disk1","file":{"driver":"file","filename":"/dev/mapper/docker-8:2-3016900"}}}
qemu < 2.9:
{"execute":"blockdev-add","arguments":{"options": {"driver":"raw","id":"disk1","file":{"driver":"file","filename":"/dev/mapper/docker-8:2-3016900"}}}}
This issue cannot be reproduced with the current version of cc-runtime (3.23). I am able to reproduce it with an older version, 3.10. In the latest versions we have introduced some gap between the time the drive is attached and the time it is mounted, which is likely why we are no longer hitting the issue of the mount system call not returning. Note that the issue still exists in qemu; we are just not seeing it with the latest cc-runtime versions.
@amshinde So this issue won't be seen in earlier versions of qemu if you introduce the time gap there too?
Description of problem
When using QEMU >= v2.9, I see cc-runtime hang when trying to start a container. This is usually observed when launching and exiting a couple of containers in a row. Observed with qemu-cc, a local build of v2.9, and a local build of v2.10.
On a clean/freshly booted system, I see it fail after the 6th time running
sudo docker run alpine sh
At the point of the hang, cc-runtime is still executing, as is the resulting QEMU instance. Relevant logs from the journal are included below. It looks like it is hanging while the runtime is handling hotplug/QMP.
cc-env details:
./cc-runtime cc-env
[Meta]
  Version = "1.0.6"
[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "3.0.12"
    Commit = "a33e6bf56f7bfbb0497c08f4fd2f70dc06c9f816"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/etc/clear-containers/configuration.toml"
[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.9.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-cc-system-x86_64"
  Debug = false
[Image]
  Path = "/usr/share/clear-containers/clear-19790-containers.img"
[Kernel]
  Path = "/usr/share/clear-containers/vmlinuz-4.9.60-82.container"
  Parameters = ""
[Proxy]
  Type = "ccProxy"
  Version = "Version: 3.0.12-bb85f104f886239198ece88552eb604c05e2c6dd"
  Path = "/usr/libexec/clear-containers/cc-proxy"
  Debug = true
[Shim]
  Type = "ccShim"
  Version = "shim version: 3.0.12 (commit: d01f9a71056fc005a98fbaeba733c7b3b07874f6)"
  Path = "/usr/libexec/clear-containers/cc-shim"
  Debug = true
[Agent]
  Type = "hyperstart"
  Version = "<>"
[Host]
  Kernel = "4.8.0-36-generic"
  CCCapable = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "16.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz"