Closed: egernst closed this issue 5 years ago
@devimc @amshinde @sboeuf @sameo @mcastelino
Hi @egernst - could you run the command requested (cc-collect-data.sh) and just paste the output as a comment? That would show us more details, like your full config ;)
A few questions:

- busybox image?
- /proc/$cc_runtime_pid/stack?
- /proc/$cc_runtime_pid/fd/?

I did hand-build it for 2.9 and 2.10, as well as use qemu-cc from our packages. Same behavior with each of these three. Will work through the rest of your queries in a bit.
Could you also check on the state of the qemu process too (strace and/or those /proc/ bits above)?
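For reference, the /proc inspection being requested can be sketched like this. It uses the current shell's own PID ($$) purely as a placeholder; on the affected host you would substitute the hung cc-runtime or qemu PID instead.

```shell
# Placeholder PID: substitute the hung cc-runtime/qemu PID on the real host.
pid=$$
ls -l /proc/"$pid"/fd/          # open fds: sockets, pipes, lock files
cat /proc/"$pid"/wchan; echo    # kernel wait channel (empty if runnable)
# /proc/$pid/stack needs root; `strace -fp $pid` attaches to every thread.
```

Reading `fd/` is usually enough to spot a held lock file or a stuck socket; `stack` and `strace` then show what the process is blocked on.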
Also, can you enable debug for maximum firehosage [*]?
[*] - warning! Potential new word detected! ;)
strace output for cc-runtime:

/proc/22971# strace -fp 22971
strace: Process 22971 attached with 5 threads
[pid 22972] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 22975] epoll_wait(6, <unfinished ...>
[pid 22974] futex(0xc42004cd10, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22973] futex(0xc42004c910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22971] futex(0x10dc110, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 22972] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}) = -1 ETIMEDOUT (Connection timed out)
[pid 22972] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 22972] futex(0x10db958, FUTEX_WAIT, 0, {60, 0}
cc-runtime's stack at time of failure:
/proc/22971# cat stack
[
Yes, the same error is observed across different workloads; checked with busybox.
/proc/22971# ls -al fd/
total 0
dr-x------ 2 root root  0 Jan 10 09:19 .
dr-xr-xr-x 9 root root  0 Jan 10 09:16 ..
lr-x------ 1 root root 64 Jan 10 09:20 0 -> /dev/null
l-wx------ 1 root root 64 Jan 10 09:20 1 -> pipe:[1597394]
lrwx------ 1 root root 64 Jan 10 09:20 12 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 13 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 14 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 15 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 16 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jan 10 09:20 17 -> anon_inode:[eventfd]
l-wx------ 1 root root 64 Jan 10 09:20 2 -> pipe:[1597394]
lrwx------ 1 root root 64 Jan 10 09:20 3 -> socket:[45577]
l-wx------ 1 root root 64 Jan 10 09:20 4 -> /dev/null
lrwx------ 1 root root 64 Jan 10 09:20 5 -> socket:[1119945]
lrwx------ 1 root root 64 Jan 10 09:20 6 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Jan 10 09:20 7 -> /run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/lock
lrwx------ 1 root root 64 Jan 10 09:20 8 -> socket:[1609842]
lrwx------ 1 root root 64 Jan 10 09:20 9 -> socket:[1611834]
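As an aside, the anonymous `socket:[inode]` entries in a listing like the one above can be resolved by searching the inode number in `ss` output. This is a generic technique, not something from the thread; the inode below is taken from the listing above and will not exist on another host.

```shell
# Map a /proc fd "socket:[inode]" back to a concrete socket via ss.
# 1609842 is the inode from the fd listing above (placeholder elsewhere).
inode=1609842
ss -anp 2>/dev/null | grep -F "$inode" || echo "inode $inode not found here"
```

`lsof -p <pid>` gives a similar view per process, if installed.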
QEMU FD seems sane/stable. Here's the stack:
/proc/22920# cat stack
[
Looking at those fds, my gut tells me this may be a 'pod lock' issue like the one we had last time: one container holds the lock open (for whatever reason), and then all the others get stuck behind it as they try to take the lock to do a (reverse?) pod<->container lookup.
The trick then is to figure out why one has the lock held (last time it was a race...)
If we need it, I can try to dig up the issue where we diagnosed the lock problem last time. /cc @sboeuf for his memory on this.
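The suspected contention pattern can be reproduced in miniature with `flock`: one process holds an exclusive lock on a pod-style lock file while a second tries to take it and times out. The file path and timings here are illustrative only, not taken from the issue.

```shell
# Toy reproduction of the suspected 'pod lock' contention (util-linux flock).
lock=$(mktemp)
( flock -x 9; sleep 2 ) 9>"$lock" &   # holder: keeps the lock for ~2s
sleep 0.2                             # let the holder win the race
# A second locker with a 1s timeout should fail while the lock is held:
flock -w 1 -x "$lock" true && echo "acquired" || echo "blocked"
wait
rm -f "$lock"
```

In the real failure the "holder" would be whichever container path never releases the pod lock, and every other cc-runtime invocation parks behind it exactly as in the strace above.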
Running cc-collect-data.sh version 3.0.10 (commit 3d402d1) at 2018-01-10.09:39:46.588538942.

Runtime is /usr/bin/cc-runtime.

cc-env

Output of "/usr/bin/cc-runtime cc-env":
[Meta]
Version = "1.0.6"
[Runtime]
Debug = false
[Runtime.Version]
Semver = "3.0.10"
Commit = "3d402d1"
OCI = "1.0.0-dev"
[Runtime.Config]
Path = "/usr/share/defaults/clear-containers/configuration.toml"
[Hypervisor]
MachineType = "pc"
Version = "QEMU emulator version 2.9.1 (v2.9.1-dirty)\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
Path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
Debug = false
[Image]
Path = "/usr/share/clear-containers/clear-19790-containers.img"
[Kernel]
Path = "/usr/share/clear-containers/osbuilder-vmlinuz"
Parameters = ""
[Proxy]
Type = "ccProxy"
Version = "Version: 3.0.12-bb85f104f886239198ece88552eb604c05e2c6dd"
Path = "/usr/libexec/clear-containers/cc-proxy"
Debug = true
[Shim]
Type = "ccShim"
Version = "shim version: 3.0.12 (commit: d01f9a71056fc005a98fbaeba733c7b3b07874f6)"
Path = "/usr/libexec/clear-containers/cc-shim"
Debug = true
[Agent]
Type = "hyperstart"
Version = "<<unknown>>"
[Host]
Kernel = "4.8.0-36-generic"
CCCapable = true
[Host.Distro]
Name = "Ubuntu"
Version = "16.04"
[Host.CPU]
Vendor = "GenuineIntel"
Model = "Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz"
/usr/share/defaults/clear-containers/configuration.toml
/usr/share/defaults/clear-containers/configuration.toml
Output of "cat "/etc/clear-containers/configuration.toml"":
# XXX: WARNING: this config was generated by osbuilder on Tue Jan 2 13:47:06 PST 2018 from /usr/share/defaults/clear-containers/configuration.toml
# XXX: Warning: this file is auto-generated from file "config/configuration.toml.in".
[hypervisor.qemu]
#path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
#path = "/usr/bin/qemu-lite-system-x86_64"
path = "/usr/bin/qemu-cc-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
# kernel= "/etc/clear-containers/osbuilder-vmlinuz"
image = "/usr/share/clear-containers/clear-containers.img"
#image = "/home/eernst/go/src/github.com/clearcontainers/agent/clear-19350-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= 255 --> will be set to the specified number
# > 255 --> will be set to 255
default_vcpus = -1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
disable_block_device_use = false
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# Debug changes the default hypervisor and kernel parameters to
# enable debug output where available.
# Default false
# these logs can be obtained in the cc-proxy logs when the
# proxy is set to run in debug mode
# /usr/libexec/clear-containers/cc-proxy -log debug
# or by stopping the cc-proxy service and running the cc-proxy
# explicitly using the same command line
#
enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
enable_debug = true
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":
# XXX: Warning: this file is auto-generated from file "config/configuration.toml.in".
[hypervisor.qemu]
#path = "/usr/bin/qemu-lite-system-x86_64"
#path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native/x86_64-softmmu/qemu-system-x86_64"
path = "/home/eernst/go/src/github.com/clearcontainers/qemu/bin/debug/native-2.9/x86_64-softmmu/qemu-system-x86_64"
#kernel = "/usr/share/clear-containers/vmlinuz.container"
kernel = "/usr/share/clear-containers/osbuilder-vmlinuz"
image = "/usr/share/clear-containers/clear-containers.img"
#image = "/home/eernst/go/src/github.com/clearcontainers/agent/clear-19350-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= 255 --> will be set to the specified number
# > 255 --> will be set to 255
default_vcpus = -1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
disable_block_device_use = false
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# Debug changes the default hypervisor and kernel parameters to
# enable debug output where available.
# Default false
# these logs can be obtained in the cc-proxy logs when the
# proxy is set to run in debug mode
# /usr/libexec/clear-containers/cc-proxy -log debug
# or by stopping the cc-proxy service and running the cc-proxy
# explicitly using the same command line
#
enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
enable_debug = true
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
Recent runtime problems found in system journal:
time="2018-01-10T09:16:19-08:00" level=info msg="launching qemu with: [-name pod-941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2 -uuid dc46372a-5c56-4b01-8737-235dd51cf346 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/dc46372a-5c56-4b0,server,nowait -qmp unix:/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/dc46372a-5c56-4b0,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:20-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2/941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2-31ba72c61176a37d-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:20-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:20-08:00" level=error msg="Container ID (941b35fcd59514290dcc772d5c5f1f4d89e9818480113ca93ddd06eeaedaf4e2) does not exist" source=runtime
time="2018-01-10T09:16:24-08:00" level=info msg="launching qemu with: [-name pod-9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d -uuid 2a8db3ba-9ba7-4d4f-bf5c-7af66fa64adf -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/2a8db3ba-9ba7-4d4,server,nowait -qmp unix:/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/2a8db3ba-9ba7-4d4,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:25-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d/9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d-7fb5288ac9103574-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:25-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:25-08:00" level=error msg="Container ID (9c6ece92916d1b795d86971d91e2d9ffd646d1ad062ec1571a032b5d63380e4d) does not exist" source=runtime
time="2018-01-10T09:16:28-08:00" level=info msg="launching qemu with: [-name pod-b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50 -uuid b50e2b33-a7b4-4d38-9c49-a6f597f7c919 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b50e2b33-a7b4-4d3,server,nowait -qmp unix:/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b50e2b33-a7b4-4d3,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:16:29-08:00" level=warning msg="Could not umount" error="<nil>" host-path=/run/hyper/shared/pods/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50/b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50-d18a2b0bd3bdd2bb-resolv.conf source=virtcontainers subsystem=mount
time="2018-01-10T09:16:29-08:00" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-10T09:16:29-08:00" level=error msg="Container ID (b09b5da1780d2bad1b4b1ca9ec034b5c838ffb4e3e8b50ab560f82ff25f64a50) does not exist" source=runtime
time="2018-01-10T09:16:32-08:00" level=info msg="launching qemu with: [-name pod-8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e -uuid 63ba32a6-5d3c-41ed-9541-f89acab8dc0e -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/63ba32a6-5d3c-41e,server,nowait -qmp unix:/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/63ba32a6-5d3c-41e,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel 
/usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e::off:: -smp 32,cores=32,threads=1,sockets=1]" source=virtcontainers subsystem=qmp
time="2018-01-10T09:38:58-08:00" level=error msg="No multicast available for CTL channel" source=runtime
time="2018-01-10T09:38:58-08:00" level=error msg="No multicast available for CTL channel" source=runtime
No recent proxy problems found in system journal.
Recent shim problems found in system journal:
time="2018-01-10T09:16:20.016506094-0800" level="debug" pid=22153 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:16:24.905300340-0800" level="debug" pid=22423 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:16:29.281754608-0800" level="debug" pid=22694 function="send_proxy_message" line=300 source="shim" name="cc-shim" msg="Sending frame of type:0, opcode:5, payload:(empty)"
time="2018-01-10T09:38:50.871602250-0800" level="warning" pid=22969 function="reconnect_to_proxy" line=1020 source="shim" name="cc-shim" msg="Reconnecting to cc-proxy (timeout 10 s)"
time="2018-01-10T09:38:50.972527238-0800" level="error" pid=22969 function="handle_proxy_response" line=616 source="shim" name="cc-shim" msg="Error response received from proxy at /run/virtcontainers/pods/8835d65dafbd37f29040133af92a2c056ff7429e42f884f9f06677041619e71e/proxy.sock: {\"msg\":\"vm: unknown token y3SBtpsgVuGWp-Fw6oBFUGyVy3Ybqz8JLFxXGVJCdX8=\"}"
Have docker

Output of "docker info":
Containers: 106
Running: 0
Paused: 0
Stopped: 106
Images: 2
Server Version: 17.09.0-ce
Storage Driver: devicemapper
Pool Name: docker-8:1-18612488-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 592MB
Data Space Total: 107.4GB
Data Space Available: 106.8GB
Metadata Space Used: 2.503MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.145GB
Thin Pool Minimum Free Space: 10.74GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime mine runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.8.0-36-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 94.35GiB
Name: eernstworkstation
ID: AMN3:EAHW:OBI4:6JX4:B2BI:I3IE:EL75:C3NC:PYTH:IEGB:33VZ:AUBG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 19
Goroutines: 27
System Time: 2018-01-10T09:39:46.739064797-08:00
EventsListeners: 0
Http Proxy: http://proxy-chain.intel.com:911/
Https Proxy: http://proxy-chain.intel.com:911/
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: No swap limit support
Have kubectl

Output of "kubectl config view":
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []
Have dpkg

Output of "dpkg -l|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"":
ii cc-proxy 3.0.10+git.513b073-15 amd64
ii cc-runtime 3.0.10+git.3d402d1-15 amd64
ii cc-runtime-bin 3.0.10+git.3d402d1-15 amd64
ii cc-runtime-config 3.0.10+git.3d402d1-15 amd64
ii cc-shim 3.0.10+git.0952966-15 amd64
ii clear-containers-image 19790-43 amd64 Clear containers image
ii linux-container 4.9.60-82 amd64 linux kernel optimised for container-like workloads.
ii qemu-lite 2.7.1+git.d4a337fe91-9 amd64 linux kernel optimised for container-like workloads.
ii qemu-system-x86 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (x86)
Have rpm

Output of "rpm -qa|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"":
Yes, good call @grahamwhaley!
@egernst - one other thing - can you get a ps listing of what is running when docker hangs?
@jodh-intel 👍
UID PID PPID C STIME TTY TIME CMD
root 2977 1 0 Jan09 ? 00:02:20 /usr/bin/dockerd -D --add-runtime cc-runtime=/usr/bin/cc-runtime --add-runtime mine=/home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime --default-runtime=runc
root 3187 2977 0 Jan09 ? 00:01:20 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --debug
root 24337 5913 0 09:46 pts/11 00:00:00 sudo docker run --runtime=mine busybox sh
root 24338 24337 0 09:46 pts/11 00:00:00 docker run --runtime=mine busybox sh
root 24403 2 0 09:46 ? 00:00:00 [kdmflush]
root 24410 2 0 09:46 ? 00:00:00 [bioset]
root 24413 2 0 09:46 ? 00:00:00 [xfs-buf/dm-1]
root 24414 2 0 09:46 ? 00:00:00 [xfs-data/dm-1]
root 24415 2 0 09:46 ? 00:00:00 [xfs-conv/dm-1]
root 24416 2 0 09:46 ? 00:00:00 [xfs-cil/dm-1]
root 24417 2 0 09:46 ? 00:00:00 [xfs-reclaim/dm-]
root 24418 2 0 09:46 ? 00:00:00 [xfs-log/dm-1]
root 24419 2 0 09:46 ? 00:00:00 [xfs-eofblocks/d]
root 24420 2 0 09:46 ? 00:00:00 [xfsaild/dm-1]
root 24450 3187 0 09:46 ? 00:00:00 docker-containerd-shim 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f /var/run/docker/libcontainerd/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f /home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime
root 24497 24450 8 09:46 ? 00:00:04 /usr/bin/qemu-cc-system-x86_64 -name pod-859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -uuid a85ea297-2f2d-43a9-97e5-f36c9cc24c8a -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/a85ea297-2f2d-43a,server,nowait -qmp unix:/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/a85ea297-2f2d-43a,server,nowait -m 2048M,slots=2,maxmem=97635M -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-19790-containers.img,size=235929600 -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/run/hyper/shared/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /usr/share/clear-containers/vmlinuz-4.9.60-82.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug systemd.show_status=true systemd.log_level=debug init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f::off:: -smp 32,cores=32,threads=1,sockets=1
root 24499 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24500 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24501 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24502 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24503 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24504 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24505 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24506 2 0 09:46 ? 00:00:00 [vhost-24497]
root 24539 2 0 09:46 ? 00:00:00 [kvm-pit/24497]
root 24540 24450 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-proxy -uri unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -log debug
root 24545 24450 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-shim -c 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -t cEKcTOMZfCMRkenhaSOwWapca2aHGhwXbRhcRbjui60= -u unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -d
root 24546 24545 0 09:46 ? 00:00:00 /usr/libexec/clear-containers/cc-shim -c 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f -t cEKcTOMZfCMRkenhaSOwWapca2aHGhwXbRhcRbjui60= -u unix:///run/virtcontainers/pods/859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f/proxy.sock -d
root 24547 3187 0 09:46 ? 00:00:00 /home/eernst/go/src/github.com/clearcontainers/runtime/cc-runtime start 859c68a3d767f2296b9c8b4e8fea8699a15d1518ad17432ddcc7fdcf1996155f
eernst 24567 5896 0 09:46 pts/10 00:00:00 ps -aef
That tells us it's blocked calling vc.StartContainer()
as far as I can see, but it's unclear whereabouts in that function it's blocking. Could you add some virtlog.Info("...")
debug calls to ./vendor/github.com/containers/virtcontainers/api.go
in the runtime, rebuild and install it, and re-run to see what that gives?
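For reference, the bracketing being suggested amounts to temporary log lines around each step, so the last line that appears in the journal identifies the blocking call. A hypothetical sketch (stand-in logger and function names, not the actual virtcontainers code):

```go
package main

import "log"

// startContainer stands in for vc.StartContainer(); the temporary log lines
// bracket each step so the log shows exactly where execution stops.
func startContainer(podID, containerID string) error {
	log.Printf("StartContainer: enter (pod %s, container %s)", podID, containerID)
	log.Printf("StartContainer: sending proxy command")
	// ... real work elided; if "done" never prints, the hang is between
	// the last printed line and the next one.
	log.Printf("StartContainer: done")
	return nil
}

func main() {
	_ = startContainer("pod0", "container0")
}
```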
@jodh-intel : correct, it is hanging during StartContainer. More specifically, the runtime is blocked while sending this proxy command:
{newcontainer {f36c8b2c825eecaabce7688dc41
4d7c1f3ef7c1c1e42427c9bc746f80d3db479 rootfs xfs vda [] [0xc420286f90 0xc420286fc0 0xc420286ff0] map[] 0xc42027af20 false {false 0}} qwefGyKiC8zubVXxEY7_M56VBclLdFPIby7m7ec3ri8=}
The agent, in turn, is hung in the mount syscall, reached via newContainerCb -> mountContainerRootFs -> mount (https://github.com/clearcontainers/agent/blob/master/syscall.go#L63). It is called with the following parameters:
source /dev/vda, dest /tmp/hyper/00c5bd0b5f507cdd4db1c8847e9ddbb5eae131f4c1b0d26aed9b74414079def2/root, type: xfs, flag: 0%!(EXTRA string=nouuid)
The parameters look the same (except for the unique container ID) when comparing passing and failing cases.
FYI, testing with qemu-lite, I don't see any failures after 100 iterations: for c in {1..100}; do sudo docker run --runtime=mine alpine sh; echo $c; done
Is dockerd running in debug mode, and if so, has it logged any warnings/errors in the journal? Any thoughts @sboeuf? Once we've worked out what's going on here, it does sound like we should change the agent to not block indefinitely.
Hey @jodh-intel
If I switch to an alternative (AUFS), I still see hangs - see later comment for details, as the signature appears slightly different (I stop seeing agent output at the time the PCI rescan is issued -- for device mapper it would happen later when trying to carry out the actual mount).
When using device mapper, I see nothing particularly useful in the guest kernel; I see the device got hot plugged, but I do not see the XFS mount success message that I would otherwise. Here is the output of the mount command at the time of failure (from the guest):
root@clrcont /tmp # mount
/dev/pmem0p1 on / type ext4 (rw,relatime,dax,errors=remount-ro,stripe=256,data=ordered)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=6486)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
tmpfs on /dev type tmpfs (ro,nosuid,noexec,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
mqueue on /dev/mqueue type mqueue (rw,relatime)
devtmpfs on /dev/hugepages type devtmpfs (rw,relatime,size=1023692k,nr_inodes=255923,mode=755)
hyperShared on /tmp/shareDir type 9p (rw,nodev,relatime,sync,dirsync,trans=virtio)
Docker is running in debug mode -- nothing useful here in journal
I have console working with the agent, which is how I was able to check out dmesg. I do see the device via lsblk (that is, vda), and I see it in sysfs. The device node, /dev/vda, isn't present, presumably because it was moved into the mount namespace associated with cc-agent. I see it there when doing an nsenter (with mount provided in /proc/
root@clrcont /proc/190/ns # ps -ae | grep cc-agent
190 ? 00:00:01 cc-agent
root@clrcont /proc/190/ns # nsenter --mount=mnt
root@clrcont / # stat /dev/vda
File: /dev/vda
Size: 0 Blocks: 0 IO Block: 4096 block special file
Device: 6h/6d Inode: 4260 Links: 1 Device type: fe,0
Access: (0600/brw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-01-12 22:28:19.215776443 +0000
Modify: 2018-01-12 22:28:19.212776443 +0000
Change: 2018-01-12 22:28:19.212776443 +0000
Birth: -
Naively, I tried to execute the mount command I expect was intended inside the agent, but receive a permission error:
root@clrcont / # mount -t xfs /dev/vda /tmp/hyper/3bf6a330d8e85133beb970d3878a82be345c718af2e1b122dfe0e9cf9aee4ca4/root
mount: /tmp/hyper/3bf6a330d8e85133beb970d3878a82be345c718af2e1b122dfe0e9cf9aee4ca4/root: permission denied.
Created a timeout of ~8 seconds for the mount command, after which we check to make sure /dev/vda and the target destination are still stat-able, and then return an error.
See: https://github.com/egernst/agent/blob/qemu-cc-debug/syscall.go#L67
As a next step, I will try alternatives to virtio-blk to mount the device, since we are seeing a different failure signature when using 9p...
Noticing that in the failure scenario udev is struggling with the particular device:
Jan 19 18:23:57 clrcont systemd[1]: Started D-Bus System Message Bus.
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: [1af4:1001] type 00 class 0x010000
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x10: [io 0x0000-0x003f]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 4: assigned [mem 0xfe800000-0xfe803fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 1: assigned [mem 0xfe400000-0xfe400fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 0: assigned [io 0xc000-0xc03f]
Jan 19 18:23:57 clrcont kernel: virtio-pci 0000:01:01.0: enabling device (0000 -> 0003)
Jan 19 18:23:58 clrcont kernel: tsc: Refined TSC clocksource calibration: 2893.032 MHz
Jan 19 18:23:58 clrcont kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29b38cd2c56, max_idle_ns: 440795342158 ns
Jan 19 18:24:37 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:24:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:25:20 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:27 clrcont kernel: random: fast init done
Jan 19 18:26:34 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
root@clrcont / # {"level":"info","msg":"Received udev event","name":"cc-agent","pid":199,"subsystem":"udevlistener","time":"2018-01-19T18:26:58.397072561Z","udev-event":"add","udev-path":"/sys/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda"}
As seen above, the udev event is received much later. After another couple of minutes, it appears udev "gives up":
Jan 19 18:23:57 clrcont systemd[1]: Started D-Bus System Message Bus.
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: [1af4:1001] type 00 class 0x010000
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x10: [io 0x0000-0x003f]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 4: assigned [mem 0xfe800000-0xfe803fff 64bit pref]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 1: assigned [mem 0xfe400000-0xfe400fff]
Jan 19 18:23:57 clrcont kernel: pci 0000:01:01.0: BAR 0: assigned [io 0xc000-0xc03f]
Jan 19 18:23:57 clrcont kernel: virtio-pci 0000:01:01.0: enabling device (0000 -> 0003)
Jan 19 18:23:58 clrcont kernel: tsc: Refined TSC clocksource calibration: 2893.032 MHz
Jan 19 18:23:58 clrcont kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29b38cd2c56, max_idle_ns: 440795342158 ns
Jan 19 18:24:37 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:24:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:25:20 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:27 clrcont kernel: random: fast init done
Jan 19 18:26:34 clrcont systemd-timesyncd[174]: Timed out waiting for reply from 172.17.0.1:123 (gateway.).
Jan 19 18:26:58 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' killed
Jan 19 18:26:58 clrcont systemd-udevd[158]: worker [225] terminated by signal 9 (KILL)
Jan 19 18:26:58 clrcont systemd-udevd[158]: worker [225] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda'
All of the udevd output from the failure:
# journalctl | grep udev
Jan 19 18:38:40 clrcont systemd-udevd[158]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Jan 19 18:38:40 clrcont systemd[1]: Started udev Coldplug all Devices.
Jan 19 18:38:40 clrcont systemd-udevd[178]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:38:40 clrcont systemd-udevd[186]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:38:40 clrcont systemd-udevd[178]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jan 19 18:39:41 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' is taking a long time
Jan 19 18:41:41 clrcont systemd-udevd[158]: seq 1145 '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda' killed
Jan 19 18:41:41 clrcont systemd-udevd[158]: worker [225] terminated by signal 9 (KILL)
Jan 19 18:41:41 clrcont systemd-udevd[158]: worker [225] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda'
Output from udevadm info --export-db after the failure:
--
E: PCI_SLOT_NAME=0000:01:01.0
E: PCI_SUBSYS_ID=1AF4:0002
E: SUBSYSTEM=pci
P: /devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3
E: DRIVER=virtio_blk
E: MODALIAS=virtio:d00000002v00001AF4
E: SUBSYSTEM=virtio
P: /devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda
N: vda
E: DEVNAME=/dev/vda
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:01.0/virtio3/block/vda
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=0
E: SUBSYSTEM=block
Using strace and ftrace, we see that mount is called but never returns (confirming what we already knew from the prints in the log). You can see this in the trace file, which contains no entry for sys_exit_mount.
root@clrcont / # cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cc-agent-207 [001] .... 5.418645: sys_mount(dev_name: c42010a180, dir_name: c42015e0c0, type: c42010a18c, flags: 0, data: c42010a098)
root@clrcont / # cat strace* | grep mount
mount("hyperShared", "/tmp/shareDir", "9p", MS_MGC_VAL|MS_NODEV, "trans=virtio") = 0
mount("/dev/vda", "/tmp/hyper/435835da0d249ad87c4e01683848962d1d70b63bf9c1f0fd2a365f906c7cb6f2/root", "xfs", 0, "nouuid"set_robust_list(0x7fac437fe9e0, 24) = 0
ftrace was enabled in the kernel for syscalls (specifically mount), and strace was enabled in the clear containers service by copying strace to the container image and modifying the ExecStart to be: /usr/bin/strace -o strace.log -vv -ff /usr/bin/cc-agent
Maybe this is relevant: https://github.com/systemd/systemd/issues/3446
An additional data point: when moving from virtio-blk to virtio-scsi, this issue no longer manifests.
@egernst that's good news at least. Don't you think we should stop spending time on this issue, since virtio-scsi will cover all the cases handled by virtio-blk?
@sboeuf -- no this still concerns me, especially if there are performance tradeoffs between -SCSI and -blk. I'd rather know the exact issue.
Perhaps the severity of the issue is slightly reduced, though...
@egernst ok, makes sense. Do you know if there are actual performance trade-offs, or is that just a potential outcome?
This will likely be an issue with Kata runtime -- will test and open new issue there.
@rarindam are you still investigating this issue? I'd like to make sure we have a clear description of the issue on the qemu repository so that the Qemu folks can really look into it. I don't think the clearcontainers repo is a good place to continue discussing this. Please let me know how we can make this more visible in order to get more help and try to fix it ;)
Hi Sebastien, I am not actively debugging this issue. We can have it in our qemu JIRA to track this as a bug. I will look into it again tomorrow and update this issue. Arindam
Thanks @rarindam, I think your Qemu JIRA is fine if that is how you usually track your issues. It's important to get this issue properly described and tracked, as we (the Clear/Kata Containers teams) expect to get a fix eventually.
Hi @egernst, I tried reproducing the issue today on Qemu 2.10 with cc-runtime 3.0.18 and could see it happening, but I also see other issues in my setup. So instead of trying to reproduce it in a container environment, I am going to do so in a simpler host/guest environment. I just wanted to confirm whether the following matches your scenario:
Being tracked internally in VIRTUALIZB-153
@rarindam Thanks for tracking this internally. Is it possible to get access to that issue?
@rarindam -- 1 and 3, yes. For 2: it is actually the rootfs for a container which is mounted using virtio-blk (passed in via QMP).
Hi @rarindam, just a reminder: you never gave me access to the internal JIRA for this issue so that I can follow the progress ;)
@egernst Please check the kernel stack trace of the thread stuck in mount(2):
# cat /proc/$AGENT_PID/stack
where $AGENT_PID is the thread in question.
@stefanha
root@clrcont / # ps -ae | grep agent
153 ? 00:00:00 cc-agent
root@clrcont / # cat /proc/153/stack
[
@egernst That looks like the agent's main thread, not the thread that is stuck in mount(2). Use ps -eL -o lwp,stat,wchan,args
to list all threads. The status of the thread stuck in mount probably starts with "S", and its /proc/$PID/stack output should contain SyS_mount.
If udevd hasn't timed out yet, it would also be interesting to see where it is stuck. You can use the same approach with /proc/$PID/stack. Systemd udevd forks a worker process to either execute programs or run built-in actions (see https://github.com/systemd/systemd/blob/master/src/udev/udevd.c#L328).
Hi @sboeuf , are you able to access this link? -link- You need to have access to this JIRA repo....
@rarindam I have removed the link since we're not supposed to share internal links on external repos. Please send me an email with the details including the link and the way to apply to get access to it. Thanks
I tried to run git bisect between qemu tags 2.7.0 and 2.9.0. I hit a snag when using versions 2.8.0 and above: it seems there were some changes to QMP, and the command for adding the block device was failing in those versions. Continuing to debug this.
Thanks for the update @rarindam
@rarindam There was a change introduced in qemu 2.9 for the blockdev-add command. I don't know if you are hitting that:
qemu 2.9 and above:
{"execute":"blockdev-add","arguments":{"driver":"raw","node-name":"disk1","file":{"driver":"file","filename":"/dev/mapper/docker-8:2-3016900"}}}
qemu < 2.9:
{"execute":"blockdev-add","arguments":{"options": {"driver":"raw","id":"disk1","file":{"driver":"file","filename":"/dev/mapper/docker-8:2-3016900"}}}}
This issue cannot be reproduced with the current version of cc-runtime (3.23). I am able to reproduce it with an older version, 3.10. In the latest versions we have introduced some gap between the time the drive is attached and the time it is mounted, which is likely why we are no longer hitting the issue of the mount system call not returning. Note that the issue still exists in qemu; we are just not seeing it with the latest cc-runtime versions.
@amshinde So this issue won't be seen in earlier versions of qemu if you introduce the time gap there too?
Description of problem
When using QEMU >= v2.9, I see cc-runtime hang when trying to start a container. This is usually observed when launching and exiting a couple of containers in a row. Observed with qemu-cc, a local build of v2.9, and a local build of v2.10.
On a clean/freshly booted system, I see it fail after the 6th time running
sudo docker run alpine sh
At the point of the hang, cc-runtime is still executing, as is the resulting QEMU instance. Relevant logs from the journal are included below. It looks like it is hanging while the runtime is handling hotplug/QMP.
cc-env details:
./cc-runtime cc-env
[Meta]
  Version = "1.0.6"
[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "3.0.12"
    Commit = "a33e6bf56f7bfbb0497c08f4fd2f70dc06c9f816"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/etc/clear-containers/configuration.toml"
[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.9.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-cc-system-x86_64"
  Debug = false
[Image]
  Path = "/usr/share/clear-containers/clear-19790-containers.img"
[Kernel]
  Path = "/usr/share/clear-containers/vmlinuz-4.9.60-82.container"
  Parameters = ""
[Proxy]
  Type = "ccProxy"
  Version = "Version: 3.0.12-bb85f104f886239198ece88552eb604c05e2c6dd"
  Path = "/usr/libexec/clear-containers/cc-proxy"
  Debug = true
[Shim]
  Type = "ccShim"
  Version = "shim version: 3.0.12 (commit: d01f9a71056fc005a98fbaeba733c7b3b07874f6)"
  Path = "/usr/libexec/clear-containers/cc-shim"
  Debug = true
[Agent]
  Type = "hyperstart"
  Version = "<>"
[Host]
  Kernel = "4.8.0-36-generic"
  CCCapable = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "16.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz"