clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0
589 stars 70 forks source link

postgresql container fails to start due to shared memory error #1100

Open zeigerpuppy opened 6 years ago

zeigerpuppy commented 6 years ago

Description of problem

postgres uses the kernel shared memory achitecture shm. It looks like clear containers don't properly initialise this value, causing an error like:

docker run -d postgres:9.6-alpine
FATAL:  could not resize shared memory segment "/PostgreSQL.262836907" to 2316 bytes: Not supported

This may be related to https://github.com/clearcontainers/runtime/issues/18

Expected result

Postgres containers should run without modification.

Actual result

When postgres starts it expects to have access to the /dev/shm device. I presume that the default has been set to `--ipc="none". See docker docs.

Fix

I started the container with these options and it worked, seems to confirm that the --ipc option is the cause:

docker run -d `--ipc="private" --shm-size 128m postgres:9.6-alpine

Config

cc-collect-data.sh output (running on Debian 9 without systemd and ZFS 0.7.7 zvol storage as block device for LVM thin provisioning):

Running cc-collect-data.sh version 3.0.23 (commit 64d2226) at 2018-04-24.15:21:51.686666369+1000.


Runtime is /usr/bin/cc-runtime.

cc-env

Output of "/usr/bin/cc-runtime cc-env":

[Meta]
  Version = "1.0.9"

[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "3.0.23"
    Commit = "64d2226"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/usr/share/defaults/clear-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.7.1(2.7.1+git.d4a337fe91-11.cc), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  Debug = false
  BlockDeviceDriver = "virtio-blk"

[Image]
  Path = "/usr/share/clear-containers/cc-20640-agent-6f6e9e.img"

[Kernel]
  Path = "/usr/share/clear-containers/vmlinuz-4.14.22-86.container"
  Parameters = ""

[Proxy]
  Type = "ccProxy"
  Version = "Version: 3.0.23+git.3cebe5e"
  Path = "/usr/libexec/clear-containers/cc-proxy"
  Debug = false

[Shim]
  Type = "ccShim"
  Version = "shim version: 3.0.23 (commit: 205ecf7)"
:/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
  Path = "/usr/libexec/clear-containers/cc-shim"
  Debug = false

[Agent]
  Type = "hyperstart"
  Version = "<<unknown>>"

[Host]
  Kernel = "4.9.0-6-amd64"
  Architecture = "amd64"
  VMContainerCapable = true
  [Host.Distro]
    Name = "Debian GNU/Linux"
    Version = "9"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz"

Runtime config files

Runtime default config files

/usr/share/defaults/clear-containers/configuration.toml
/usr/share/defaults/clear-containers/configuration.toml

Runtime config file contents

Output of "cat "/etc/clear-containers/configuration.toml"":

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Intel® Clear Containers
# XXX:   Type: cc

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
#lock_device_driver = "virtio-scsi"
block_device_driver = "virtio-blk"

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="bridged"

Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Intel® Clear Containers
# XXX:   Type: cc

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
#lock_device_driver = "virtio-scsi"
block_device_driver = "virtio-blk"

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="bridged"

Agent

version:

unknown

Logfiles

Runtime logs

No recent runtime problems found in system journal.

Proxy logs

No recent proxy problems found in system journal.

Shim logs

No recent shim problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:09:54 2017
  OS/Arch:      linux/amd64
  Experimental: false

Output of "docker info":

Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 17.12.0-ce
Storage Driver: devicemapper
 Pool Name: docker--vg-docker--pool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Udev Sync Supported: true
 Data Space Used: 6.871GB
 Data Space Total: 161GB
 Data Space Available: 154.1GB
 Metadata Space Used: 1.667MB
 Metadata Space Total: 109.1MB
 Metadata Space Available: 107.4MB
 Thin Pool Minimum Free Space: 16.1GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.137 (2016-11-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime runc
Default Runtime: cc-runtime
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: 64d2226 (expected: b2567b37d7b75eb4cf325b77297b140ea686ce8f)
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-6-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 376.6GiB
Name: <REDACTED>
ID: <REDACTED>
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 25
 Goroutines: 38
 System Time: 2018-04-24T15:23:39.393648987+10:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Output of "systemctl show docker":

/usr/bin/cc-collect-data.sh: line 167: systemctl: command not found

No kubectl


Packages

Have dpkg Output of "dpkg -l|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":

ii  cc-proxy                             3.0.23+git.3cebe5e-27                   amd64        
ii  cc-runtime                           3.0.23+git.64d2226-27                   amd64        
ii  cc-runtime-bin                       3.0.23+git.64d2226-27                   amd64        
ii  cc-runtime-config                    3.0.23+git.64d2226-27                   amd64        
ii  cc-shim                              3.0.23+git.205ecf7-27                   amd64        
ii  clear-containers-image               20640-48                                amd64        Clear containers image
ii  linux-container                      4.14.22-86                              amd64        linux kernel optimised for container-like workloads.
ii  qemu-lite                            2.7.1+git.d4a337fe91-11                 amd64        linux kernel optimised for container-like workloads.
ii  qemu-system-x86                      1:2.8+dfsg-6+deb9u3                     amd64        QEMU full system emulation binaries (x86)

Have rpm Output of "rpm -qa|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":


grahamwhaley commented 6 years ago

/cc @amshinde @devimc I think. Also, recently related would be https://github.com/kata-containers/kata-containers/issues/21 and its friends.

amshinde commented 6 years ago

@zeigerpuppy Do you see this issue with kata-containers ? See installation instructions here https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md

zeigerpuppy commented 6 years ago

@amshinde, I am a bit reluctant to layer on kata-containers at the moment because my setup is performing quite well. I'm also using this on a production server... so will need to wait until our fallback server is configured before changing config too much! I'll report back once I have a chance to test