canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

No space left on device during cgroup pivot #6463

Closed. CameronNemo closed this issue 4 years ago.

CameronNemo commented 4 years ago

Required information

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICBDCCAYmgAwIBAgIQUu06mcoB+Kcwo0pjQdD4iTAKBggqhkjOPQQDAzAyMRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QGNlY2kw
    HhcNMTkxMTE2MjEwMDEyWhcNMjkxMTEzMjEwMDEyWjAyMRwwGgYDVQQKExNsaW51
    eGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QGNlY2kwdjAQBgcqhkjOPQIB
    BgUrgQQAIgNiAAT4htz8fqIjiDCP4REw9FqNt3WHtkZk21Jtn43tmW/S/6NohTMY
    YjYRFm5T0cunt7jMl3f88aCcSW5VbmhyzIyDCfLXyjwSY3znOD9Md2mOSMa+0eFk
    lyoFPSfdaqF+tTyjZDBiMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEF
    BQcDATAMBgNVHRMBAf8EAjAAMC0GA1UdEQQmMCSCBGNlY2mHBMCoAQWHBAqruQGH
    EP1C7Mfe7HYYAAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxAKdArXmTqifWSbwG
    AxIpz6Ci7SIuCSgWFQdN/voIIphX/BNZNWvA5fEnvPx5dXTDAwIxAPIKo3Sb8DrB
    1QO4reFTnbHFgmDkJmcKdcu8ULR/hywfk4AAp0mXTKSX2SEuoau+tw==
    -----END CERTIFICATE-----
  certificate_fingerprint: e9c8c6699ffb2a2a605b0d5936f0d36c3b5d39702c4b913352af666e5ed0acee
  driver: lxc
  driver_version: 3.2.1
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.3.10_1
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: ceci
  server_pid: 14638
  server_version: "3.18"
  storage: dir
  storage_version: "1"

lxc info --show-log local:chief-gull

Name: chief-gull
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/11/16 21:02 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc chief-gull 20191116210225.895 ERROR    cgfsng - cgroups/cgfsng.c:__do_cgroup_enter:1500 - No space left on device - Failed to enter cgroup "/sys/fs/cgroup/cpuset//lxc.monitor/chief-gull/cgroup.procs"
lxc chief-gull 20191116210225.895 ERROR    start - start.c:__lxc_start:2009 - Failed to enter monitor cgroup
lxc chief-gull 20191116210225.895 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:872 - Received container state "STOPPING" instead of "RUNNING"
lxc chief-gull 20191116210225.976 WARN     cgfsng - cgroups/cgfsng.c:cgfsng_monitor_destroy:1180 - No space left on device - Failed to move monitor 15378 to "/sys/fs/cgroup/cpuset//lxc.pivot/cgroup.procs"

lxc 20191116210225.978 WARN     commands - commands.c:lxc_cmd_rsp_recv:134 - Connection reset by peer - Failed to receive response for command "get_state"

Issue description

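Launching a container fails. LXC cannot enter the container's monitor cgroup: writing to the cpuset cgroup.procs file fails with "No space left on device".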

Steps to reproduce

  1. Compile sqlite-replication, libco, raft, dqlite and lxd as xbps packages from https://github.com/void-linux/void-packages/pull/15045
  2. Install lxd from the local xbps repo
  3. Start the daemon:

       _systemd_cgrp="/sys/fs/cgroup/systemd"
       test -d "$_systemd_cgrp" || mkdir "$_systemd_cgrp"
       mountpoint -q "$_systemd_cgrp" || mount -t cgroup -o none,name=systemd cgroup "$_systemd_cgrp"
       exec lxd --group lxd

  4. Run lxd init
  5. Run lxc launch images:voidlinux

Tested with multiple images.

CameronNemo commented 4 years ago

I can reproduce this with plain LXC too. It appears that cpuset.cpus is unset (empty) for the lxc.monitor and lxc.pivot cgroups:

root@ceci /s/f/c/cpuset# head lxc.*/cpuset.{mems,cpus}
==> lxc.monitor/cpuset.mems <==
0

==> lxc.pivot/cpuset.mems <==
0

==> lxc.monitor/cpuset.cpus <==

==> lxc.pivot/cpuset.cpus <==
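
The check above can be scripted. On the legacy (v1) cpuset hierarchy, the kernel refuses to attach a task to a cpuset whose cpus or mems are empty and returns ENOSPC, which is what appears as "No space left on device" in the log. The following is a minimal sketch, assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset. The function name `check_lxc_cpusets` is hypothetical and not part of LXC or LXD.

```shell
#!/bin/sh
# Sketch: list lxc.* cpuset cgroups whose cpuset.cpus or cpuset.mems is empty.
# On the legacy (v1) cpuset hierarchy, writing a PID into such a cgroup's
# cgroup.procs fails with ENOSPC ("No space left on device").
# check_lxc_cpusets is a hypothetical helper name, not an LXC/LXD command.
check_lxc_cpusets() {
    root="$1"          # e.g. /sys/fs/cgroup/cpuset (assumed mount point)
    status=0
    for dir in "$root"/lxc.*; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        for knob in cpuset.cpus cpuset.mems; do
            [ -f "$dir/$knob" ] || continue
            # An unset knob reads back as a bare newline, so strip whitespace.
            if [ -z "$(tr -d '[:space:]' < "$dir/$knob")" ]; then
                echo "empty: ${dir##*/}/$knob"
                status=1
            fi
        done
    done
    return "$status"
}
```

Given the `head` output above, `check_lxc_cpusets /sys/fs/cgroup/cpuset` would be expected to flag lxc.monitor/cpuset.cpus and lxc.pivot/cpuset.cpus and return a non-zero status.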

stgraber commented 4 years ago

I believe that's a liblxc bug that's been fixed upstream for a few months but isn't yet in any released version of liblxc.

One workaround can be found in the lxd snap packaging here:

https://github.com/lxc/lxd-pkg-snap/blob/latest-edge/snapcraft/commands/daemon.start#L263
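
The snap's daemon.start logic is not reproduced here. The following is a hedged sketch of one way to address the empty cpusets shown earlier in this thread. It assumes the legacy (v1) cpuset hierarchy and does not necessarily match what the snap does. The script enables `cgroup.clone_children` on the cpuset root, so newly created child cgroups start with their parent's cpus and mems. It also copies the root's values into any existing lxc.* cgroup whose values are empty. The name `fix_cpusets` is hypothetical.

```shell
#!/bin/sh
# Sketch of a cpuset workaround for the legacy (v1) hierarchy. This is NOT the
# snap's daemon.start logic, only one possible approach. Run as root before
# starting LXD or LXC. fix_cpusets is a hypothetical helper name.
fix_cpusets() {
    root="$1"          # e.g. /sys/fs/cgroup/cpuset (assumed mount point)
    [ -d "$root" ] || return 0

    # With cgroup.clone_children set to 1, cpuset cgroups created below this
    # one are initialised with the parent's cpuset.cpus and cpuset.mems
    # instead of starting out empty.
    if [ -w "$root/cgroup.clone_children" ]; then
        echo 1 > "$root/cgroup.clone_children"
    fi

    # Repair top-level lxc.* cgroups that already exist with empty values.
    for dir in "$root"/lxc.*; do
        [ -d "$dir" ] || continue
        for knob in cpuset.cpus cpuset.mems; do
            [ -f "$dir/$knob" ] || continue
            if [ -z "$(tr -d '[:space:]' < "$dir/$knob")" ]; then
                # A child's set must be a subset of its parent's, so copying
                # the parent's full value is a safe choice.
                cat "$root/$knob" > "$dir/$knob" || return 1
            fi
        done
    done
    return 0
}
```

In a runit run file like the one in the reproduction steps, the call `fix_cpusets /sys/fs/cgroup/cpuset` would go just before `exec lxd --group lxd`.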

CameronNemo commented 4 years ago

Stéphane, do you happen to know which LXC commit(s) fix this? Could I backport them? Otherwise I would need to tell LXC users to add that snippet to their rc.local. (For LXD, I am just adding it to the runit run file.)

stgraber commented 4 years ago

https://github.com/lxc/lxd-pkg-snap/blob/latest-candidate/snapcraft.yaml#L530

That's the list of commits we currently cherry pick for the snap, including the needed cgroups fixes.
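
A distribution package can carry the same cherry-picked commits as numbered patch files. The following is a minimal sketch. It assumes a local clone of the LXC git repository, and it assumes the package template applies every patch in a patches/ directory (an assumption about the xbps-src workflow). The name `export_backports` is hypothetical. Take the actual commit IDs from the snapcraft.yaml list linked above.

```shell
#!/bin/sh
# Sketch: export upstream commits as numbered patch files for a package build.
# export_backports is a hypothetical helper; the commit IDs passed to it must
# come from the snap's cherry-pick list, they are not filled in here.
# Usage: export_backports <lxc-git-clone> <patch-dir> <commit>...
export_backports() {
    repo="$1"
    mkdir -p "$2" || return 1
    outdir=$(cd "$2" && pwd) || return 1   # absolute, since git -C changes cwd
    shift 2
    n=1
    for commit in "$@"; do
        # -1 exports exactly one commit; --start-number keeps files in order.
        git -C "$repo" format-patch -q --start-number "$n" -o "$outdir" \
            -1 "$commit" || return 1
        n=$((n + 1))
    done
}
```

Exporting patches instead of cherry-picking into a branch keeps the build based on the release tarball. The patch order then follows the order of the arguments.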