canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0
4.32k stars 926 forks source link

Starting an openrc container on an openrc instance fails when trying to mount /sys/fs/cgroup/systemd inside the guest #11242

Closed epsilon-0 closed 1 year ago

epsilon-0 commented 1 year ago

Required information

config:
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- vsock_api
- storage_volumes_all_projects
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICDTCCAZKgAwIBAgIQEwqxCz2j39ZNvcpGL0dY7jAKBggqhkjOPQQDAzA3MRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRcwFQYDVQQDDA5yb290QGxhcHB5
    Tk9PQjAeFw0yMjA3MjAwNjUxMzVaFw0zMjA3MTcwNjUxMzVaMDcxHDAaBgNVBAoT
    E2xpbnV4Y29udGFpbmVycy5vcmcxFzAVBgNVBAMMDnJvb3RAbGFwcHlOT09CMHYw
    EAYHKoZIzj0CAQYFK4EEACIDYgAEC+eaWwNjqI1iRQDXqtiwffNbo5HqnqIBdmA0
    5tdx7JB3amapmettsnlwpvowmpifI6eiEuZlw0FyGt+X0Vx6l2n+q3w9hN2nS8lD
    wUOXUEFuZ/1i+/whUkYX7wXtgZndo2MwYTAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0l
    BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAsBgNVHREEJTAjgglsYXBweU5P
    T0KHBH8AAAGHEAAAAAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxAO5e
    TNHU8bYqhRxWlALiDVNGNENU4BvimPQRl+PsC2BxcYwol4IZLP4G6c0kZm6tkwIx
    AIAoOvPGeSaIsULLOB65YwTlZDAKbfOs9i9Q7GAKibueU/LHXRjoNkhm+2s0fghy
    rg==
    -----END CERTIFICATE-----
  certificate_fingerprint: 0392e94afba2dfb745781cbb4187c30b8b9c85f78dcda76db50535ccdbb5dd4b
  driver: lxc
  driver_version: 5.0.1
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.18.12-gentoo-dist
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Gentoo
  os_version: "2.9"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: lappyNOOB
  server_pid: 4538
  server_version: 5.0.1
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.17(2) (2022-11-10) / 1.02.187 (2022-11-10) / 4.46.0
    remote: false

Issue description

Starting an openrc gentoo container on an openrc gentoo machine fails when /sys/fs/cgroup/systemd if mounted.

Steps to reproduce

  1. Step one - Install gentoo and lxd. emerge --verbose app-containers/lxd
  2. Step two - Start an openrc gentoo container lxc start <gentoo-image-hash>.
  3. Step three - See that the starting fails with the error message -
    
    @root@lappyNOOB ~ # lxc info --show-log gentoo
    Name: gentoo
    Status: STOPPED
    Type: container
    Architecture: x86_64
    Created: 2022/07/21 08:58 EDT
    Last Used: 2023/01/01 19:55 EST

Log:

lxc gentoo 20230102005524.144 ERROR cgfsng - ../lxc-5.0.1/src/lxc/cgroups/cgfsng.c:cgfsng_mount:2131 - No such file or directory - Failed to create cgroup at_mnt 24() lxc gentoo 20230102005524.144 ERROR conf - ../lxc-5.0.1/src/lxc/conf.c:lxc_mount_auto_mounts:851 - No such file or directory - Failed to mount "/sys/fs/cgroup" lxc gentoo 20230102005524.144 ERROR conf - ../lxc-5.0.1/src/lxc/conf.c:lxc_setup:4396 - Failed to setup remaining automatic mounts lxc gentoo 20230102005524.144 ERROR start - ../lxc-5.0.1/src/lxc/start.c:do_start:1272 - Failed to setup container "gentoo" lxc gentoo 20230102005524.144 ERROR sync - ../lxc-5.0.1/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4) lxc gentoo 20230102005524.146 WARN network - ../lxc-5.0.1/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vethbf0c8995" lxc gentoo 20230102005524.146 ERROR start - ../lxc-5.0.1/src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "gentoo" lxc gentoo 20230102005524.146 ERROR lxccontainer - ../lxc-5.0.1/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING" lxc gentoo 20230102005524.146 WARN start - ../lxc-5.0.1/src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 19 for process 4887 lxc 20230102005529.305 ERROR af_unix - ../lxc-5.0.1/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response lxc 20230102005529.305 ERROR commands - ../lxc-5.0.1/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"



# Temporary Workaround
- Unmount `/sys/fs/cgroup/systemd` before starting the openrc container. 

Not an ideal solution as this inhibits the launching of systemd containers, which necessitates remounting it again.
juippis commented 1 year ago

Hmm it should work. Can you show your /etc/rc.conf file?

Then again I may accidentally have tested this with only my own lxd images, need to try with upstream images.

(For anyone wondering this is related, https://github.com/gentoo/gentoo/commit/0fcc7c7e18b0e57e53c5aaf487744ca426dc949c)

epsilon-0 commented 1 year ago

Sure -

/etc/rc.conf:

# Global OpenRC configuration settings

# Set to "YES" if you want the rc system to try and start services
# in parallel for a slight speed improvement. When running in parallel we
# prefix the service output with its name as the output will get
# jumbled up.
# WARNING: whilst we have improved parallel, it can still potentially lock
# the boot process. Don't file bugs about this unless you can supply
# patches that fix it without breaking other things!
#rc_parallel="NO"

# Set rc_interactive to "YES" and you'll be able to press the I key during
# boot so you can choose to start specific services. Set to "NO" to disable
# this feature. This feature is automatically disabled if rc_parallel is
# set to YES.
rc_interactive="NO"

# If we need to drop to a shell, you can specify it here.
# If not specified we use $SHELL, otherwise the one specified in /etc/passwd,
# otherwise /bin/sh
# Linux users could specify /sbin/sulogin
rc_shell=/sbin/sulogin

# Do we allow any started service in the runlevel to satisfy the dependency
# or do we want all of them regardless of state? For example, if net.eth0
# and net.eth1 are in the default runlevel then with rc_depend_strict="NO"
# both will be started, but services that depend on 'net' will work if either
# one comes up. With rc_depend_strict="YES" we would require them both to
# come up.
#rc_depend_strict="YES"

# rc_hotplug controls which services we allow to be hotplugged.
# A hotplugged service is one started by a dynamic dev manager when a matching
# hardware device is found.
# Hotplugged services appear in the "hotplugged" runlevel.
# If rc_hotplug is set to any value, we compare the name of this service
# to every pattern in the value, from left to right, and we allow the
# service to be hotplugged if it matches a pattern, or if it matches no
# patterns. Patterns can include shell wildcards.
# To disable services from being hotplugged, prefix patterns with "!".
#If rc_hotplug is not set or is empty, all hotplugging is disabled.
# Example - rc_hotplug="net.wlan !net.*"
# This allows net.wlan and any service not matching net.* to be hotplugged.
# Example - rc_hotplug="!net.*"
# This allows services that do not match "net.*" to be hotplugged.

# rc_logger launches a logging daemon to log the entire rc process to
# /var/log/rc.log
# NOTE: Linux systems require the devfs service to be started before
# logging can take place and as such cannot log the sysinit runlevel.
rc_logger="YES"

# Through rc_log_path you can specify a custom log file.
# The default value is: /var/log/rc.log
#rc_log_path="/var/log/rc.log"

# If you want verbose output for OpenRC, set this to yes. If you want
# verbose output for service foo only, set it to yes in /etc/conf.d/foo.
#rc_verbose=no

# By default we filter the environment for our running scripts. To allow other
# variables through, add them here. Use a * to allow all variables through.
#rc_env_allow="VAR1 VAR2"

# By default we assume that all daemons will start correctly.
# However, some do not - a classic example is that they fork and return 0 AND
# then child barfs on a configuration error. Or the daemon has a bug and the
# child crashes. You can set the number of milliseconds start-stop-daemon
# waits to check that the daemon is still running after starting here.
# The default is 0 - no checking.
#rc_start_wait=100

# rc_nostop is a list of services which will not stop when changing runlevels.
# This still allows the service itself to be stopped when called directly.
#rc_nostop=""

# rc will attempt to start crashed services by default.
# However, it will not stop them by default as that could bring down other
# critical services.
#rc_crashed_stop=NO
#rc_crashed_start=YES

# Set rc_nocolor to yes if you do not want colors displayed in OpenRC
# output.
#rc_nocolor=NO

##############################################################################
# MISC CONFIGURATION VARIABLES
# There variables are shared between many init scripts

# Set unicode to YES to turn on unicode support for keyboards and screens.
unicode="YES"

# This is how long fuser should wait for a remote server to respond. The
# default is 60 seconds, but  it can be adjusted here.
#rc_fuser_timeout=60

# Below is the default list of network fstypes.
#
# afs ceph cifs coda davfs fuse fuse.sshfs gfs glusterfs lustre ncpfs
# nfs nfs4 ocfs2 shfs smbfs
#
# If you would like to add to this list, you can do so by adding your
# own fstypes to the following variable.
#extra_net_fs_list=""

##############################################################################
# SERVICE CONFIGURATION VARIABLES
# These variables are documented here, but should be configured in
# /etc/conf.d/foo for service foo and NOT enabled here unless you
# really want them to work on a global basis.
# If your service has characters in its name which are not legal in
# shell variable names and you configure the variables for it in this
# file, those characters should be replaced with underscores in the
# variable names as shown below.

# Some daemons are started and stopped via start-stop-daemon.
# We can set some things on a per service basis, like the nicelevel.
#SSD_NICELEVEL="0"
# Or the ionice level. The format is class[:data] , just like the
# --ionice start-stop-daemon parameter.
#SSD_IONICELEVEL="0:0"
# Or the OOM score adjustment.
#SSD_OOM_SCORE_ADJ="0"

# Pass ulimit parameters
# If you are using bash in POSIX mode for your shell, note that the
# ulimit command uses a block size of 512 bytes for the -c and -f
# options
#rc_ulimit="-u 30"

# It's possible to define extra dependencies for services like so
#rc_config="/etc/foo"
#rc_need="openvpn"
#rc_use="net.eth0"
#rc_after="clock"
#rc_before="local"
#rc_provide="!net"

# You can also enable the above commands here for each service. Below is an
# example for service foo.
#rc_foo_config="/etc/foo"
#rc_foo_need="openvpn"
#rc_foo_after="clock"

# Below is an example for service foo-bar. Note that the '-' is illegal
# in a shell variable name, so we convert it to an underscore.
# example for service foo-bar.
#rc_foo_bar_config="/etc/foo-bar"
#rc_foo_bar_need="openvpn"
#rc_foo_bar_after="clock"

# You can also remove dependencies.
# This is mainly used for saying which services do NOT provide net.
#rc_net_tap0_provide="!net"

# This is the subsystem type.
# It is used to match against keywords set by the keyword call in the
# depend function of service scripts.
#
# It should be set to the value representing the environment this file is
# PRESENTLY in, not the virtualization the environment is capable of.
# If it is commented out, automatic detection will be used.
#
# The list below shows all possible settings as well as the host
# operating systems where they can be used and autodetected.
#
# ""               - nothing special
# "docker"         - Docker container manager (Linux)
# "jail"           - Jail (DragonflyBSD or FreeBSD)
# "lxc"            - Linux Containers
# "openvz"         - Linux OpenVZ
# "prefix"         - Prefix
# "rkt"            - CoreOS container management system (Linux)
# "subhurd"        - Hurd subhurds (to be checked)
# "systemd-nspawn" - Container created by systemd-nspawn (Linux)
# "uml"            - Usermode Linux
# "vserver"        - Linux vserver
# "xen0"           - Xen0 Domain (Linux and NetBSD)
# "xenU"           - XenU Domain (Linux and NetBSD)
#rc_sys=""

# if  you use openrc-init, which is currently only available on Linux,
# this is the default runlevel to activate after "sysinit" and "boot"
# when booting.
#rc_default_runlevel="default"

# on Linux and Hurd, this is the number of ttys allocated for logins
# It is used in the consolefont, keymaps, numlock and termencoding
# service scripts.
rc_tty_number=12

##############################################################################
# LINUX CGROUPS RESOURCE MANAGEMENT

# This sets the mode used to mount cgroups.
# "hybrid" mounts cgroups version 2 on /sys/fs/cgroup/unified and
# cgroups version 1 on /sys/fs/cgroup.
# "legacy" mounts cgroups version 1 on /sys/fs/cgroup
# "unified" mounts cgroups version 2 on /sys/fs/cgroup
rc_cgroup_mode="unified"

# This is a list of controllers which should be enabled for cgroups version 2
# when hybrid mode is being used.
# Controllers listed here will not be available for cgroups version 1.
#rc_cgroup_controllers=""

# This variable contains the cgroups version 2 settings for your services.
# If this is set in this file, the settings will apply to all services.
# If you want different settings for each service, place the settings in
# /etc/conf.d/foo for service foo.
# The format is to specify the setting and value followed by a newline.
# Multiple settings and values can be specified.
# For example, you would use this to set the maximum memory and maximum
# number of pids for a service.
#rc_cgroup_settings="
#memory.max 10485760
#pids.max max
#"
#
# For more information about the adjustments that can be made with
# cgroups version 2, see Documentation/cgroups-v2.txt in the linux kernel
# source tree.
#rc_cgroup_settings=""

# This switch controls whether or not cgroups version 1 controllers are
# individually mounted under
# /sys/fs/cgroup in hybrid or legacy mode.
#rc_controller_cgroups="YES"

# The following setting turns on the memory.use_hierarchy setting in the
# root memory cgroup for cgroups v1.
# It must be set to yes in this file if you want this functionality.
#rc_cgroup_memory_use_hierarchy="NO"

# The following settings allow you to set up values for the cgroups version 1
# controllers for your services.
# They can be set in this file;, however, if you do this, the settings
# will apply to all of your services.
# If you want different settings for each service, place the settings in
# /etc/conf.d/foo for service foo.
# The format is to specify the names of the settings followed by their
# values. Each variable can hold multiple settings.
# For example, you would use this to set the cpu.shares setting in the
# cpu controller to 512 for your service.
# rc_cgroup_cpu="
# cpu.shares 512
# "
#
# For more information about the adjustments that can be made with
# cgroups version 1, see Documentation/cgroups-v1/* in the linux kernel
# source tree.

# Set the blkio controller settings for this service.
#rc_cgroup_blkio=""

# Set the cpu controller settings for this service.
#rc_cgroup_cpu=""

# Add this service to the cpuacct controller (any value means yes).
#rc_cgroup_cpuacct=""

# Set the cpuset controller settings for this service.
#rc_cgroup_cpuset=""

# Set the devices controller settings for this service.
#rc_cgroup_devices=""

# Set the hugetlb controller settings for this service.
#rc_cgroup_hugetlb=""

# Set the memory controller settings for this service.
#rc_cgroup_memory=""

# Set the net_cls controller settings for this service.
#rc_cgroup_net_cls=""

# Set the net_prio controller settings for this service.
#rc_cgroup_net_prio=""

# Set the pids controller settings for this service.
#rc_cgroup_pids=""

# Set this to YES if you want all of the processes in a service's cgroup
# killed when the service is stopped or restarted.
# Be aware that setting this to yes means all of a service's
# child processes will be killed. Keep this in mind if you set this to
# yes here instead of for the individual services in
# /etc/conf.d/<service>.
# To perform this cleanup manually for a stopped service, you can
# execute cgroup_cleanup with /etc/init.d/<service> cgroup_cleanup or
# rc-service <service> cgroup_cleanup.
# If the kernel includes support for cgroup2's cgroup.kill, this is used
# to reliably teardown the cgroup.
# If this fails, the process followed in this cleanup is the following:
# 1. send stopsig (sigterm if it isn't set) to all processes left in the
# cgroup immediately followed by sigcont.
# 2. Send sighup to all processes in the cgroup if rc_send_sighup is
# yes.
# 3. delay for rc_timeout_stopsec seconds.
# 4. send sigkill to all processes in the cgroup unless disabled by
# setting rc_send_sigkill to no.
# rc_cgroup_cleanup="NO"

# If this is yes, we will send sighup to the processes in the cgroup
# immediately after stopsig and sigcont.
#rc_send_sighup="NO"

# This is the amount of time in seconds that we delay after sending sigcont
# and optionally sighup, before we optionally send sigkill to all
# processes in the # cgroup.
# The default is 90 seconds.
#rc_timeout_stopsec="90"

# If this is set to no, we do not send sigkill to all processes in the
# cgroup.
#rc_send_sigkill="YES"

brauner commented 1 year ago

Hmm it should work. Can you show your /etc/rc.conf file?

Then again I may accidentally have tested this with only my own lxd images, need to try with upstream images.

(For anyone wondering this is related, gentoo/gentoo@0fcc7c7)

All controllers that are supposed to be mounted by the container need to have been mounted/enabled on the host before. Over the years I've mentioned this quite a few times.

Systemd will first try to mount the unified cgroup hierarchy and use it for process tracking and if it isn't available or mounted it will fallback to the cgroup v1 name= feature where it creates a name=systemd controller it used for process tracking. If neither is available it will fail to boot.

juippis commented 1 year ago

@epsilon-0 if you update your rc.conf to have:

rc_cgroup_mode="hybrid"
rc_cgroup_controllers="yes"

, obviously reboot, and does it work then?

My system can use upstream images:gentoo/openrc images on openrc just fine.

tomponline commented 1 year ago

Can this be closed?

juippis commented 1 year ago

I believe this is a Gentoo-related issue. I'm gonna (when I have some time) check the differences between rc_cgroup_modes in openrc and update the openrc init.d file accordingly.

juippis commented 1 year ago

@epsilon-0 please try with the latest lxd revision in ::gentoo repository tree, https://github.com/gentoo/gentoo/commit/67e0c923557984eda05b09fcad093c0840ef8a12

epsilon-0 commented 1 year ago

the new one is working nicely small point, it doesn't need the message rc_cgroup_controllers="yes", its the default value if it is commented out here - https://github.com/OpenRC/openrc/blob/master/init.d/cgroups.in#L45 - it shows that if it is empty, its taken as YES