canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

can not stop lxc containers with a systemd service on host reboot #9210

Closed bitranox closed 3 years ago

bitranox commented 3 years ago

Required information

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    xxx
    -----END CERTIFICATE-----
  certificate_fingerprint: xxx
  driver: lxc | qemu
  driver_version: 4.0.10 | 6.1.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.11.0-34-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "21.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: vmsrv3.local.rotek.at
  server_pid: 3174
  server_version: "4.18"
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.13
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.43.0
    remote: false
  - name: zfs
    version: 2.0.2-1ubuntu5.1
    remote: false
  - name: ceph
    version: 15.2.13
    remote: true

Issue description

I cannot stop LXC containers from a systemd service - I always get errors like:

internal error, please report: running "lxd.lxc" failed: cannot create transient scope: DBus error "org.freedesktop.systemd1.TransactionIsDestructive": [Transaction for snap.lxd.lxc.8fc29d87-d0d3-478c-8594-50e2760a45da.scope/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in transaction).]

Steps to reproduce

No service definition seems to work - I tried many different targets and services, and there is always a conflict like the one described above.

[Unit]
Description=Rotek Servercontrol
########################################################################
# Manpage : https://www.freedesktop.org/software/systemd/man/index.html
########################################################################

########################################################################
# show units with: sudo systemctl list-units
# show targets with: sudo systemctl list-units --type target
# reload services with : sudo systemctl daemon-reload
########################################################################

After=NetworkManager-wait-online.service
After=network.target
After=networking.service
After=network-online.target
After=media-srv\x2dmain\x2dinstall.mount
After=media-srv\x2dbackupserver.mount
After=postfix.service
After=dnsmasq.service 
After=local-fs.target
After=snap.lxd.daemon.service
After=snap.lxd.daemon.unix.socket

########################################################################
# Wants = A weaker version of Requires=. Units listed in this option 
# will be started if the configuring unit is. However, if the listed 
# units fail to start or cannot be added to the transaction, this has no impact 
# on the validity of the transaction as a whole. 
# This is the recommended way to hook start-up of one unit to the start-up of another unit.
########################################################################
Wants=network.target
Wants=networking.service
Wants=network-online.target
Wants=postfix.service
Wants=dnsmasq.service 
Wants=local-fs.target
Wants=snap.lxd.daemon.service
Wants=snap.lxd.daemon.unix.socket
Wants=media-srv\x2dmain\x2dinstall.mount
Wants=media-srv\x2dbackupserver.mount

########################################################################
# Requires = Configures requirement dependencies on other units. 
# If this unit gets activated, the units listed here will be activated as well. 
# If one of the other units gets deactivated or its activation fails, 
# this unit will be deactivated. 
########################################################################

########################################################################
# Service Section 
########################################################################
[Service]
Type=oneshot
RemainAfterExit=yes

########################################################################
# TimeoutStopSec : Configures the time to wait for stop. If a service is asked to stop,
# but does not terminate in the specified time, it will be terminated forcibly via SIGTERM,
# and after another timeout of equal duration with SIGKILL.
########################################################################
TimeoutStopSec=1200

# This python script basically does: lxc info <container-name> and, depending on the state,
# lxc stop <container-name>
# The error occurs on ExecStop, when lxc info <container-name> is issued.
ExecStart=-/opt/python3/bin/python3 /opt/rotek-apps/bin/servercontrol/servercontrol.py linux_startup_jobs
ExecStop=-/opt/python3/bin/python3 /opt/rotek-apps/bin/servercontrol/servercontrol.py linux_shutdown_jobs

######################################################################################
# INSTALL
######################################################################################
[Install]
WantedBy=multi-user.target

The question is: how do I configure such a service so that lxc info <name> and lxc stop <name> work on reboot/shutdown?
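For reference, a minimal Python sketch of what the comments above describe the shutdown job doing. The container names, the injectable `run` parameter, and the status check are assumptions for illustration; the real servercontrol.py is not shown in this issue.

```python
import subprocess

def linux_shutdown_jobs(names, run=subprocess.run):
    """Check each container's state and stop it if running.

    `names` and the injectable `run` are illustrative placeholders;
    the actual servercontrol.py script is not part of this issue.
    """
    for name in names:
        # `lxc info <name>` - this is the call that hits the DBus
        # TransactionIsDestructive error during host shutdown, because
        # the snap wrapper creates a transient systemd scope for it
        info = run(["lxc", "info", name], capture_output=True, text=True)
        if "Status: Running" in info.stdout:
            run(["lxc", "stop", name])
```

Every `lxc` invocation here goes through the snap wrapper, so each one asks systemd to start a transient scope - which the shutdown transaction refuses.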

stgraber commented 3 years ago

The systemd error appears to come from your script being triggered through a stop action on the same systemd target that snapd uses for the transient units it creates whenever a lxc command is run. systemd then fails with that error, telling you that you are trying to run a start action as part of a stop operation.

You may have some luck moving dependencies or targets around to ensure your service triggers separately from whatever causes the LXD shutdown. Or you could have your python script use pylxd to bypass the systemd wrapper around the lxc tool (though you may still hit some issues, as systemd manages connections on the unix socket and may still block those).
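A sketch of that pylxd route: talking to the LXD API directly means no `lxc` CLI and no snap-created transient scope. The `stop_running_instances` helper and the timeout value are illustrative; `client.instances` requires a reasonably recent pylxd (older versions expose `client.containers` instead).

```python
def stop_running_instances(client, timeout=1200):
    """Stop every running instance via the LXD API, no `lxc` CLI involved."""
    for inst in client.instances.all():
        if inst.status == "Running":
            # wait=True blocks until the instance is down or timeout expires
            inst.stop(wait=True, timeout=timeout)

if __name__ == "__main__":
    from pylxd import Client  # talks to the local LXD unix socket directly
    stop_running_instances(Client())
```

As noted above, systemd also manages the unix socket units, so even this path is not guaranteed to work late in shutdown.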

I'm also a bit confused as to what your service is doing. You say it runs lxc info and then lxc stop on reboot/shutdown. That's exactly what LXD itself does out of the box, so I'm not sure why any of this is needed in the first place :)

Closing, as this is not a LXD issue; it would probably have been better handled as a support question on our forum, https://discuss.linuxcontainers.org. But happy to keep chatting about it here.

bitranox commented 3 years ago

Thanks Stephane - I opened the same issue on https://discuss.linuxcontainers.org to discuss it further there. Thanks a lot! Robert, Vienna