canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

stopping/deleting a freshly created container can fail #10083

Closed by simondeziel 1 year ago

simondeziel commented 2 years ago

Required information

On Ubuntu 20.04.4 with LXD from snap (4.24 rev 22674)

$ lxc info
config:
  core.https_address: 10.98.30.1
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 10.98.30.1:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  certificate_fingerprint: 07f26135c114ba171f0204258b0b54df9e46461bf793b99ccb154bb40e8d0697
  driver: lxc | qemu
  driver_version: 4.0.12 | 6.1.1
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "false"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.13.0-35-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: sdeziel-lemur
  server_pid: 192964
  server_version: "4.24"
  storage: zfs
  storage_version: 2.0.6-1ubuntu2
  storage_supported_drivers:
  - name: cephfs
    version: 15.2.14
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
    remote: false
  - name: zfs
    version: 2.0.6-1ubuntu2
    remote: false
  - name: ceph
    version: 15.2.14
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false

Issue description

Creating a container and immediately stopping or deleting it fails:

$ lxc launch focal t && lxc stop -f t
Creating t
Starting t
Error: Failed unmounting instance: In use
Try `lxc info --show-log t` for more info

$ lxc info --show-log t
Name: t
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/03/18 09:48 EDT
Last Used: 2022/03/18 09:48 EDT

Log:

lxc t 20220318134836.579 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc t 20220318134836.579 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc t 20220318134836.580 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc t 20220318134836.580 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc t 20220318134836.580 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc t 20220318134837.695 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc t 20220318134837.695 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing

Similar issue when deleting:

$ lxc launch focal t2 && lxc delete -f t2
Creating t2
Starting t2
Error: Stopping the instance failed: Failed unmounting instance: In use

After such failure, a subsequent lxc stop or lxc delete works.
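
A rough harness that repeats the same launch-then-force-stop sequence through the lxc CLI (the focal alias and iteration count are arbitrary; this is only a sketch, not part of the original report's tooling):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run shells out to the lxc CLI and returns its combined output.
func run(args ...string) (string, error) {
	out, err := exec.Command("lxc", args...).CombinedOutput()
	return string(out), err
}

func main() {
	// Arbitrary number of attempts; adjust as needed.
	for i := 0; i < 50; i++ {
		if out, err := run("launch", "focal", "t"); err != nil {
			fmt.Printf("launch failed: %v\n%s", err, out)
			return
		}
		out, err := run("stop", "-f", "t")
		if err != nil && strings.Contains(out, "Failed unmounting instance") {
			fmt.Printf("hit it on iteration %d:\n%s", i, out)
		}
		run("delete", "-f", "t") // clean up for the next iteration
	}
}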

tomponline commented 2 years ago

What storage pool is this? I've not been able to recreate it on a loop-backed ZFS pool on NVMe.

tomponline commented 2 years ago

If I run lxc stop -f t in a loop in one window and then launch a fresh container t in another, I see:

Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: Not Found
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: Instance is busy running a "start" operation
Try `lxc info --show-log t` for more info
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped
Error: The instance is already stopped

Which is what I would expect to see.

simondeziel commented 2 years ago

It's on a zpool backed by a NVME partition.

$ lxc storage list
+---------+--------+---------+-------------+---------+---------+
|  NAME   | DRIVER | SOURCE  | DESCRIPTION | USED BY |  STATE  |
+---------+--------+---------+-------------+---------+---------+
| default | zfs    | default |             | 13      | CREATED |
+---------+--------+---------+-------------+---------+---------+
$ lxc storage show default
config:
  source: default
  volatile.initial_source: /dev/nvme0n1p6
  zfs.pool_name: default
description: ""
name: default
driver: zfs
used_by:
- /1.0/images/39c55b257dc40e812cc1431e04eb1c883b03c349b67fbffb3300d76e42c36176
- /1.0/images/45e0432a28176ae29a29869cd667f87f633c00ef819804ecfb721f82c5acc995?project=charmcraft
- /1.0/images/6abc292fe3f02ad2218127926cda5dba31c6f2f0f21fc929157a2c7051c16f17
- /1.0/images/ae2a19ee7bc7f3b21cf7e529c51c5138ae1b99cbd0186265726c4265c1827303
...
- /1.0/profiles/default
- /1.0/profiles/default?project=charmcraft
status: Created
locations:
- none

tomponline commented 2 years ago

Must be extremely fast, as I cannot reproduce it on my NVMe ZFS partition either :(

simondeziel commented 2 years ago

Extremely fast ought to be relative as the machine is from 2016 ;)

tomponline commented 2 years ago

Extremely fast ought to be relative as the machine is from 2016 ;)

That's new ;)

tomponline commented 2 years ago

I can theorise that there is a small potential for a race here:

https://github.com/lxc/lxd/blob/42bd6dcf91aa3fb0ae4a2ece1ff9c0c1fadd6a98/lxd/instance/operationlock/operationlock.go#L119-L126

Called from Start():

https://github.com/lxc/lxd/blob/master/lxd/instance/drivers/driver_lxc.go#L2355-L2369

and Stop():

https://github.com/lxc/lxd/blob/master/lxd/instance/drivers/driver_lxc.go#L2614-L2624

Such that Start and Stop both think there is no existing operation and both call Create.
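
As a standalone illustration of that interleaving (the names and signatures below are invented for this sketch and are not LXD's actual API):

// Illustrative sketch of the suspected check-then-create race; invented names.
package main

import (
	"fmt"
	"sync"
)

var (
	mu    sync.Mutex
	locks = map[string]string{} // instance name -> action holding the lock
)

// getExisting stands in for the "is an operation already running?" check.
func getExisting(name string) (string, bool) {
	mu.Lock()
	defer mu.Unlock()
	action, ok := locks[name]
	return action, ok
}

// create stands in for registering a new operation lock.
func create(name, action string) {
	mu.Lock()
	defer mu.Unlock()
	locks[name] = action
}

// startOrStop models the pattern called from Start() and Stop(): check for
// an existing lock, and if none is found, create one and carry on.
func startOrStop(name, action string, wg *sync.WaitGroup) {
	defer wg.Done()
	if held, ok := getExisting(name); ok {
		fmt.Printf("%s: instance busy running %q\n", action, held)
		return
	}
	// Both goroutines can reach this point before either calls create(),
	// so both end up believing they own the instance.
	create(name, action)
	fmt.Printf("%s: proceeding\n", action)
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go startOrStop("t", "start", &wg)
	go startOrStop("t", "stop", &wg)
	wg.Wait()
}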

tomponline commented 2 years ago

Although the lock in Create should still catch it:

https://github.com/lxc/lxd/blob/42bd6dcf91aa3fb0ae4a2ece1ff9c0c1fadd6a98/lxd/instance/operationlock/operationlock.go#L59-L74

Which is what I see in my tests:

Error: Instance is busy running a "start" operation
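
That behaviour matches a Create that takes a per-instance lock and refuses to replace an existing operation; roughly (again with invented names, not the real implementation):

// Sketch of a Create that serializes callers and rejects the second one,
// matching the "busy running a 'start' operation" error seen above.
// Invented names; not LXD's actual implementation.
package main

import (
	"fmt"
	"sync"
)

var (
	mu    sync.Mutex
	locks = map[string]string{}
)

func createLock(name, action string) error {
	mu.Lock()
	defer mu.Unlock()
	if held, ok := locks[name]; ok {
		return fmt.Errorf("Instance is busy running a %q operation", held)
	}
	locks[name] = action
	return nil
}

func main() {
	fmt.Println(createLock("t", "start")) // <nil>
	fmt.Println(createLock("t", "stop"))  // Instance is busy running a "start" operation
}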

simondeziel commented 2 years ago

I'm happy to run any debug binary you feed me or give you SSH access (to my museum piece) if that's more convenient ;)

tomponline commented 2 years ago

Can you get me the debug log output when you see this happening, please?

kamzar1 commented 2 years ago

Not sure if it is related, as all attention here goes toward storage. It sounds strange, but in my case, immediately stopping or deleting an instance after creation failed as long as the instance was still struggling to obtain an IP from the DHCP server.

simondeziel commented 2 years ago

I cannot reproduce after snap set lxd daemon.debug=true and systemctl restart snap.lxd.daemon :/

simondeziel commented 2 years ago

After unsetting daemon.debug and restarting LXD, I still can't reproduce it.

simondeziel commented 2 years ago

Whatever it was, I can no longer reproduce it. I will reopen if it happens again. Thanks Tom.

Spunge commented 2 years ago

I just got into a state where this happens consistently. It happened after I was creating => starting => destroying containers in an automated test suite. I'm not sure what gets me into this state, but restarting the LXD daemon "fixes" the issue.

$> lxc launch testing testing && lxc delete -f testing
Creating testing
Starting testing
Error: Stopping the instance failed: Failed unmounting instance: In use

It will show this error, but the instance will still be deleted.

Sometimes I see this error in the logs; maybe it's of use:

Every 1.0s: lxc info testing --show-log                                                             balance: Thu Sep  8 23:03:02 2022

Name: testing
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/09/08 23:02 CEST
Last Used: 2022/09/08 23:02 CEST

Log:

lxc 20220908210259.562 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220908210259.562 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"

I'm running Arch with LXD community/lxd 5.5-1 and a dir storage pool.

I've also noticed that LXD no longer cleans up network interfaces when I get into this state.

Annoyingly, I still haven't found a reliable way to get into this state. It seems to be related to calls to the /1.0/instances/{name}/state endpoint, but I'm not sure. I will investigate further tomorrow.
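
For reference, a call to that endpoint over the local unix socket looks roughly like this (the socket path, instance name, and request body are assumptions based on the documented API, not taken from my test suite):

// Send a force-stop request straight to /1.0/instances/{name}/state over
// the local unix socket. Socket path and instance name are assumptions.
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return net.Dial("unix", "/var/lib/lxd/unix.socket")
			},
		},
	}

	body := bytes.NewBufferString(`{"action": "stop", "force": true, "timeout": -1}`)
	req, err := http.NewRequest(http.MethodPut, "http://unix/1.0/instances/c/state", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}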

Spunge commented 2 years ago

I can now reproduce this. It seems to be caused by starting an instance while at the same time executing some other operation on it.

I'm testing this with a default profile that only contains a root disk.

This executes successfully:

 ~ lxc init images:alpine/edge c && lxc start c; lxc exec c hostname; lxc delete -f c
Creating c

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: lxc network create
  To attach a network to an instance, use: lxc network attach

c

But if I execute:

 ~ lxc init images:alpine/edge c && lxc start c & lxc exec c hostname; lxc delete -f c
Creating c

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: lxc network create
  To attach a network to an instance, use: lxc network attach

[1] 2384
Error: Instance is not running
 ~ Error: Failed to run: /usr/bin/lxd forkstart c /var/lib/lxd/containers /var/log/lxd/c/lxc.conf: exit status 1
Try `lxc info --show-log c` for more info

[1]  + 2384 exit 1     lxc start c

Then after this, when I execute the first command, I get the Failed unmounting instance: In use error:

 ~  lxc init images:alpine/edge c && lxc start c; lxc exec c hostname; lxc delete -f c
Creating c

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: lxc network create
  To attach a network to an instance, use: lxc network attach

c
Error: Stopping the instance failed: Failed unmounting instance: In use

The c container logs:

Name: c
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/09/09 16:09 CEST
Last Used: 2022/09/09 16:09 CEST

Log:

lxc c 20220909140914.685 WARN     attach - ../src/lxc/attach.c:get_attach_context:477 - No security context received

tomponline commented 2 years ago

That's helpful, thanks. It would explain why the mount is being left over, as it would be triggered by exec while the instance wasn't running. I'll try to reproduce this and fix it.
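
That failure mode can be modelled with a toy mount refcount (not LXD's actual code): exec mounts the root filesystem of a stopped instance, nothing ever balances that mount, and the eventual unmount during stop/delete finds it still in use:

// Toy model of the leftover-mount theory; not LXD's implementation.
package main

import (
	"errors"
	"fmt"
)

// volume is a toy stand-in for an instance's root filesystem mount refcount.
type volume struct{ refs int }

func (v *volume) mount() { v.refs++ }

// unmount fails while more than one reference is still outstanding.
func (v *volume) unmount() error {
	if v.refs > 1 {
		return errors.New("Failed unmounting instance: In use")
	}
	if v.refs > 0 {
		v.refs--
	}
	return nil
}

func main() {
	root := &volume{}
	root.mount()                // exec mounts the rootfs while the instance isn't running
	root.mount()                // the next start mounts it again
	fmt.Println(root.unmount()) // stop unmounts once and finds it still "In use"
}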

tomponline commented 1 year ago

Working on this now.

tomponline commented 1 year ago

Is this reproducible on LXD 5.6 or LXD 5.0.1?

tomponline commented 1 year ago

I'm running Arch with LXD community/lxd 5.5-1 and a dir storage pool.

Ah, just seen.

tomponline commented 1 year ago

OK I've reproduced this now.