canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Installing a VPN app breaks LXD #12306

Closed: just-doks closed this issue 1 year ago

just-doks commented 1 year ago

Required information

Issue description

After I followed all the steps to install the iVPN app on Debian (website link and steps below), LXD containers no longer start and fail with an error (example below).

iVPN link

Even if this app is full of crap, I want to know why it breaks only LXD containers. Is this an LXD issue, or does iVPN really aim to destroy LXD? Is the iVPN app unsafe to use?

Removing and reinstalling LXD, or removing iVPN and its installation files, doesn't fix the damage. Only a full OS reinstall does.

Steps to reproduce

  1. Install Debian 12.1: /boot on a primary partition, /root and swap on LVM logical volumes; create a logical volume for the LXD storage pool, or leave free space to create it later;
  2. Install the LXD package: sudo apt install lxd;
  3. Install the btrfs package: sudo apt install btrfs-progs;
  4. Run sudo lxd init;
  5. During init, create a managed network and a btrfs storage pool on top of an LVM logical volume (create the volume beforehand);
  6. Create and launch an LXD container: sudo lxc launch images:debian/12 <container_name>;
  7. Stop the container: sudo lxc stop <container_name>;
  8. Install iVPN following the steps described for Debian (link);
  9. Reboot;
  10. Try to start the container: sudo lxc start <container_name>;
  11. Get an error, for example:
    doks@debian:~$ sudo lxc start deb-c1
    Error: Failed to run: /usr/bin/lxd forkstart deb-c1 /var/lib/lxd/containers /var/log/lxd/deb-c1/lxc.conf: exit status 1
    Try `lxc info --show-log deb-c1` for more info

Information to attach

Log:

```
lxc deb-c1 20230923085712.134 ERROR cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_mount:2139 - No such file or directory - Failed to create cgroup at_mnt 24()
lxc deb-c1 20230923085712.134 ERROR conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:852 - No such file or directory - Failed to mount "/sys/fs/cgroup"
lxc deb-c1 20230923085712.134 ERROR conf - ../src/lxc/conf.c:lxc_setup:4433 - Failed to setup remaining automatic mounts
lxc deb-c1 20230923085712.134 ERROR start - ../src/lxc/start.c:do_start:1272 - Failed to setup container "deb-c1"
lxc deb-c1 20230923085712.134 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc deb-c1 20230923085712.144 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vethcd714ea5"
lxc deb-c1 20230923085712.144 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc deb-c1 20230923085712.144 ERROR start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "deb-c1"
lxc deb-c1 20230923085712.144 WARN start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 19 for process 2541
lxc 20230923085717.264 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20230923085717.264 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"
```
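The first errors show LXC failing to mount `/sys/fs/cgroup`. Assuming the cause matches the known Mullvad/PIA VPN issue mentioned later in this thread (a VPN client mounting a cgroup v1 hierarchy such as `net_cls` under `/sys/fs/cgroup`, alongside the cgroup2 hierarchy LXD uses), a quick check is to list cgroup v1 mounts in that location. This is a minimal sketch that reads `/proc/mounts`; the function name is hypothetical, and the file path is a parameter so it can also be run on a saved copy:

```bash
#!/usr/bin/env bash
# List cgroup v1 hierarchies mounted under /sys/fs/cgroup.
# On a cgroup2-only host (the Debian 12 default) this typically prints
# nothing; output such as /sys/fs/cgroup/net_cls points at a foreign mount.

find_cgroup1_mounts() {
  local mounts_file="${1:-/proc/mounts}"
  # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
  # cgroup v1 mounts have fs type "cgroup"; the unified hierarchy is "cgroup2".
  awk '$3 == "cgroup" && index($2, "/sys/fs/cgroup") == 1 { print $2 }' "$mounts_file"
}

find_cgroup1_mounts "$@"
```

On systems with util-linux, `findmnt -t cgroup` gives a similar view.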

---
 - ### The output of "lxc info":
``` bash
doks@debian:~$ sudo lxc info
[sudo] password for doks: 
config:
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- vsock_api
- storage_volumes_all_projects
- projects_networks_restricted_access
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- cpu_hotplug
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICAzCCAYqgAwIBAgIRAKM3l9ghK7x6AwrH0dXX8ecwCgYIKoZIzj0EAwMwNDEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEUMBIGA1UEAwwLcm9vdEBkZWJp
    YW4wHhcNMjMwOTIyMTUyOTU4WhcNMzMwOTE5MTUyOTU4WjA0MRwwGgYDVQQKExNs
    aW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QGRlYmlhbjB2MBAGByqG
    SM49AgEGBSuBBAAiA2IABFtQ18RJJj5h3sFaq61TkheDWEnl27aVy0RoOUz1KSK1
    BT/5jREGMPtldDvMDNx2TackLUT0ImGRkD7P5dluhMhPmnrYvKIqWjcPX93SDfZ6
    N4Osy3Ku17M34pzO8ePq8aNgMF4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
    CCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKQYDVR0RBCIwIIIGZGViaWFuhwR/AAAB
    hxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2cAMGQCMEokwba3H2TYvpO9
    DhqSvG+EHJ7yYaXhh+YfG76rG34rydB7sg2g3MHOUu1+5PoBQAIwUxL1kD/9TVgQ
    /dz+w3vy8EfJP+GCp/zUmgnSwH1VMCA+MDz0Y7W1UqEMcEDumkTN
    -----END CERTIFICATE-----
  certificate_fingerprint: 5ae6380564bfa2dccaf35490814d967212532927568de0ce903908273029cfe0
  driver: lxc
  driver_version: 5.0.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.1.0-12-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: debian
  server_pid: 2579
  server_version: 5.0.2
  storage: btrfs
  storage_version: "6.2"
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.47.0
    remote: false
  - name: btrfs
    version: "6.2"
    remote: false
```

simondeziel commented 1 year ago

@just-doks that seems very similar to a known issue with Mullvad and PIA VPN clients, please see https://documentation.ubuntu.com/lxd/en/latest/faq/#why-does-starting-containers-suddenly-fail for a workaround. If the workaround doesn't work, please re-open the issue.
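The FAQ linked above describes the workaround for Mullvad and PIA as stopping the VPN client and unmounting the `net_cls` cgroup v1 hierarchy at `/sys/fs/cgroup/net_cls`. A minimal sketch of those two steps, assuming the client runs as a systemd service; `vpn-client.service` and the helper function names are hypothetical placeholders, and by default the script only prints the commands (dry run):

```bash
#!/usr/bin/env bash
# Sketch of the FAQ workaround: stop the VPN client, then unmount the
# net_cls cgroup v1 hierarchy it mounted under /sys/fs/cgroup.
# Not persistent: the client may mount it again when it next starts.

VPN_SERVICE="${VPN_SERVICE:-vpn-client.service}"   # hypothetical unit name
NET_CLS_MNT="${NET_CLS_MNT:-/sys/fs/cgroup/net_cls}"
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; 0 = run them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Succeeds if $1 appears as a mount point (field 2) in $MOUNTS_FILE.
is_mounted() {
  awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

apply_workaround() {
  run systemctl stop "$VPN_SERVICE"
  if is_mounted "$NET_CLS_MNT"; then
    run umount "$NET_CLS_MNT"
  else
    echo "$NET_CLS_MNT is not mounted; nothing to unmount"
  fi
}
```

Review the printed commands first, then call `apply_workaround` with `DRY_RUN=0` as root. Because the VPN client can mount `net_cls` again on its next start, this matches the reporter's observation that the fix only lasts until reboot.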

just-doks commented 1 year ago

@simondeziel Unfortunately, it only works until the next reboot. It's a one-time fix, not a solution. How do I fix this permanently? I removed the iVPN app, but the problem persists. I don't understand how cgroups or the VPN app work, and I have no idea what's happening inside the system: should I remove some service that mounts something, or delete some hidden app? Is there a better solution than "make a systemd unit"? If a clean system works fine, there must be something we can change without boot-time services, a full OS reinstall, or avoiding VPN clients.

just-doks commented 1 year ago

Damn it, OK: I found that removing the iVPN package did not remove its systemd service, so I removed the service manually, and LXD containers now start after reboots. But this wasn't mentioned anywhere. And I'm still without my VPN client. At least now I understand how to roll back the problem entirely. I'll try to apply the advice for Mullvad to my client. But it's nonsense that on Linux everyone has to be a programmer and fix other programs' system files just so they work with each other. Is there anything you (the LXD maintainers) can change in LXD to avoid this issue for all VPN clients?
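The leftover unit described above is a general pitfall: a service file written by an installer script, rather than shipped as part of the package, is not removed with the package unless the package's removal scripts clean it up. A minimal sketch for spotting such files on Debian, assuming they sit directly in `/etc/systemd/system` and using `dpkg -S` to ask whether any installed package owns each path. The function names are hypothetical. An unowned file is only a candidate to inspect; as the reporter found, deleting one can make it hard to restore later, so moving it aside is safer than deleting it.

```bash
#!/usr/bin/env bash
# List regular *.service files in a directory that no installed Debian
# package claims, i.e. candidates for leftovers of a removed app.
# Symlinks (enabled, aliased or masked units) and subdirectories such as
# *.wants/ are skipped: they are normally not owned by any package.

# Succeeds if dpkg knows an installed package that ships the given path.
owned_by_package() {
  dpkg -S "$1" >/dev/null 2>&1
}

list_unowned_units() {
  local dir="${1:-/etc/systemd/system}"
  # -maxdepth 1: top level only; -type f: regular files, not symlinks.
  find "$dir" -maxdepth 1 -type f -name '*.service' | sort |
    while IFS= read -r unit; do
      owned_by_package "$unit" || echo "$unit"
    done
}
```

Calling `list_unowned_units` with no argument scans `/etc/systemd/system`; pass another directory as the first argument to scan elsewhere.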

just-doks commented 1 year ago

@simondeziel I followed all the steps from the linuxcontainers forum topic, but it doesn't work. Also, deleting the systemd unit was a bad idea, because reinstalling the VPN app doesn't bring it back. Where's the re-open issue button?