canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Guest has no networking for over five minutes #13806

Closed: basak closed this issue 3 months ago

basak commented 3 months ago

Required information

Name    Version        Rev    Tracking       Publisher   Notes
core20  20240416       2318   latest/stable  canonical✓  base
lxd     4.0.9-a29c6f1  24061  4.0/stable/…   canonical✓  -
snapd   2.63           21759  latest/stable  canonical✓  snapd
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- resources_system
- usedby_consistency
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- storage_rsync_compression
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_state_vlan
- gpu_sriov
- migration_stateful
- disk_state_quota
- storage_ceph_features
- gpu_mig
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- network_counters_errors_dropped
- image_source_project
- database_leader
- instance_all_projects
- ceph_rbd_du
- qemu_metrics
- gpu_mig_uuid
- event_project
- instance_allow_inconsistent_copy
- image_restrictions
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICDTCCAZOgAwIBAgIRAOXahcfYtFH8jfU8YOuk7W0wCgYIKoZIzj0EAwMwNzEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEXMBUGA1UEAwwOcm9vdEByYmFz
    YWstZ3UwHhcNMjQwNzE3MTI1MDQ0WhcNMzQwNzE1MTI1MDQ0WjA3MRwwGgYDVQQK
    ExNsaW51eGNvbnRhaW5lcnMub3JnMRcwFQYDVQQDDA5yb290QHJiYXNhay1ndTB2
    MBAGByqGSM49AgEGBSuBBAAiA2IABFf7GfV68UQKmYTy8xt18QbYEft9M6GrNntW
    dOJxfQ7bvFovAl7LZVlNpQBjkFaJMvIBmSAQ269LGhHU6N8Qu1cphqMfdsJPINfy
    NkjDgZi9O4TkQDc3nMrvbyOxi/w+h6NjMGEwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
    JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwLAYDVR0RBCUwI4IJcmJhc2Fr
    LWd1hwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQDg
    bYwCNXcFr+bxykiQWkNX0G2kyc4V/IDBeIbTDVxI81UQOf+FduENoQmgitZ78q4C
    MHqQBUeZt7HcwNaXqmmwgoLsqqx2evkeqckTYbP3uB6xkC1tkt9Iz3K8038keVtl
    aA==
    -----END CERTIFICATE-----
  certificate_fingerprint: 0af2558a40594017a40c81ec6b4b5e60ec13db79bb7eb977c24031a0ca0aa1fd
  driver: lxc | qemu
  driver_version: 4.0.12 | 7.1.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-189-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: rbasak-gu
  server_pid: 16495
  server_version: 4.0.9
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.41.0
    remote: false
  - name: zfs
    version: 0.8.3-1ubuntu12.17
    remote: false
  - name: ceph
    version: 15.2.17
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.17
    remote: true
  - name: dir
    version: "1"
    remote: false

Issue description

When I boot images:debian/sid (image fingerprint 67e221357f18), networking does not come up in the guest. I see link-local IPv6 addresses but no IPv4 address configured. I assume the interface is supposed to be configured by systemd-networkd, since /etc/systemd/network/eth0.network exists, but systemd-networkd is not running.

I tried the same thing on a 24.04 host with lxd snap 6.1-0d4d89b and it works fine, so I assume this is an issue either with the older lxd or the older host OS.


Steps to reproduce

  1. Configure the default profile with security.nesting=true (I'm using this because I'm also trying to use Oracular containers on this host system).
  2. lxc launch images:67e221357f18 ns
  3. lxc exec ns bash
  4. ip a, wait a while, retry, etc.

Expected result: an IPv4 address is configured. Actual result: no IPv4 address is configured.
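
For reference, the steps above as a single shell session (a sketch using the container name and image fingerprint from this report):

# 1. Allow nesting in the default profile
lxc profile set default security.nesting true

# 2. Launch the affected Debian sid image
lxc launch images:67e221357f18 ns

# 3./4. Check for an IPv4 address on eth0; on the affected host this stays
# empty until systemd-networkd eventually starts several minutes later
lxc exec ns -- ip -4 addr show dev eth0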

NOTE: taking other actions inside the container seems to trigger socket activation of systemd-networkd, which fixes things. So does waiting about five minutes: in my test, systemctl status systemd-networkd reports that the service started 6m43s after I started the instance. My issue is that automated use expects networking in the container to come up promptly, with no further action.
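
A minimal way to check for, and apply, that workaround from the host (a sketch, assuming the container is named ns as above; starting the service manually is expected to have the same effect as the socket activation described):

# Confirm that systemd-networkd has not started yet inside the guest
lxc exec ns -- systemctl is-active systemd-networkd

# Start it manually instead of waiting ~5 minutes, then re-check eth0
lxc exec ns -- systemctl start systemd-networkd
lxc exec ns -- ip -4 addr show dev eth0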

Information to attach

Jul 22 11:35:51 rbasak-gu kernel: [427908.488584] lxdbr0: port 1(veth92757f2f) entered blocking state
Jul 22 11:35:51 rbasak-gu kernel: [427908.488585] lxdbr0: port 1(veth92757f2f) entered disabled state
Jul 22 11:35:51 rbasak-gu kernel: [427908.491402] device veth92757f2f entered promiscuous mode
Jul 22 11:35:51 rbasak-gu kernel: [427908.708017] audit: type=1400 audit(1721648151.993:5549): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-ns_</var/snap/lxd/common/lxd>" pid=144136 comm="apparmor_parser"
Jul 22 11:35:52 rbasak-gu kernel: [427908.739468] phys9Vej0T: renamed from vethb4e2977f
Jul 22 11:35:52 rbasak-gu kernel: [427908.778049] eth0: renamed from phys9Vej0T
Jul 22 11:35:52 rbasak-gu kernel: [427908.779367] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 22 11:35:52 rbasak-gu kernel: [427908.779397] lxdbr0: port 1(veth92757f2f) entered blocking state
Jul 22 11:35:52 rbasak-gu kernel: [427908.779399] lxdbr0: port 1(veth92757f2f) entered forwarding state
Name: ns
Location: none
Remote: unix://
Architecture: x86_64
Created: 2024/07/22 11:35 UTC
Status: Running
Type: container
Profiles: default
Pid: 144137
Ips:
  eth0: inet    10.69.70.168    veth92757f2f
  eth0: inet6   fd42:6955:2443:e5c3:216:3eff:fe7e:7a76  veth92757f2f
  eth0: inet6   fe80::216:3eff:fe7e:7a76    veth92757f2f
  lo:   inet    127.0.0.1
  lo:   inet6   ::1
Resources:
  Processes: 8
  CPU usage:
    CPU usage (in seconds): 5
  Memory usage:
    Memory (current): 120.77MB
    Memory (peak): 182.50MB
  Network usage:
    eth0:
      Bytes received: 52.12kB
      Bytes sent: 64.24kB
      Packets received: 387
      Packets sent: 565
    lo:
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0

Log:

lxc ns 20240722113552.621 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722113552.627 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ns 20240722113552.961 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722113552.965 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ns 20240722113552.100 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc ns 20240722120706.132 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722120706.132 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian sid amd64 (20240722_0002)
  image.os: Debian
  image.release: sid
  image.serial: "20240722_0002"
  image.type: squashfs
  image.variant: default
  security.nesting: "true"
  volatile.base_image: 67e221357f182d0f73c6e8ba1971d2d9dd8b18237c31ffbb959ac40eb9f43092
  volatile.eth0.host_name: veth92757f2f
  volatile.eth0.hwaddr: 00:16:3e:7e:7a:76
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 3b457acc-4747-46f5-b626-faa6b0e756ce
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

The only relevant line seems to be:

t=2024-07-22T11:35:52+0000 lvl=info msg="Started container" action=start created=2024-07-22T11:35:42+0000 ephemeral=false instance=ns instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000

I've skipped the rest of the requested information, since this is trivially reproducible on a fresh Focal VM.

tomponline commented 3 months ago

LXD 4.0.x is currently in security maintenance mode only. It won't be getting updates to support newer guest containers I'm afraid. Please can you switch to the latest LTS which is 5.21/stable.

tomponline commented 3 months ago

It could be an issue with cgroupv2 not being present on the host, which the newer image expects.
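
A quick way to see which cgroup layout a host is using (a sketch; the filesystem type of /sys/fs/cgroup distinguishes the two):

stat -fc %T /sys/fs/cgroup
# cgroup2fs -> unified cgroupv2 hierarchy (e.g. a 24.04 host)
# tmpfs     -> legacy/hybrid cgroupv1 layout (the Focal default)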

basak commented 3 months ago

It won't be getting updates to support newer guest containers I'm afraid.

In that case, any chance we can predict the failure and refuse to launch such an image please, rather than fail in unpredictable ways afterwards?

Please can you switch to the latest LTS which is 5.21/stable.

I can do that, but I'm trying to use lxd to implement stable builds for git-ubuntu users, and I can't control what lxd they are running or change it. Is there any way I can detect that it's not going to work in advance, please, so I can give the user a useful error message?

tomponline commented 3 months ago

In that case, any chance we can predict the failure and refuse to launch such an image please, rather than fail in unpredictable ways afterwards?

That would require feature development which is no longer occurring for the 4.0.x series.

We plan one more release to update the default remotes to the new image server, and after that it's critical security fixes only.

tomponline commented 3 months ago

I can do that, but I'm trying to use lxd to implement stable builds for git-ubuntu users, and I can't control what lxd they are running or change it. Is there any way I can detect that it's not going to work in advance, please, so I can give the user a useful error message?

Because containers share the host kernel and the cgroup layout, whether they function together depends on the host + container guest combination. The big change was the switch to unified cgroups (cgroupv2), which means running newer guests on older hosts upsets some systemd services, because they rely heavily on cgroups.

Does ubuntu:focal work for you on a Focal host with LXD 4.0?
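
Given that explanation, a wrapper such as git-ubuntu could run its own preflight check and refuse to launch a newer guest on an older host. A minimal sketch, assuming the cgroupv1/v2 mismatch described above is the failure mode (the preflight_check function name is illustrative):

# Refuse to launch a newer guest image on a host without unified cgroups
preflight_check() {
    if [ "$(stat -fc %T /sys/fs/cgroup)" != "cgroup2fs" ]; then
        echo "error: host lacks the unified cgroupv2 hierarchy; newer guest images may not bring up networking" >&2
        return 1
    fi
}

preflight_check && lxc launch images:debian/sid ns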

basak commented 3 months ago

Does ubuntu:focal work for you on a Focal host with LXD 4.0?

Yes, that works fine. But it doesn't solve the general case of wanting to use lxd with its defaults on (e.g.) an LTS release, with the Ubuntu development release in a container to do builds.

I'll file a separate issue for the general case - thanks.