ledlamp commented 1 year ago

Required information

Distribution: Ubuntu
Distribution version: 22.04.1 LTS

The output of "lxc info" or if that fails:

config:
images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIICBTCCAYqgAwIBAgIRAMLtDk1V9Uc5K8YdtcsByZowCgYIKoZIzj0EAwMwNDEc
MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEUMBIGA1UEAwwLcm9vdEBzZXJ2
ZXIwHhcNMjMwNjI4MjMzMzEzWhcNMzMwNjI1MjMzMzEzWjA0MRwwGgYDVQQKExNs
aW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QHNlcnZlcjB2MBAGByqG
SM49AgEGBSuBBAAiA2IABFNy47IrI5NcFnBfkHMmvh/0qG0cK9s4AEGYhOlKm9AT
88JKys484z1SQ6fmwOoydWn+m4bO0BrpM2DERzDvDHRL+am4eyAAA25eSyDiY8vw
S8jBrgZXl+y7Xgdfj1mq3aNgMF4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
CCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKQYDVR0RBCIwIIIGc2VydmVyhwR/AAAB
hxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQCqruZVv+j5burX
dGs+7IC4SnzW/VycM5NQKa77Us85MRGj88WTO4cgZ9eUwc+TxY8CMQCMhtJJiZJN
nvoJrloy4gkY/enfF4HKeCFZcQi9pFuYaFq9bO9Ir0zBePekXHasioM=
-----END CERTIFICATE-----
certificate_fingerprint: 6328dbcdbe459cc022544936ade2bfcfacff27197b49e6a6e65ae567c7c9dbb8
driver: lxc | qemu
driver_version: 5.0.2 | 8.0.0
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.15.0-75-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "22.04"
project: default
server: lxd
server_clustered: false
server_event_mode: full-mesh
server_name: server
server_pid: 28554
server_version: "5.15"
storage: dir
storage_version: "1"
storage_supported_drivers:
- name: cephobject
version: 17.2.5
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
remote: false
- name: zfs
version: 2.1.5-1ubuntu6~22.04.1
remote: false
- name: btrfs
version: 5.16.2
remote: false
- name: ceph
version: 17.2.5
remote: true
- name: cephfs
version: 17.2.5
remote: true

Issue description

On restart, LXD creates an lxdbr0 interface if one does not exist and reconfigures the default profile to use it, even if this is not wanted and was not requested during lxd init.

Steps to reproduce

1. Install LXD: snap install lxd
2. Run lxd init, but use an existing bridge (e.g. lxdbr1) instead of creating a new one.
3. Check lxc network ls and lxc profile show default. There is no lxdbr0 and the profile is configured as desired.
4. Restart LXD: snap restart lxd
5. Check lxc network ls and lxc profile show default again. A new lxdbr0 now exists and the default profile was changed to use it. The user is now very angry.

Information to attach

[ ] Any relevant kernel output (dmesg)
[ ] Container log (lxc info NAME --show-log)
[ ] Container configuration (lxc config show NAME --expanded)
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

ledlamp commented 1 year ago

As a workaround, there has to be an interface called lxdbr0: either leave the unneeded one that LXD creates in place, or name your unmanaged interface lxdbr0 if you can. Otherwise LXD will create it and mess up your default profile.

tomponline commented 1 year ago

Confirmed issue. This is really odd.

I've confirmed that killing the lxd process after initializing it with the existing lxdbr1 interface, and then triggering a restart of the process by running lxc ls, doesn't create lxdbr0. So I think we can rule this out as an actual LXD issue; it looks like an external or packaging issue instead.

Additionally, doing snap stop lxd and then snap start lxd doesn't trigger it either.

Furthermore, it's even easier to reproduce:

snap install lxd
lxc network ls # No lxdbr0
snap restart lxd
lxc network ls # Shows lxdbr0 managed network

So it's something triggered by snap restart lxd.
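The before/after comparison in the reproducer can be sketched as a small helper. This is only an illustration: new_networks and the snapshot file names are hypothetical, and it assumes the first CSV column printed by lxc network list --format csv is the network name.

```shell
#!/bin/sh
# Print the names of networks that appear in the "after" snapshot but not in
# the "before" snapshot. Both arguments are files holding CSV network listings
# whose first column is assumed to be the network name.
new_networks() {
  before_names=$(mktemp) || return 1
  after_names=$(mktemp) || { rm -f "$before_names"; return 1; }
  cut -d, -f1 "$1" | sort -u > "$before_names"
  cut -d, -f1 "$2" | sort -u > "$after_names"
  # comm -13 keeps only the lines unique to the second (after) file.
  comm -13 "$before_names" "$after_names"
  rm -f "$before_names" "$after_names"
}

# Usage on a disposable test machine (not run here):
#   lxc network list --format csv > before.csv
#   sudo snap restart lxd
#   lxc network list --format csv > after.csv
#   new_networks before.csv after.csv
```

If the behaviour described above occurs, the helper would be expected to print lxdbr0; an empty result means no new network appeared.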

I've also confirmed that we can actually see an API request coming into LXD upon snap restart lxd that inspects the existing networks and creates lxdbr0:

# Getting network list
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url="/1.0/networks?recursion=1" username=root
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": [\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"lo\",\n\t\t\t\t\"type\": \"loopback\",\n\t\t\t\t\"used_by\": [],\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"enp5s0\",\n\t\t\t\t\"type\": \"physical\",\n\t\t\t\t\"used_by\": null,\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"lxdbr1\",\n\t\t\t\t\"type\": \"bridge\",\n\t\t\t\t\"used_by\": null,\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t}\n\t\t]\n\t}" http_code=200

# Create lxdbr0 request
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="Handling API request" ip=@ method=POST protocol=unix url=/1.0/networks username=root
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="API Request\n\t{\n\t\t\"config\": {},\n\t\t\"description\": \"\",\n\t\t\"name\": \"lxdbr0\",\n\t\t\"type\": \"bridge\"\n\t}" ip=@ method=POST protocol=unix url=/1.0/networks username=root
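To pick the request lines out of a noisy debug log like the one above, a filter along these lines may help. It is a rough sketch: api_requests is a hypothetical helper name, it assumes the daemon runs with debug logging and that the log lines have the same key=value layout as the excerpt, and reading the log via journalctl -u snap.lxd.daemon is an assumption based on the unit name used elsewhere in this thread.

```shell
#!/bin/sh
# Read LXD debug log lines on stdin and print "METHOD URL" for every
# "Handling API request" entry. Other lines (such as the request body dumps)
# are dropped by the grep.
api_requests() {
  grep 'msg="Handling API request"' |
    sed -E 's/.* method=([A-Z]+) .* url="?([^" ]+)"? .*/\1 \2/'
}

# Usage (not run here; the unit name is an assumption):
#   journalctl -u snap.lxd.daemon | api_requests
```

On the two request lines quoted above, this reduces the output to the GET on /1.0/networks?recursion=1 followed by the POST on /1.0/networks, which makes the list-then-create sequence easier to spot.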

tomponline commented 1 year ago

FWIW we don't tend to recommend using snap restart lxd because it will stop any running instances. Instead we tend to use:

sudo systemctl reload snap.lxd.daemon

This restarts only the running LXD daemon and not the instances.

This doesn't appear to trigger the issue either.

tomponline commented 1 year ago

@stgraber any ideas here? I'm at a bit of a loss. My only guess is that it's something to do with either lxd-user or lxd-migrate (the lxd-migrate inside the snap that migrates from the apt package), as that does create a bridge.

I also see this in the logs, though I'm not sure if it's relevant:

Jun 29 07:29:44 vtest audit[3869]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.migrate" pid=3869 comm="apparmor_parser"
Jun 29 07:29:49 vtest audit[3966]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.migrate" pid=3966 comm="apparmor_parser"

This suggests it may be being run.

Something creates an lxdbr0 bridge on snap restart lxd if no managed networks exist. It then goes on to add or replace an eth0 NIC device connected to that network in the default profile.
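The precondition described here (no managed networks) can be checked with a short helper. This is a sketch under assumptions: has_managed_network is a hypothetical name, and it assumes the third CSV column of lxc network list --format csv is the MANAGED flag, printed as YES or NO in an English locale.

```shell
#!/bin/sh
# Read a CSV network listing on stdin and succeed (exit status 0) if at least
# one row is marked as managed. The third column is assumed to hold YES/NO.
has_managed_network() {
  awk -F, '$3 == "YES" { found = 1 } END { exit !found }'
}

# Usage (not run here):
#   if lxc network list --format csv | has_managed_network; then
#     echo "a managed network exists"
#   else
#     echo "no managed networks: snap restart lxd may create lxdbr0"
#   fi
```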

tomponline commented 1 year ago

I spoke with @stgraber and he confirmed this is a bug in the snap restart command: it starts sub-units (like the lxd-user process) even if they weren't previously running (normally lxd-user starts by socket activation).

@gabrielmougard please can you open a bug with the snapd team for this https://bugs.launchpad.net/snapd/+filebug ?

Thanks

tomponline commented 1 year ago

@ru-fu @gabrielmougard we should change the references in the docs from snap restart --reload lxd to snap restart --reload lxd.daemon so that we don't lead users into this external snapd bug.

gabrielmougard commented 1 year ago

@ru-fu I don't see any mentions of snap restart --reload lxd in our docs, but there is the Install LXD from a package section here. Should we add a Restart a snap LXD deployment subsection below that section?

tomponline commented 1 year ago

I believe @ru-fu fixed it already

ru-fu commented 1 year ago

I fixed the snap restart occurrences, yes. It might be a good idea to add a section about how to restart LXD. But I'm not sure if the installing page is the best place for it ... Is there a common scenario where you need to restart LXD? Maybe after server config changes?

canonical / lxd

LXD creates lxdbr0 if it does not exist, despite being initialized without it. #11906
