canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd

lxd init fails with its own preseed-generated file #10478

Closed token47 closed 2 years ago

token47 commented 2 years ago

Issue description

Cannot preseed lxd init to create a cluster and define a storage pool at the same time.

Steps to reproduce

  1. Manually init an LXD system, answer YES to create a new cluster, and be sure to create a new storage pool.
  2. At the end, collect the YAML output for preseeding.
  3. Wipe the whole LXD config and start again, this time using the preseed generated above (see the command sketch after this list).
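
A condensed sketch of that flow (the preseed filename is illustrative; at the end of the interactive run, lxd init offers to print the generated preseed):

sudo lxd init                                # answer yes to clustering, create a new storage pool,
                                             # and answer yes when asked to print a YAML preseed; save it as preseed.yaml
# ... wipe the LXD configuration ...
cat preseed.yaml | sudo lxd init --preseed   # replay the saved preseed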

The generated preseed will be something like:

config:
  core.https_address: 172.27.77.69:8443
  core.trust_password: supersecretpassword
networks: []
storage_pools:
- config:
    size: 400GB
  description: ""
  name: local
  driver: btrfs
profiles:
- config: {}
  description: ""
  devices:
    root:
      path: /
      pool: local
      type: disk
  name: default
projects: []
cluster:
  server_name: ob76-node5.maas
  enabled: true
  member_config: []
  cluster_address: ""
  cluster_certificate: ""
  server_address: ""
  cluster_password: ""
  cluster_certificate_path: ""
  cluster_token: ""

NOTE: this was automatically generated and has no editing on my part.

Then when you try to use this later, you get:

ubuntu@ob76-node5:~$ cat preseed.yaml | sudo lxd init --preseed
Error: Failed to create storage pool "local": Config key "size" is cluster member specific

If I remove the "size" parameter from the storage pool section, and move it to the "member_config" section using something like:

  - entity: storage-pool
    key: size
    name: default
    value: 400GiB
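
That is, the entry goes under the cluster section's member_config list, roughly like this (a sketch of the placement, with the values copied from above):

cluster:
  server_name: ob76-node5.maas
  enabled: true
  member_config:
  - entity: storage-pool
    name: default
    key: size
    value: 400GiB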

Then it instead complains:

ubuntu@ob76-node5:~$ cat preseed.yaml | sudo lxd init --preseed
Error: Failed to create storage pool "default": Pool not pending on any node (use --target <node> first)

Which also does not make any sense because I have not initialized the cluster yet.
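
For context, on a cluster that is already up, a pool with a member-specific key such as size is normally created in two passes, which appears to be what that error is referring to (a sketch; the node names are illustrative):

lxc storage create local btrfs size=400GB --target node1   # pool is now "pending" on node1
lxc storage create local btrfs size=400GB --target node2   # pool is now "pending" on node2
lxc storage create local btrfs                             # finalizes the pool once it is pending on every member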

What is the correct way of doing this? And why does the produced preseed not work? Is it a bug?

Here is the "lxc info":

config:
  cluster.https_address: 172.27.77.69:8443
  core.https_address: 172.27.77.69:8443
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 172.27.77.69:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICEDCCAZWgAwIBAgIQVWFEPCutnx/YbLIJwl0A0DAKBggqhkjOPQQDAzA4MRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRgwFgYDVQQDDA9yb290QG9iNzYt
    bm9kZTUwHhcNMjIwNTI5MTk1MzM1WhcNMzIwNTI2MTk1MzM1WjA4MRwwGgYDVQQK
    ExNsaW51eGNvbnRhaW5lcnMub3JnMRgwFgYDVQQDDA9yb290QG9iNzYtbm9kZTUw
    djAQBgcqhkjOPQIBBgUrgQQAIgNiAAR4c9UR0hQzuKZBfCsMnBbpYGHIE9MB+Ma9
    jqydgRF5b3OwWH+XLw102D8nK9pNUsAR9btj/50F3zaPrtVPDrx8sKsDxtB94eo+
    c+PCEwtOte9GK72CzNlH4eBfJdyE63CjZDBiMA4GA1UdDwEB/wQEAwIFoDATBgNV
    HSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMC0GA1UdEQQmMCSCCm9iNzYt
    bm9kZTWHBH8AAAGHEAAAAAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIx
    AJ7+OBW69LSVyFDGxBucftXM/5GRQBmphrbya9Iis6OVyQBKeiDuV4YS23cyudqI
    UAIxAJLiF1QIhhDTNEi4s6epnCd0orp4GK4pRcYnTaZ/fkPAiDYjEyaUaNIME+7o
    ooGtLA==
    -----END CERTIFICATE-----
  certificate_fingerprint: 3541ca7379188b879b26faf41c7717e57b4f8c1dc02010f366bb66c65b6992a0
  driver: lxc | qemu
  driver_version: 4.0.12 | 6.1.1
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-33-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: lxd
  server_clustered: true
  server_event_mode: full-mesh
  server_name: ob76-node5.maas
  server_pid: 22907
  server_version: "5.1"
  storage: ""
  storage_version: ""
  storage_supported_drivers:
  - name: zfs
    version: 2.1.2-1ubuntu3
    remote: false
  - name: ceph
    version: 15.2.16
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.16
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
    remote: false
token47 commented 2 years ago

Ok, I got it working after completely removing LXD (snap remove) and reinstalling. Then the original preseed worked (with the size parameter in the storage pool itself, not as a member_config entry).
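
Roughly (a sketch, assuming the snap package; the --purge flag is an assumption to also discard the removal snapshot):

sudo snap remove --purge lxd                  # remove LXD and all of its state
sudo snap install lxd                         # install a fresh LXD
cat preseed.yaml | sudo lxd init --preseed    # the original preseed now applies cleanly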

I had read in the documentation (or maybe on Discord?) that you could just run init again over an existing setup and it would replace everything. I even wiped all possible configs using [1]. It seems all of that is not enough.

Is this expected behavior? There was something left behind that was causing the issue, even after completely wiping the storage part. This seems wrong (or at least very unintuitive) to me.

Anyway, this can be closed if it seems like expected behavior.

Thank you

[1] - https://github.com/lxc/lxd/blob/master/scripts/empty-lxd.sh

stgraber commented 2 years ago

Ah yeah, empty-lxd will not un-cluster; there is no real way to undo the clustering step in LXD other than a reinstall.

The empty-lxd script most likely did a good job at wiping everything clean except for the cluster state, which then caused the error.

If you need to script it, you could do:
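
A minimal sketch of such a script, assuming the snap package (warning: this destroys all instances, images and configuration):

sudo snap stop lxd                        # stop the LXD snap services
sudo rm -rf /var/snap/lxd/common/lxd/*    # wipe LXD's state directory (database, certificates, storage config, ...)
sudo snap start lxd                       # start a fresh, unconfigured LXD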

Which should let you effectively wipe all of LXD clean without actually reinstalling LXD or rebooting.

token47 commented 2 years ago

Thank you!