Closed: debug-richard closed this issue 2 years ago
Please can you remove the symlink you created and then show the output of:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- ls -la /var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/old_container/rootfs/var/log/journal
The path shows nothing after deleting the symlink.
If I create a new pool for testing this is the result:
lxc storage create test btrfs source=/media/data-nvme/lxdtest/
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- ls -la /var/snap/lxd/common/lxd/storage-pools/test
total 0
drwx--x--x 1 root root 0 Apr 6 12:48 .
drwx--x--x 1 root root 50 Apr 6 12:48 ..
ls -la /media/data-nvme/lxdtest/
total 16
drwxr-xr-x 1 root root 200 Apr 6 12:48 .
drwxr-xr-x 1 build build 122 Jun 28 2021 ..
drwx--x--x 1 root root 0 Apr 6 12:48 containers
drwx--x--x 1 root root 0 Apr 6 12:48 containers-snapshots
drwx--x--x 1 root root 0 Apr 6 12:48 custom
drwx--x--x 1 root root 0 Apr 6 12:48 custom-snapshots
drwx--x--x 1 root root 0 Apr 6 12:48 images
drwx--x--x 1 root root 0 Apr 6 12:48 virtual-machines
drwx--x--x 1 root root 0 Apr 6 12:48 virtual-machines-snapshots
It seems that lxd does not mount/link/see the storage pool.
Regarding the mentioned discussion, I checked my history and also deleted the journal as a last step. So this could be related to that.
Do you get the same error if you delete the journal?
I have now deleted the symlink and recreated the (empty) /var/snap/lxd/common/lxd/storage-pools/data-nvme directory. If I start the container I get this error:
lxc start mycontainer
Error: Failed to create mount directory "/var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/mycontainer": mkdir /var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/mycontainer: no such file or directory
When I try to move the container to reproduce the journaling problem, I get:
lxc move mycontainer mycontainer_tmp -s test
Error: Create instance from copy: Create instance volume from copy failed: [lstat /var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/mycontainer/: no such file or directory Failed reading migration header: context canceled]
So lxd still tries to access the wrong path.
So I think we need to get back to the original error, before you started manually changing the storage pool path contents, as it gets quite complicated once the snap mount namespace is taken into account.
What I'd like to see is you getting back to the original error:
Failed to change ACLs on /var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/old_container/rootfs/var/log/journal
and then removing var/log/journal from inside the instance and trying again.
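One way to do that removal, as a sketch (assuming the instance name used earlier in this thread and that the instance is reachable with lxc exec):

```shell
# Illustrative only: remove the journal directory from inside the
# instance; systemd-journald recreates it on the next boot. The name
# "old_container" is taken from this thread, not a general recipe.
lxc exec old_container -- rm -rf /var/log/journal
```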
Just to be clear, the path /var/snap/lxd/common/lxd/storage-pools/data-nvme/
is not incorrect, the storage pool driver will mount the source
of the pool into the standard pool location. So that path being used isn't the issue here.
Looking at the log I saved, that's what happened:
> Just to be clear, the path /var/snap/lxd/common/lxd/storage-pools/data-nvme/ is not incorrect, the storage pool driver will mount the source of the pool into the standard pool location. So that path being used isn't the issue here.
Ok, but if I delete the symlink and create an empty directory instead, lxd should mount the pool/container, right?
Yes but you need to do that inside the snap's mount namespace.
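A sketch of what that looks like, assuming the snap's mount namespace file at /run/snapd/ns/lxd.mnt as used earlier in this thread (paths are taken from this report):

```shell
# Remove the symlink and recreate the empty pool directory inside the
# snap's mount namespace, so the LXD daemon actually sees the change:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
    rm /var/snap/lxd/common/lxd/storage-pools/data-nvme
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
    mkdir /var/snap/lxd/common/lxd/storage-pools/data-nvme

# Check whether LXD has mounted the pool source onto the standard location:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
    findmnt /var/snap/lxd/common/lxd/storage-pools/data-nvme
```

Changes made in the host's namespace (as in the earlier symlink attempt) are invisible to the confined daemon, which is why the same commands run without nsenter did not help.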
I restored the folder but launching it did not work. So I created the symlink again and moved the containers to a new storage pool (under /media/data-nvme/). This worked without problems and all containers are running again.
If the paths were not the problem and the permissions now seem to work (for whatever reason), the only question left is why the ACL problem occurred.
The post https://discuss.linuxcontainers.org/t/cannot-start-a-copied-container-failed-to-change-acls/7982/2 explains that this is caused by an "old bug". LXD 4.0.0 was released in April 2020, the post is from May 2020 and I am using LXD 4.0.x since end of 2020. So this could be a regression. Anyway, I can't reproduce the bug and I didn't save the permissions of the journal files so I can't do anything else.
So let's close this until the next person stumbles over it. Thanks anyway.
Required information
Details
config: core.https_address: '[::]' core.trust_password: true api_extensions: - storage_zfs_remove_snapshots - container_host_shutdown_timeout - container_stop_priority - container_syscall_filtering - auth_pki - container_last_used_at - etag - patch - usb_devices - https_allowed_credentials - image_compression_algorithm - directory_manipulation - container_cpu_time - storage_zfs_use_refquota - storage_lvm_mount_options - network - profile_usedby - container_push - container_exec_recording - certificate_update - container_exec_signal_handling - gpu_devices - container_image_properties - migration_progress - id_map - network_firewall_filtering - network_routes - storage - file_delete - file_append - network_dhcp_expiry - storage_lvm_vg_rename - storage_lvm_thinpool_rename - network_vlan - image_create_aliases - container_stateless_copy - container_only_migration - storage_zfs_clone_copy - unix_device_rename - storage_lvm_use_thinpool - storage_rsync_bwlimit - network_vxlan_interface - storage_btrfs_mount_options - entity_description - image_force_refresh - storage_lvm_lv_resizing - id_map_base - file_symlinks - container_push_target - network_vlan_physical - storage_images_delete - container_edit_metadata - container_snapshot_stateful_migration - storage_driver_ceph - storage_ceph_user_name - resource_limits - storage_volatile_initial_source - storage_ceph_force_osd_reuse - storage_block_filesystem_btrfs - resources - kernel_limits - storage_api_volume_rename - macaroon_authentication - network_sriov - console - restrict_devlxd - migration_pre_copy - infiniband - maas_network - devlxd_events - proxy - network_dhcp_gateway - file_get_symlink - network_leases - unix_device_hotplug - storage_api_local_volume_handling - operation_description - clustering - event_lifecycle - storage_api_remote_volume_handling - nvidia_runtime - container_mount_propagation - container_backup - devlxd_images - container_local_cross_pool_handling - proxy_unix - proxy_udp - clustering_join - 
proxy_tcp_udp_multi_port_handling - network_state - proxy_unix_dac_properties - container_protection_delete - unix_priv_drop - pprof_http - proxy_haproxy_protocol - network_hwaddr - proxy_nat - network_nat_order - container_full - candid_authentication - backup_compression - candid_config - nvidia_runtime_config - storage_api_volume_snapshots - storage_unmapped - projects - candid_config_key - network_vxlan_ttl - container_incremental_copy - usb_optional_vendorid - snapshot_scheduling - snapshot_schedule_aliases - container_copy_project - clustering_server_address - clustering_image_replication - container_protection_shift - snapshot_expiry - container_backup_override_pool - snapshot_expiry_creation - network_leases_location - resources_cpu_socket - resources_gpu - resources_numa - kernel_features - id_map_current - event_location - storage_api_remote_volume_snapshots - network_nat_address - container_nic_routes - rbac - cluster_internal_copy - seccomp_notify - lxc_features - container_nic_ipvlan - network_vlan_sriov - storage_cephfs - container_nic_ipfilter - resources_v2 - container_exec_user_group_cwd - container_syscall_intercept - container_disk_shift - storage_shifted - resources_infiniband - daemon_storage - instances - image_types - resources_disk_sata - clustering_roles - images_expiry - resources_network_firmware - backup_compression_algorithm - ceph_data_pool_name - container_syscall_intercept_mount - compression_squashfs - container_raw_mount - container_nic_routed - container_syscall_intercept_mount_fuse - container_disk_ceph - virtual-machines - image_profiles - clustering_architecture - resources_disk_id - storage_lvm_stripes - vm_boot_priority - unix_hotplug_devices - api_filtering - instance_nic_network - clustering_sizing - firewall_driver - projects_limits - container_syscall_intercept_hugetlbfs - limits_hugepages - container_nic_routed_gateway - projects_restrictions - custom_volume_snapshot_expiry - volume_snapshot_scheduling - 
trust_ca_certificates - snapshot_disk_usage - clustering_edit_roles - container_nic_routed_host_address - container_nic_ipvlan_gateway - resources_usb_pci - resources_cpu_threads_numa - resources_cpu_core_die - api_os - resources_system - usedby_consistency - resources_gpu_mdev - console_vga_type - projects_limits_disk - storage_rsync_compression - gpu_mdev - resources_pci_iommu - resources_network_usb - resources_disk_address - network_state_vlan - gpu_sriov - migration_stateful - disk_state_quota - storage_ceph_features - gpu_mig - clustering_join_token - clustering_description - server_trusted_proxy - clustering_update_cert - storage_api_project - server_instance_driver_operational - server_supported_storage_drivers - event_lifecycle_requestor_address - resources_gpu_usb - network_counters_errors_dropped - image_source_project - database_leader - instance_all_projects - ceph_rbd_du - qemu_metrics - gpu_mig_uuid - event_project - instance_allow_inconsistent_copy - image_restrictions api_status: stable api_version: "1.0" auth: trusted public: false auth_methods: - tls environment: addresses: - 192.168.1.3:8443 - 192.168.111.1:8443 - 192.168.122.1:8443 - 10.124.232.1:8443 architectures: - x86_64 - i686 certificate: | -----BEGIN CERTIFICATE----- -----END CERTIFICATE----- certificate_fingerprint: f30e... 
driver: lxc | qemu driver_version: 4.0.12 | 6.1.1 firewall: xtables kernel: Linux kernel_architecture: x86_64 kernel_features: netnsid_getifaddrs: "true" seccomp_listener: "true" seccomp_listener_continue: "true" shiftfs: "false" uevent_injection: "true" unpriv_fscaps: "true" kernel_version: 5.13.0-30-generic lxc_features: cgroup2: "true" core_scheduling: "true" devpts_fd: "true" idmapped_mounts_v2: "true" mount_injection_file: "true" network_gateway_device_route: "true" network_ipvlan: "true" network_l2proxy: "true" network_phys_macvlan_mtu: "true" network_veth_router: "true" pidfd: "true" seccomp_allow_deny_syntax: "true" seccomp_notify: "true" seccomp_proxy_send_notify_fd: "true" os_name: Ubuntu os_version: "20.04" project: default server: lxd server_clustered: false server_name: production server_pid: 2475 server_version: 4.0.9 storage: btrfs storage_version: 5.4.1 storage_supported_drivers: - name: btrfs version: 5.4.1 remote: false - name: cephfs version: 15.2.14 remote: true - name: dir version: "1" remote: false - name: lvm version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0 remote: false - name: zfs version: 2.0.6-1ubuntu2 remote: false - name: ceph version: 15.2.14 remote: true

Issue description
A storage pool got full, so I created a new one on another disk formatted as btrfs. So there are two storage pools on two different BTRFS file systems.
lxc storage create data-nvme btrfs source=/media/data-nvme/lxd/
lxc stop old_container
lxc move old_container tmp_container -s data-nvme
lxc move tmp_container old_container
lxc start old_container
That worked fine, but then I noticed that one of the containers that shares a directory with the host no longer has access permissions.
I compared the configuration with the other containers and found that the container had the "raw.idmap" configuration set.
After the migration, the setting was "raw.idmap: both 100000 100000" (and I'm not sure if lxd changed this during the migration). The shared directory is actually another disk that was mounted as read-only, so this has not changed.
So I tried changing "raw.idmap" to "both 1000 1000" to match the host/container permissions.
Now the container refuses to start with the error message:
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/snap/lxd/common/lxd/storage-pools/data-nvme/containers/old_container/rootfs/var/log/journal
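For context on the two values tried above: as far as I understand the documented syntax, a raw.idmap entry reads "&lt;uid|gid|both&gt; &lt;host-id&gt; &lt;container-id&gt;". A quick plain-shell illustration of the two entries from this report (no LXD involved; the parse_idmap helper is purely illustrative):

```shell
# Illustrative only: split a raw.idmap entry into its three fields.
# Assumed format: "<uid|gid|both> <host-id> <container-id>".
parse_idmap() {
    set -- $1
    echo "type=$1 host_id=$2 container_id=$3"
}

# The value found after the migration: host id 100000 <-> container id 100000.
parse_idmap "both 100000 100000"
# The value tried to match the host user: host id 1000 <-> container id 1000.
parse_idmap "both 1000 1000"
```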
The problem is that the storage pool is located at /media/data-nvme/lxd/, but lxd tries to access /var/snap/lxd/common/lxd/storage-pools/data-nvme/, which exists but is empty.
So I removed the empty directory and replaced it with a symlink to /media/data-nvme/lxd/, which fixed the problem.
I suspect the problem is:
Steps to reproduce
See above.
Information to attach