canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

chgrp fails in lvm storage containers on Ubuntu 22.04 #10537

Closed hirose31 closed 2 years ago

hirose31 commented 2 years ago

Required information

lxc info … ``` config: {} api_extensions: - storage_zfs_remove_snapshots - container_host_shutdown_timeout - container_stop_priority - container_syscall_filtering - auth_pki - container_last_used_at - etag - patch - usb_devices - https_allowed_credentials - image_compression_algorithm - directory_manipulation - container_cpu_time - storage_zfs_use_refquota - storage_lvm_mount_options - network - profile_usedby - container_push - container_exec_recording - certificate_update - container_exec_signal_handling - gpu_devices - container_image_properties - migration_progress - id_map - network_firewall_filtering - network_routes - storage - file_delete - file_append - network_dhcp_expiry - storage_lvm_vg_rename - storage_lvm_thinpool_rename - network_vlan - image_create_aliases - container_stateless_copy - container_only_migration - storage_zfs_clone_copy - unix_device_rename - storage_lvm_use_thinpool - storage_rsync_bwlimit - network_vxlan_interface - storage_btrfs_mount_options - entity_description - image_force_refresh - storage_lvm_lv_resizing - id_map_base - file_symlinks - container_push_target - network_vlan_physical - storage_images_delete - container_edit_metadata - container_snapshot_stateful_migration - storage_driver_ceph - storage_ceph_user_name - resource_limits - storage_volatile_initial_source - storage_ceph_force_osd_reuse - storage_block_filesystem_btrfs - resources - kernel_limits - storage_api_volume_rename - macaroon_authentication - network_sriov - console - restrict_devlxd - migration_pre_copy - infiniband - maas_network - devlxd_events - proxy - network_dhcp_gateway - file_get_symlink - network_leases - unix_device_hotplug - storage_api_local_volume_handling - operation_description - clustering - event_lifecycle - storage_api_remote_volume_handling - nvidia_runtime - container_mount_propagation - container_backup - devlxd_images - container_local_cross_pool_handling - proxy_unix - proxy_udp - clustering_join - proxy_tcp_udp_multi_port_handling - 
network_state - proxy_unix_dac_properties - container_protection_delete - unix_priv_drop - pprof_http - proxy_haproxy_protocol - network_hwaddr - proxy_nat - network_nat_order - container_full - candid_authentication - backup_compression - candid_config - nvidia_runtime_config - storage_api_volume_snapshots - storage_unmapped - projects - candid_config_key - network_vxlan_ttl - container_incremental_copy - usb_optional_vendorid - snapshot_scheduling - snapshot_schedule_aliases - container_copy_project - clustering_server_address - clustering_image_replication - container_protection_shift - snapshot_expiry - container_backup_override_pool - snapshot_expiry_creation - network_leases_location - resources_cpu_socket - resources_gpu - resources_numa - kernel_features - id_map_current - event_location - storage_api_remote_volume_snapshots - network_nat_address - container_nic_routes - rbac - cluster_internal_copy - seccomp_notify - lxc_features - container_nic_ipvlan - network_vlan_sriov - storage_cephfs - container_nic_ipfilter - resources_v2 - container_exec_user_group_cwd - container_syscall_intercept - container_disk_shift - storage_shifted - resources_infiniband - daemon_storage - instances - image_types - resources_disk_sata - clustering_roles - images_expiry - resources_network_firmware - backup_compression_algorithm - ceph_data_pool_name - container_syscall_intercept_mount - compression_squashfs - container_raw_mount - container_nic_routed - container_syscall_intercept_mount_fuse - container_disk_ceph - virtual-machines - image_profiles - clustering_architecture - resources_disk_id - storage_lvm_stripes - vm_boot_priority - unix_hotplug_devices - api_filtering - instance_nic_network - clustering_sizing - firewall_driver - projects_limits - container_syscall_intercept_hugetlbfs - limits_hugepages - container_nic_routed_gateway - projects_restrictions - custom_volume_snapshot_expiry - volume_snapshot_scheduling - trust_ca_certificates - snapshot_disk_usage - 
clustering_edit_roles - container_nic_routed_host_address - container_nic_ipvlan_gateway - resources_usb_pci - resources_cpu_threads_numa - resources_cpu_core_die - api_os - container_nic_routed_host_table - container_nic_ipvlan_host_table - container_nic_ipvlan_mode - resources_system - images_push_relay - network_dns_search - container_nic_routed_limits - instance_nic_bridged_vlan - network_state_bond_bridge - usedby_consistency - custom_block_volumes - clustering_failure_domains - resources_gpu_mdev - console_vga_type - projects_limits_disk - network_type_macvlan - network_type_sriov - container_syscall_intercept_bpf_devices - network_type_ovn - projects_networks - projects_networks_restricted_uplinks - custom_volume_backup - backup_override_name - storage_rsync_compression - network_type_physical - network_ovn_external_subnets - network_ovn_nat - network_ovn_external_routes_remove - tpm_device_type - storage_zfs_clone_copy_rebase - gpu_mdev - resources_pci_iommu - resources_network_usb - resources_disk_address - network_physical_ovn_ingress_mode - network_ovn_dhcp - network_physical_routes_anycast - projects_limits_instances - network_state_vlan - instance_nic_bridged_port_isolation - instance_bulk_state_change - network_gvrp - instance_pool_move - gpu_sriov - pci_device_type - storage_volume_state - network_acl - migration_stateful - disk_state_quota - storage_ceph_features - projects_compression - projects_images_remote_cache_expiry - certificate_project - network_ovn_acl - projects_images_auto_update - projects_restricted_cluster_target - images_default_architecture - network_ovn_acl_defaults - gpu_mig - project_usage - network_bridge_acl - warnings - projects_restricted_backups_and_snapshots - clustering_join_token - clustering_description - server_trusted_proxy - clustering_update_cert - storage_api_project - server_instance_driver_operational - server_supported_storage_drivers - event_lifecycle_requestor_address - resources_gpu_usb - clustering_evacuation 
- network_ovn_nat_address - network_bgp - network_forward - custom_volume_refresh - network_counters_errors_dropped - metrics - image_source_project - clustering_config - network_peer - linux_sysctl - network_dns - ovn_nic_acceleration - certificate_self_renewal - instance_project_move - storage_volume_project_move - cloud_init - network_dns_nat - database_leader - instance_all_projects - clustering_groups - ceph_rbd_du - instance_get_full - qemu_metrics - gpu_mig_uuid - event_project - clustering_evacuation_live - instance_allow_inconsistent_copy - network_state_ovn - storage_volume_api_filtering - image_restrictions - storage_zfs_export - network_dns_records - storage_zfs_reserve_space - network_acl_log - storage_zfs_blocksize - metrics_cpu_seconds - instance_snapshot_never - certificate_token - instance_nic_routed_neighbor_probe - event_hub - agent_nic_config - projects_restricted_intercept - metrics_authentication - images_target_project - cluster_migration_inconsistent_copy - cluster_ovn_chassis - container_syscall_intercept_sched_setscheduler - storage_lvm_thinpool_metadata_size api_status: stable api_version: "1.0" auth: trusted public: false auth_methods: - tls environment: addresses: [] architectures: - x86_64 - i686 certificate: | -----BEGIN CERTIFICATE----- (snip) -----END CERTIFICATE----- certificate_fingerprint: (snip) driver: lxc driver_version: 4.0.12 firewall: nftables kernel: Linux kernel_architecture: x86_64 kernel_features: idmapped_mounts: "true" netnsid_getifaddrs: "true" seccomp_listener: "true" seccomp_listener_continue: "true" shiftfs: "false" uevent_injection: "true" unpriv_fscaps: "true" kernel_version: 5.15.0-1009-aws lxc_features: cgroup2: "true" core_scheduling: "true" devpts_fd: "true" idmapped_mounts_v2: "true" mount_injection_file: "true" network_gateway_device_route: "true" network_ipvlan: "true" network_l2proxy: "true" network_phys_macvlan_mtu: "true" network_veth_router: "true" pidfd: "true" seccomp_allow_deny_syntax: "true" 
seccomp_notify: "true" seccomp_proxy_send_notify_fd: "true" os_name: Ubuntu os_version: "22.04" project: default server: lxd server_clustered: false server_event_mode: full-mesh server_name: ip-10-225-66-81 server_pid: 16374 server_version: 5.0.0 storage: lvm | zfs storage_version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0 | 2.1.2-1ubuntu3 storage_supported_drivers: - name: btrfs version: 5.4.1 remote: false - name: cephfs version: 15.2.14 remote: true - name: dir version: "1" remote: false - name: lvm version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0 remote: false - name: zfs version: 2.1.2-1ubuntu3 remote: false - name: ceph version: 15.2.14 remote: true ```

Issue description

If the storage is lvm, chgrp fails to run as a non-root user in a container.

Steps to reproduce

lxc storage create pool-lvm lvm lvm.use_thinpool=true
lxc launch ubuntu/22.04 lvm-u22 --storage pool-lvm
lxc exec lvm-u22 -- bash

### in container
id -a ubuntu
=> uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),27(sudo)

mkdir ~ubuntu/d1
chown ubuntu:sudo ~ubuntu/d1
sudo -u ubuntu chgrp ubuntu ~ubuntu/d1
=> chgrp: changing group of '/home/ubuntu/d1': Operation not permitted

ls -ld ~ubuntu/d1
=> drwxr-xr-x 2 ubuntu sudo 4096 Jun 10 10:40 /home/ubuntu/d1
stgraber commented 2 years ago

This was tracked down to a kernel bug in the VFS idmap feature.

@brauner is working on a kernel fix for this now which will be sent to the stable kernel list and hopefully make it to the various distros in an upcoming kernel bugfix update.

In the meantime, there are a few ways to work around that, none super pleasant sadly:
  • Disable idmapped mounts by setting LXD_IDMAPPED_MOUNTS_DISABLE in the LXD process environment

Assuming you're using the snap, you can do the former through systemctl edit snap.lxd.daemon.service then putting an override like:

[Service]
Environment=LXD_IDMAPPED_MOUNTS_DISABLE=1

Then systemctl reload snap.lxd.daemon should hopefully pick it up.
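
For anyone who wants to script this rather than go through the editor, here is a minimal sketch. It assumes the drop-in lives at the path `systemctl edit` normally uses for this unit; the function names `write_lxd_override` and `has_idmap_disable` are made up for illustration and are not part of LXD or systemd.

```shell
#!/bin/sh
# Non-interactive sketch of the override above. The default drop-in path is
# an assumption (where `systemctl edit` stores overrides for this unit);
# pass another directory as $1 to try it out somewhere harmless.
write_lxd_override() {
    dropin_dir="${1:-/etc/systemd/system/snap.lxd.daemon.service.d}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nEnvironment=LXD_IDMAPPED_MOUNTS_DISABLE=1\n' \
        > "$dropin_dir/override.conf"
}

# Succeeds if the NUL-separated environment file given as $1
# (for example /proc/<pid>/environ) contains the variable set to 1.
has_idmap_disable() {
    tr '\0' '\n' < "$1" | grep -qx 'LXD_IDMAPPED_MOUNTS_DISABLE=1'
}
```

Run as root, `write_lxd_override` followed by `systemctl daemon-reload` (needed when the drop-in is written by hand) and the reload above should be roughly equivalent to the `systemctl edit` route. To check that the daemon really picked the variable up, something like `has_idmap_disable "/proc/$(lxc info | awk '/server_pid:/ {print $2}')/environ"` can be used, assuming `server_pid` is reported as in the `lxc info` output at the top of this issue.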

stgraber commented 2 years ago

Closing the issue as there's nothing we can do in LXD itself as that's a kernel issue.

brauner commented 2 years ago

chown()ing as a privileged user will work ofc.

panlinux commented 2 years ago
  • Disable idmapped mounts by setting LXD_IDMAPPED_MOUNTS_DISABLE in the LXD process environment

What are the consequences of the above?

panlinux commented 2 years ago

For what it's worth, my storage is btrfs on lvm, and I see the same issue. Setting LXD_IDMAPPED_MOUNTS_DISABLE=1 also works around it.

$ lxc storage show default
config:
  source: /dev/vgubuntu/lxd
  volatile.initial_source: /dev/vgubuntu/lxd
description: ""
name: default
driver: btrfs
used_by:
- /1.0/images/e3e1bd82cdc7fa1256cf2409dd8543630eefa1fca631ff0c78c0970babddc69f
- /1.0/images/e9589b6e9c886888b3df98aee0f0e16c5805383418b3563cd8845220f43b40ff
- /1.0/instances/f1
- /1.0/instances/f2
- /1.0/profiles/default
status: Created
locations:
- none
brauner commented 2 years ago

This is fixed upstream and should be backported to all LTS stable kernels.

brauner commented 2 years ago
  • Disable idmapped mounts by setting LXD_IDMAPPED_MOUNTS_DISABLE in the LXD process environment

What are the consequences of the above?

LXD will fall back to recursive chown or, if available, shiftfs.
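
A rough way to guess which of the two applies on a given host is to look at the `kernel_features` block of `lxc info`. The sketch below assumes the `shiftfs` entry there reflects what LXD can actually use; `idmap_fallback` is a hypothetical helper name, not an LXD command.

```shell
#!/bin/sh
# Reads `lxc info` output on stdin and prints the fallback that the reply
# above implies: shiftfs when kernel_features reports it as "true",
# otherwise recursive chown. Only a text match, not a query of LXD itself.
idmap_fallback() {
    if grep -q 'shiftfs: *"true"'; then
        echo "shiftfs"
    else
        echo "recursive chown"
    fi
}
```

Usage would be `lxc info | idmap_fallback`. With the output posted at the top of this issue (`shiftfs: "false"`) it prints `recursive chown`.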