canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

AppArmor profile is not created for initialized but not started containers #8909

Closed · morphis closed this issue 3 years ago

morphis commented 3 years ago

Required information

config:
  cluster.https_address: 192.168.1.59:8443
  core.https_address: '[::]'
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
   ....
  architectures:
  - x86_64
  - i686
  certificate: |
    ....
  certificate_fingerprint: xxx
  driver: lxc | qemu
  driver_version: 4.0.9 | 5.2.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.8.0-55-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "false"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: titan
  server_pid: 32280
  server_version: "4.15"
  storage: zfs
  storage_version: 0.8.4-1ubuntu11.2

This is on an amd64 system which got a snap refresh this morning that installed: 4.15 (20760) 69MB

Issue description

When a container has been initialized (but not started), setting security.nesting fails with an apparmor_parser error because the profile file for the container cannot be found.

Starting the container after lxc init and before lxc config set generates the profile and allows the config item to be set.

Steps to reproduce

$ lxc init ubuntu:f c0
$ lxc config set c0 security.nesting true
Error: Parse AppArmor profile: Failed to run: apparmor_parser -QWL /var/snap/lxd/common/lxd/security/apparmor/cache /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-c0: File /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-c0 not found, skipping...
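
For reference, the parse step fails simply because the per-instance profile file has not been written yet at this point. A minimal sketch of checking for that (the path is taken from the error above and assumes the snap layout):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Path from the error message above; non-snap installs typically keep
	// profiles under the LXD data directory instead.
	profileDir := "/var/snap/lxd/common/lxd/security/apparmor/profiles"
	profile := filepath.Join(profileDir, "lxd-c0")

	if _, err := os.Stat(profile); os.IsNotExist(err) {
		// This is the state after `lxc init` but before the first
		// `lxc start`: the profile is only generated on start.
		fmt.Printf("%s does not exist yet, so apparmor_parser fails\n", profile)
		return
	}
	fmt.Printf("%s exists, so apparmor_parser can parse it\n", profile)
}
```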


tomponline commented 3 years ago

Confirmed reproduced.

tomponline commented 3 years ago

Ah, so the AppArmor profile is only generated on start, so if it's never been started it won't be there.

I'm not sure why we only generate the profile on start though... @stgraber do you know?

tomponline commented 3 years ago

@stgraber interestingly, apparmor.InstanceParse(d.state, d) is only called in one place, from lxc.Update(), and not in qemu.Update(), so it seems like we could do with standardising the AppArmor profile management approach across instance types too.

lxc.Update() will call apparmor.InstanceLoad(d.state, d) if the container is running and the security.nesting or raw.apparmor settings have changed, so maybe that will suffice and we don't need to do InstanceParse as well earlier in the same function.
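
Roughly, the flow being described looks like the sketch below. These are simplified stand-in types rather than the real lxd code; the actual helpers take the daemon state and the instance, as in the apparmor.InstanceLoad(d.state, d) call quoted above:

```go
package main

import "fmt"

// Minimal stand-in for the instance being updated.
type instance struct {
	running       bool
	changedConfig []string
}

func contains(keys []string, key string) bool {
	for _, k := range keys {
		if k == key {
			return true
		}
	}
	return false
}

// apparmorInstanceLoad stands in for apparmor.InstanceLoad: regenerate the
// profile from the instance config and load it into the kernel.
func apparmorInstanceLoad(inst *instance) error {
	fmt.Println("regenerating and loading the AppArmor profile")
	return nil
}

// update sketches the relevant part of lxc.Update(): the profile is only
// rebuilt and reloaded when the instance is running and an AppArmor-related
// key changed.
func update(inst *instance) error {
	if inst.running &&
		(contains(inst.changedConfig, "security.nesting") ||
			contains(inst.changedConfig, "raw.apparmor")) {
		return apparmorInstanceLoad(inst)
	}

	// A stopped instance falls through here, which is why the profile file
	// never gets (re)generated before first start.
	return nil
}

func main() {
	_ = update(&instance{running: false, changedConfig: []string{"security.nesting"}})
}
```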

stgraber commented 3 years ago

We only need the profile on start so we don't spend the time generating it before that. The binary profile itself may differ based on apparmor or kernel version so doing it at creation time may end up being a waste of time.

I'm a bit confused as to how this only started being a problem now though.

tomponline commented 3 years ago

I'm pretty sure that the check with apparmor.InstanceParse(d.state, d) isn't doing anything anyway, as it will be checking the old apparmor profile (if it exists), not the new one, so it's not actually validating the modified raw.apparmor setting.

tomponline commented 3 years ago

@stgraber perhaps nobody has initialised a container without starting it and then enabled nesting via an update before first boot.

After first boot the profile will exist, so the parse validation will succeed (even though it's not actually validating future changes).

If it was set in the profile it would be fine, as that wouldn't cause a pre-first-start update.

stgraber commented 3 years ago

So there are a few things with update:

tomponline commented 3 years ago

@stgraber the apparmor.InstanceLoad(d.state, d) function seems to rebuild the apparmor profile from the instance config, so I assume that is what we need to be calling when the raw.apparmor or security.nesting settings are changed. And I presume this will also end up validating the profile as a side effect so we don't need to call apparmor.InstanceParse(d.state, d) at all.

However, there's a warning sign for me here in that currently, for containers, the apparmor.InstanceLoad(d.state, d) function is only called once the container has been started (it's called in the onStart() hook and during Update(), but only when the container is running), so I am worried there's some uncommented knowledge wrapped up in there, i.e. some reason we cannot load the apparmor profile when the container isn't running.

stgraber commented 3 years ago

Well, it's more that we shouldn't load it if not running.

LXD loads the profiles in the kernel on start and unloads on stop.

We should rely on Parse for validation and only do Load if we need the running profile updated in the kernel.
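
In apparmor_parser terms the distinction is roughly: validation runs the parser with -Q (skip the kernel load), while loading replaces the profile in the kernel. A hedged sketch of the two calls, modelled on the command shown in the error above; the -r (replace) form for loading is an assumption for illustration, not copied from the LXD source:

```go
package main

import (
	"fmt"
	"os/exec"
)

const cacheDir = "/var/snap/lxd/common/lxd/security/apparmor/cache"

// parseProfile validates a profile file without touching the kernel:
// -Q skips the kernel load, -W writes the cache, -L sets the cache
// directory, matching the apparmor_parser call in the error above.
func parseProfile(path string) error {
	out, err := exec.Command("apparmor_parser", "-QWL", cacheDir, path).CombinedOutput()
	if err != nil {
		return fmt.Errorf("parse %s: %v: %s", path, err, out)
	}
	return nil
}

// loadProfile replaces the profile in the kernel (-r); this is the part
// that should only happen for a running instance.
func loadProfile(path string) error {
	out, err := exec.Command("apparmor_parser", "-rWL", cacheDir, path).CombinedOutput()
	if err != nil {
		return fmt.Errorf("load %s: %v: %s", path, err, out)
	}
	return nil
}

func main() {
	// Validation is safe regardless of instance state; loading is not.
	if err := parseProfile("/var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-c0"); err != nil {
		fmt.Println(err)
	}
	_ = loadProfile // only worth calling while the instance is running
}
```

That matches the split described above: Parse for validation on every config change, Load only while the instance is actually running.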

tomponline commented 3 years ago

@stgraber yep, makes sense, that's basically how it is now.

The problem that I can see is that InstanceParse doesn't appear to actually generate the profile file; it just assumes it exists and tries to parse it. It seems to rely on InstanceLoad having previously been run to generate the actual profile file.

Perhaps we need to get InstanceParse to call instanceProfile internally (this is what is called from InstanceLoad and returns the actual profile content) and then write that file out if it has changed from the current one, so that it can be parsed by runApparmor.

So basically this bit: https://github.com/lxc/lxd/blob/master/lxd/apparmor/instance.go#L71-L87 Followed by: https://github.com/lxc/lxd/blob/master/lxd/apparmor/instance.go#L117
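
Sketching that proposal with stand-in helpers (instanceProfile and runApparmor are the internal functions linked above; their signatures and the profile content here are assumed for illustration):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// profileDir mirrors the snap path from the error above; running this sketch
// for real would need write access there (or a different directory).
const profileDir = "/var/snap/lxd/common/lxd/security/apparmor/profiles"

// instanceProfile stands in for the internal generator linked above: render
// the profile text from the instance's current config.
func instanceProfile(name string, nesting bool) string {
	return fmt.Sprintf("# profile for %s (nesting=%v)\n", name, nesting)
}

// runApparmor stands in for the internal wrapper that shells out to
// apparmor_parser with the parse-only flags.
func runApparmor(path string) error {
	fmt.Println("apparmor_parser -QWL <cache>", path)
	return nil
}

// instanceParse shows the proposed shape of InstanceParse: generate the
// profile from the new config, write it out if it changed, then parse it.
// Today InstanceParse only parses a pre-existing file, which is why it fails
// for instances that have never been started.
func instanceParse(name string, nesting bool) error {
	content := instanceProfile(name, nesting)
	path := filepath.Join(profileDir, "lxd-"+name)

	current, err := os.ReadFile(path)
	if err != nil && !os.IsNotExist(err) {
		return err
	}

	if string(current) != content {
		if err := os.WriteFile(path, []byte(content), 0600); err != nil {
			return err
		}
	}

	return runApparmor(path)
}

func main() {
	if err := instanceParse("c0", true); err != nil {
		fmt.Println(err)
	}
}
```

The key change is that validation regenerates the profile text from the new config before parsing, instead of assuming a file left behind by a previous start.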

stgraber commented 3 years ago

Ah yeah, we definitely need Parse to generate the profile, or if that's too weird, have a separate Validate which does that.