canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Error: Common start logic: Failed to run: zfs mount #7119

Closed (itoffshore closed this issue 2 years ago)

itoffshore commented 4 years ago

Required information

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB/TCCAYOgAwIBAgIQRFDWxh1ympNR6ns4gTBHGDAKBggqhkjOPQQDAzAyMRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QHR1YmEw
    HhcNMjAwMjI3MTI1NzEwWhcNMzAwMjI0MTI1NzEwWjAyMRwwGgYDVQQKExNsaW51
    eGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QHR1YmEwdjAQBgcqhkjOPQIB
    BgUrgQQAIgNiAAQufti2/O2Syuc3DQNi47OT6VQhL/uU7tJLjGP2KrjFV2HxNQfS
    3iwznnr5RaTKhTAjwSxFO2iIzen+5smMwKtelTByi39AcZzbB+jXLvPuizi/wrmh
    JP/VuYTs+48pz5ejXjBcMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEF
    BQcDATAMBgNVHRMBAf8EAjAAMCcGA1UdEQQgMB6CBHR1YmGHBH8AAAGHEAAAAAAA
    AAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaAAwZQIxAJfr0VVrMRVFeuvvg/SI0hU2
    gYxqn7/WEHdLj37vaS/+ZucXa9XDCW40vkj1piHIWAIwUwFV9c3vPxvskyKUePlV
    xbv3PwCnxcoTm1olmqyWwUn+ZTvPJij5FKaFMjBpjf08
    -----END CERTIFICATE-----
  certificate_fingerprint: 012939ae00e7f82496a6b98d8c8d21e28ff304195f6174050e906d22441ebf37
  driver: lxc
  driver_version: 4.0.0
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "false"
    seccomp_listener: "false"
    seccomp_listener_continue: "false"
    shiftfs: "false"
    uevent_injection: "false"
    unpriv_fscaps: "true"
  kernel_version: 4.15.0-88-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: tuba
  server_pid: 9199
  server_version: "3.23"
  storage: zfs
  storage_version: 0.7.5-1ubuntu16.7

Issue description

This is the same issue as described on the LXD forums here & in this GitHub issue concerning ZFS not being namespace aware.

Hopefully this info helps some others:

Opening this issue to note that this problem on ZFS seems to NOT occur if you restart the container from a root console inside the container after adding a bind mount.

Steps to reproduce

  1. Add a 3rd bind mount to a container with:

    lxc config device add container-name share-name disk source=/zpool/some/dataset path=/home/username

  2. Then reboot the container with lxc restart; the container fails to restart:

    Error: Common start logic: Failed to run: zfs mount zpool/lxd/containers/container-name: cannot mount 'zpool/lxd/containers/container-name': filesystem already mounted

  3. Note that after adding the 2 previous bind mounts I had rebooted the container from inside it with reboot as root, which worked OK (see the condensed sketch after this list).
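
For reference, a condensed shell-session version of the above; the container name, dataset paths, and mount paths are illustrative placeholders:

    lxc launch images:centos/6 c6
    lxc config device add c6 share1 disk source=/zpool/some/dataset1 path=/mnt/share1
    lxc config device add c6 share2 disk source=/zpool/some/dataset2 path=/mnt/share2
    lxc exec c6 -- reboot    # rebooting from inside the container works
    lxc config device add c6 share3 disk source=/zpool/some/dataset3 path=/mnt/share3
    lxc restart c6           # rebooting from the host fails: "filesystem already mounted"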

Information to attach

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Centos 6 amd64 (20200327_07:08)
  image.os: Centos
  image.release: "6"
  image.serial: "20200327_07:08"
  image.type: squashfs
  raw.idmap: uid 10376 10376
  volatile.base_image: cdc3a4ac4cfba8998ba6cdb0e29c14bf5b4c76bc7a1f8427e3d8cf3f696f498e
  volatile.eth0.host_name: veth2cfcdc78
  volatile.eth0.hwaddr: 00:16:3e:8d:e4:95
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":10376},{"Isuid":true,"Isgid":false,"Hostid":10376,"Nsid":10376,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1010377,"Nsid":10377,"Maprange":999989623},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":10376},{"Isuid":true,"Isgid":false,"Hostid":10376,"Nsid":10376,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1010377,"Nsid":10377,"Maprange":999989623},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":10376},{"Isuid":true,"Isgid":false,"Hostid":10376,"Nsid":10376,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1010377,"Nsid":10377,"Maprange":999989623},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: pool1
    type: disk
  share-username-home:
    path: /home/username
    source: /zpool/username/home
    type: disk
  share-test:
    path: /test
    source: /zpool/username/test
    type: disk
  share-test-clients:
    path: /test-clients
    source: /zpool/username/test-clients
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
DBUG[04-01|15:34:07] Connecting to a local LXD over a Unix socket 
DBUG[04-01|15:34:07] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[04-01|15:34:07] Got response struct from LXD 
DBUG[04-01|15:34:07] 
    {
        "config": {},
        "api_extensions": [
            "storage_zfs_remove_snapshots",
            "container_host_shutdown_timeout",
            "container_stop_priority",
            "container_syscall_filtering",
            "auth_pki",
            "container_last_used_at",
            "etag",
            "patch",
            "usb_devices",
            "https_allowed_credentials",
            "image_compression_algorithm",
            "directory_manipulation",
            "container_cpu_time",
            "storage_zfs_use_refquota",
            "storage_lvm_mount_options",
            "network",
            "profile_usedby",
            "container_push",
            "container_exec_recording",
            "certificate_update",
            "container_exec_signal_handling",
            "gpu_devices",
            "container_image_properties",
            "migration_progress",
            "id_map",
            "network_firewall_filtering",
            "network_routes",
            "storage",
            "file_delete",
            "file_append",
            "network_dhcp_expiry",
            "storage_lvm_vg_rename",
            "storage_lvm_thinpool_rename",
            "network_vlan",
            "image_create_aliases",
            "container_stateless_copy",
            "container_only_migration",
            "storage_zfs_clone_copy",
            "unix_device_rename",
            "storage_lvm_use_thinpool",
            "storage_rsync_bwlimit",
            "network_vxlan_interface",
            "storage_btrfs_mount_options",
            "entity_description",
            "image_force_refresh",
            "storage_lvm_lv_resizing",
            "id_map_base",
            "file_symlinks",
            "container_push_target",
            "network_vlan_physical",
            "storage_images_delete",
            "container_edit_metadata",
            "container_snapshot_stateful_migration",
            "storage_driver_ceph",
            "storage_ceph_user_name",
            "resource_limits",
            "storage_volatile_initial_source",
            "storage_ceph_force_osd_reuse",
            "storage_block_filesystem_btrfs",
            "resources",
            "kernel_limits",
            "storage_api_volume_rename",
            "macaroon_authentication",
            "network_sriov",
            "console",
            "restrict_devlxd",
            "migration_pre_copy",
            "infiniband",
            "maas_network",
            "devlxd_events",
            "proxy",
            "network_dhcp_gateway",
            "file_get_symlink",
            "network_leases",
            "unix_device_hotplug",
            "storage_api_local_volume_handling",
            "operation_description",
            "clustering",
            "event_lifecycle",
            "storage_api_remote_volume_handling",
            "nvidia_runtime",
            "container_mount_propagation",
            "container_backup",
            "devlxd_images",
            "container_local_cross_pool_handling",
            "proxy_unix",
            "proxy_udp",
            "clustering_join",
            "proxy_tcp_udp_multi_port_handling",
            "network_state",
            "proxy_unix_dac_properties",
            "container_protection_delete",
            "unix_priv_drop",
            "pprof_http",
            "proxy_haproxy_protocol",
            "network_hwaddr",
            "proxy_nat",
            "network_nat_order",
            "container_full",
            "candid_authentication",
            "backup_compression",
            "candid_config",
            "nvidia_runtime_config",
            "storage_api_volume_snapshots",
            "storage_unmapped",
            "projects",
            "candid_config_key",
            "network_vxlan_ttl",
            "container_incremental_copy",
            "usb_optional_vendorid",
            "snapshot_scheduling",
            "container_copy_project",
            "clustering_server_address",
            "clustering_image_replication",
            "container_protection_shift",
            "snapshot_expiry",
            "container_backup_override_pool",
            "snapshot_expiry_creation",
            "network_leases_location",
            "resources_cpu_socket",
            "resources_gpu",
            "resources_numa",
            "kernel_features",
            "id_map_current",
            "event_location",
            "storage_api_remote_volume_snapshots",
            "network_nat_address",
            "container_nic_routes",
            "rbac",
            "cluster_internal_copy",
            "seccomp_notify",
            "lxc_features",
            "container_nic_ipvlan",
            "network_vlan_sriov",
            "storage_cephfs",
            "container_nic_ipfilter",
            "resources_v2",
            "container_exec_user_group_cwd",
            "container_syscall_intercept",
            "container_disk_shift",
            "storage_shifted",
            "resources_infiniband",
            "daemon_storage",
            "instances",
            "image_types",
            "resources_disk_sata",
            "clustering_roles",
            "images_expiry",
            "resources_network_firmware",
            "backup_compression_algorithm",
            "ceph_data_pool_name",
            "container_syscall_intercept_mount",
            "compression_squashfs",
            "container_raw_mount",
            "container_nic_routed",
            "container_syscall_intercept_mount_fuse",
            "container_disk_ceph",
            "virtual-machines",
            "image_profiles",
            "clustering_architecture",
            "resources_disk_id",
            "storage_lvm_stripes",
            "vm_boot_priority",
            "unix_hotplug_devices",
            "api_filtering",
            "instance_nic_network",
            "clustering_sizing",
            "firewall_driver",
            "projects_limits",
            "container_syscall_intercept_hugetlbfs",
            "limits_hugepages",
            "container_nic_routed_gateway",
            "projects_restrictions",
            "custom_volume_snapshot_expiry",
            "volume_snapshot_scheduling"
        ],
        "api_status": "stable",
        "api_version": "1.0",
        "auth": "trusted",
        "public": false,
        "auth_methods": [
            "tls"
        ],
        "environment": {
            "addresses": [],
            "architectures": [
                "x86_64",
                "i686"
            ],
            "certificate": "-----BEGIN CERTIFICATE-----\nMIIB/TCCAYOgAwIBAgIQRFDWxh1ympNR6ns4gTBHGDAKBggqhkjOPQQDAzAyMRww\nGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QHR1YmEw\nHhcNMjAwMjI3MTI1NzEwWhcNMzAwMjI0MTI1NzEwWjAyMRwwGgYDVQQKExNsaW51\neGNvbnRhaW5lcnMub3JnMRIwEAYDVQQDDAlyb290QHR1YmEwdjAQBgcqhkjOPQIB\nBgUrgQQAIgNiAAQufti2/O2Syuc3DQNi47OT6VQhL/uU7tJLjGP2KrjFV2HxNQfS\n3iwznnr5RaTKhTAjwSxFO2iIzen+5smMwKtelTByi39AcZzbB+jXLvPuizi/wrmh\nJP/VuYTs+48pz5ejXjBcMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEF\nBQcDATAMBgNVHRMBAf8EAjAAMCcGA1UdEQQgMB6CBHR1YmGHBH8AAAGHEAAAAAAA\nAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaAAwZQIxAJfr0VVrMRVFeuvvg/SI0hU2\ngYxqn7/WEHdLj37vaS/+ZucXa9XDCW40vkj1piHIWAIwUwFV9c3vPxvskyKUePlV\nxbv3PwCnxcoTm1olmqyWwUn+ZTvPJij5FKaFMjBpjf08\n-----END CERTIFICATE-----\n",
            "certificate_fingerprint": "012939ae00e7f82496a6b98d8c8d21e28ff304195f6174050e906d22441ebf37",
            "driver": "lxc",
            "driver_version": "4.0.0",
            "firewall": "xtables",
            "kernel": "Linux",
            "kernel_architecture": "x86_64",
            "kernel_features": {
                "netnsid_getifaddrs": "false",
                "seccomp_listener": "false",
                "seccomp_listener_continue": "false",
                "shiftfs": "false",
                "uevent_injection": "false",
                "unpriv_fscaps": "true"
            },
            "kernel_version": "4.15.0-88-generic",
            "lxc_features": {
                "cgroup2": "true",
                "mount_injection_file": "true",
                "network_gateway_device_route": "true",
                "network_ipvlan": "true",
                "network_l2proxy": "true",
                "network_phys_macvlan_mtu": "true",
                "network_veth_router": "true",
                "seccomp_notify": "true"
            },
            "project": "default",
            "server": "lxd",
            "server_clustered": false,
            "server_name": "tuba",
            "server_pid": 9199,
            "server_version": "3.23",
            "storage": "zfs",
            "storage_version": "0.7.5-1ubuntu16.7"
        }
    } 
DBUG[04-01|15:34:07] Sending request to LXD                   method=GET url=http://unix.socket/1.0/instances/centos6-builder etag=
DBUG[04-01|15:34:07] Got response struct from LXD 
DBUG[04-01|15:34:07] 
    {
        "architecture": "x86_64",
        "config": {
            "image.architecture": "amd64",
            "image.description": "Centos 6 amd64 (20200327_07:08)",
            "image.os": "Centos",
            "image.release": "6",
            "image.serial": "20200327_07:08",
            "image.type": "squashfs",
            "raw.idmap": "uid 10376 10376",
            "volatile.base_image": "cdc3a4ac4cfba8998ba6cdb0e29c14bf5b4c76bc7a1f8427e3d8cf3f696f498e",
            "volatile.eth0.host_name": "veth12526d52",
            "volatile.eth0.hwaddr": "00:16:3e:8d:e4:95",
            "volatile.idmap.base": "0",
            "volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.last_state.power": "STOPPED"
        },
        "devices": {
            "share-username-home": {
                "path": "/home/username",
                "source": "/zpool/username/home",
                "type": "disk"
            },
            "share-test": {
                "path": "/test",
                "source": "/zpool/username/test",
                "type": "disk"
            },
            "share-test-clients": {
                "path": "/test-clients",
                "source": "/zpool/username/test-clients",
                "type": "disk"
            }
        },
        "ephemeral": false,
        "profiles": [
            "default"
        ],
        "stateful": false,
        "description": "",
        "created_at": "2020-03-27T11:54:45.557837473Z",
        "expanded_config": {
            "image.architecture": "amd64",
            "image.description": "Centos 6 amd64 (20200327_07:08)",
            "image.os": "Centos",
            "image.release": "6",
            "image.serial": "20200327_07:08",
            "image.type": "squashfs",
            "raw.idmap": "uid 10376 10376",
            "volatile.base_image": "cdc3a4ac4cfba8998ba6cdb0e29c14bf5b4c76bc7a1f8427e3d8cf3f696f498e",
            "volatile.eth0.host_name": "veth12526d52",
            "volatile.eth0.hwaddr": "00:16:3e:8d:e4:95",
            "volatile.idmap.base": "0",
            "volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":10376},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":10376,\"Nsid\":10376,\"Maprange\":1},{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1010377,\"Nsid\":10377,\"Maprange\":999989623},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]",
            "volatile.last_state.power": "STOPPED"
        },
        "expanded_devices": {
            "eth0": {
                "name": "eth0",
                "nictype": "bridged",
                "parent": "br0",
                "type": "nic"
            },
            "root": {
                "path": "/",
                "pool": "pool1",
                "type": "disk"
            },
            "share-username-home": {
                "path": "/home/username",
                "source": "/zpool/username/home",
                "type": "disk"
            },
            "share-test": {
                "path": "/test",
                "source": "/zpool/username/test",
                "type": "disk"
            },
            "share-test-clients": {
                "path": "/test-clients",
                "source": "/zpool/username/test-clients",
                "type": "disk"
            }
        },
        "name": "centos6-builder",
        "status": "Stopped",
        "status_code": 102,
        "last_used_at": "2020-03-27T17:45:26.108241343Z",
        "location": "none",
        "type": "container"
    } 
DBUG[04-01|15:34:07] Connected to the websocket: ws://unix.socket/1.0/events 
DBUG[04-01|15:34:07] Sending request to LXD                   method=PUT url=http://unix.socket/1.0/instances/centos6-builder/state etag=
DBUG[04-01|15:34:07] 
    {
        "action": "start",
        "timeout": 0,
        "force": false,
        "stateful": false
    } 
DBUG[04-01|15:34:07] Got operation from LXD 
DBUG[04-01|15:34:07] 
    {
        "id": "fb96dec8-fa14-4875-9b5a-e12bcbe51fb4",
        "class": "task",
        "description": "Starting container",
        "created_at": "2020-04-01T15:34:07.499043915Z",
        "updated_at": "2020-04-01T15:34:07.499043915Z",
        "status": "Running",
        "status_code": 103,
        "resources": {
            "containers": [
                "/1.0/containers/centos6-builder"
            ]
        },
        "metadata": null,
        "may_cancel": false,
        "err": "",
        "location": "none"
    } 
DBUG[04-01|15:34:07] Sending request to LXD                   method=GET url=http://unix.socket/1.0/operations/fb96dec8-fa14-4875-9b5a-e12bcbe51fb4 etag=
DBUG[04-01|15:34:07] Got response struct from LXD 
DBUG[04-01|15:34:07] 
    {
        "id": "fb96dec8-fa14-4875-9b5a-e12bcbe51fb4",
        "class": "task",
        "description": "Starting container",
        "created_at": "2020-04-01T15:34:07.499043915Z",
        "updated_at": "2020-04-01T15:34:07.499043915Z",
        "status": "Running",
        "status_code": 103,
        "resources": {
            "containers": [
                "/1.0/containers/centos6-builder"
            ]
        },
        "metadata": null,
        "may_cancel": false,
        "err": "",
        "location": "none"
    } 
Error: Common start logic: Failed to run: zfs mount zpool/lxd/containers/centos6-builder: cannot mount 'zpool/lxd/containers/centos6-builder': filesystem already mounted
Try `lxc info --show-log centos6-builder` for more info
stgraber commented 4 years ago

Ok, so to confirm, the steps are:

1. Create the container and add two disk devices
2. Reboot from inside the container (works)
3. Add a third disk device
4. lxc restart the container (fails with "filesystem already mounted")

Correct?

itoffshore commented 4 years ago

I created 2 bind mounts of ZFS datasets in a CentOS 6 container - then rebooted it from a root console inside the container (as I was also changing uidmaps).

Afterwards I created a 3rd bind mount of a ZFS dataset, but this time rebooted via lxc restart to save time logging into the container.

stgraber commented 4 years ago

stgraber@castiana:~$ lxc launch images:centos/6 c6
Creating c6
Starting c6                                   
stgraber@castiana:~$ lxc config device add c6 mnt disk source=/mnt path=/mnt/mnt
Device mnt added to c6
stgraber@castiana:~$ lxc config device add c6 srv disk source=/srv path=/mnt/srv
Device srv added to c6
stgraber@castiana:~$ lxc exec c6 bash
[root@c6 ~]# reboot
stgraber@castiana:~$ lxc config device add c6 opt disk source=/opt path=/mnt/opt
Device opt added to c6
stgraber@castiana:~$ lxc restart c6
stgraber@castiana:~$ lxc exec c6 -- grep /mnt /proc/mounts
castiana/ROOT/ubuntu /mnt/mnt zfs rw,relatime,xattr,posixacl 0 0
castiana/ROOT/ubuntu /mnt/opt zfs rw,relatime,xattr,posixacl 0 0
castiana/ROOT/ubuntu /mnt/srv zfs rw,relatime,xattr,posixacl 0 0
stgraber@castiana:~$ 
stgraber commented 4 years ago

Can you show `zfs get all zpool/lxd/containers/centos6-builder`?

itoffshore commented 4 years ago

A reboot cleared the error & the container is working today:

NAME                                     PROPERTY              VALUE                                                                                                  SOURCE
zpool012/lxd/containers/centos6-builder  type                  filesystem                                                                                             -
zpool012/lxd/containers/centos6-builder  creation              Fri Mar 27 11:54 2020                                                                                  -
zpool012/lxd/containers/centos6-builder  used                  739M                                                                                                   -
zpool012/lxd/containers/centos6-builder  available             41.7T                                                                                                  -
zpool012/lxd/containers/centos6-builder  referenced            952M                                                                                                   -
zpool012/lxd/containers/centos6-builder  compressratio         2.34x                                                                                                  -
zpool012/lxd/containers/centos6-builder  mounted               no                                                                                                     -
zpool012/lxd/containers/centos6-builder  origin                zpool012/lxd/deleted/images/cdc3a4ac4cfba8998ba6cdb0e29c14bf5b4c76bc7a1f8427e3d8cf3f696f498e@readonly  -
zpool012/lxd/containers/centos6-builder  quota                 none                                                                                                   local
zpool012/lxd/containers/centos6-builder  reservation           none                                                                                                   default
zpool012/lxd/containers/centos6-builder  recordsize            128K                                                                                                   default
zpool012/lxd/containers/centos6-builder  mountpoint            /var/snap/lxd/common/lxd/storage-pools/pool1/containers/centos6-builder                                local
zpool012/lxd/containers/centos6-builder  sharenfs              off                                                                                                    default
zpool012/lxd/containers/centos6-builder  checksum              on                                                                                                     default
zpool012/lxd/containers/centos6-builder  compression           lz4                                                                                                    inherited from zpool012
zpool012/lxd/containers/centos6-builder  atime                 on                                                                                                     default
zpool012/lxd/containers/centos6-builder  devices               on                                                                                                     inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder  exec                  on                                                                                                     inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder  setuid                on                                                                                                     inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder  readonly              off                                                                                                    default
zpool012/lxd/containers/centos6-builder  zoned                 off                                                                                                    default
zpool012/lxd/containers/centos6-builder  snapdir               hidden                                                                                                 default
zpool012/lxd/containers/centos6-builder  aclinherit            passthrough                                                                                            inherited from zpool012
zpool012/lxd/containers/centos6-builder  createtxg             684944                                                                                                 -
zpool012/lxd/containers/centos6-builder  canmount              noauto                                                                                                 local
zpool012/lxd/containers/centos6-builder  xattr                 sa                                                                                                     inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder  copies                1                                                                                                      default
zpool012/lxd/containers/centos6-builder  version               5                                                                                                      -
zpool012/lxd/containers/centos6-builder  utf8only              on                                                                                                     -
zpool012/lxd/containers/centos6-builder  normalization         formD                                                                                                  -
zpool012/lxd/containers/centos6-builder  casesensitivity       sensitive                                                                                              -
zpool012/lxd/containers/centos6-builder  vscan                 off                                                                                                    default
zpool012/lxd/containers/centos6-builder  nbmand                off                                                                                                    default
zpool012/lxd/containers/centos6-builder  sharesmb              off                                                                                                    default
zpool012/lxd/containers/centos6-builder  refquota              none                                                                                                   default
zpool012/lxd/containers/centos6-builder  refreservation        none                                                                                                   default
zpool012/lxd/containers/centos6-builder  guid                  17762718722670246463                                                                                   -
zpool012/lxd/containers/centos6-builder  primarycache          all                                                                                                    default
zpool012/lxd/containers/centos6-builder  secondarycache        all                                                                                                    default
zpool012/lxd/containers/centos6-builder  usedbysnapshots       0B                                                                                                     -
zpool012/lxd/containers/centos6-builder  usedbydataset         739M                                                                                                   -
zpool012/lxd/containers/centos6-builder  usedbychildren        0B                                                                                                     -
zpool012/lxd/containers/centos6-builder  usedbyrefreservation  0B                                                                                                     -
zpool012/lxd/containers/centos6-builder  logbias               latency                                                                                                default
zpool012/lxd/containers/centos6-builder  dedup                 off                                                                                                    default
zpool012/lxd/containers/centos6-builder  mlslabel              none                                                                                                   default
zpool012/lxd/containers/centos6-builder  sync                  standard                                                                                               default
zpool012/lxd/containers/centos6-builder  dnodesize             legacy                                                                                                 default
zpool012/lxd/containers/centos6-builder  refcompressratio      2.22x                                                                                                  -
zpool012/lxd/containers/centos6-builder  written               739M                                                                                                   -
zpool012/lxd/containers/centos6-builder  logicalused           1.57G                                                                                                  -
zpool012/lxd/containers/centos6-builder  logicalreferenced     1.92G                                                                                                  -
zpool012/lxd/containers/centos6-builder  volmode               default                                                                                                default
zpool012/lxd/containers/centos6-builder  filesystem_limit      none                                                                                                   default
zpool012/lxd/containers/centos6-builder  snapshot_limit        none                                                                                                   default
zpool012/lxd/containers/centos6-builder  filesystem_count      none                                                                                                   default
zpool012/lxd/containers/centos6-builder  snapshot_count        none                                                                                                   default
zpool012/lxd/containers/centos6-builder  snapdev               hidden                                                                                                 default
zpool012/lxd/containers/centos6-builder  acltype               posixacl                                                                                               inherited from zpool012/lxd
zpool012/lxd/containers/centos6-builder  context               none                                                                                                   default
zpool012/lxd/containers/centos6-builder  fscontext             none                                                                                                   default
zpool012/lxd/containers/centos6-builder  defcontext            none                                                                                                   default
zpool012/lxd/containers/centos6-builder  rootcontext           none                                                                                                   default
zpool012/lxd/containers/centos6-builder  relatime              on                                                                                                     inherited from zpool012
zpool012/lxd/containers/centos6-builder  redundant_metadata    all                                                                                                    default
zpool012/lxd/containers/centos6-builder  overlay               off                                                                                                    default
stgraber commented 4 years ago

ZFS config looks correct (mountpoint & canmount are the usual suspects for that type of issue).
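
Those two properties (plus the current mount state) can be queried directly; the dataset name below is a placeholder:

    zfs get mountpoint,canmount,mounted zpool/lxd/containers/container-name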

itoffshore commented 4 years ago

The container was running Mesos - CentOS 6 containers do not have acpid, so perhaps they cannot be restarted cleanly from outside the container.

stgraber commented 4 years ago

For containers we don't use ACPI; we signal the init system directly, and we have clean shutdown tests for all the images we publish.
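
(Background: LXC's clean shutdown path delivers a halt signal to the container's PID 1 - lxc.signal.halt, SIGPWR by default - which is why acpid isn't needed. A per-instance override, if any, would show up in the instance's raw.lxc config:)

    lxc config get centos6-builder raw.lxc    # any lxc.signal.halt override would appear here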

stgraber commented 4 years ago

@itoffshore is this still happening?

itoffshore commented 4 years ago

@stgraber - the problem fixed itself after a reboot, the same as for the original LXD forum user.

stgraber commented 4 years ago

Weird. Hopefully the recent mount table tweaks in the snap will help prevent this from happening again.

If someone else hits this, please let us know.

candlerb commented 3 years ago

I have the identical problem. I can probably leave this in a broken state for a few hours, but ideally I'd like to reboot the host to clear this (it's the authoritative nameserver for my home network).

root@nuc2:~# lxc start ns-auth
Error: Failed preparing container for start: Failed to run: zfs mount zfs/lxd/containers/ns-auth: cannot mount 'zfs/lxd/containers/ns-auth': filesystem already mounted
Try `lxc info --show-log ns-auth` for more info
root@nuc2:~# lxc info --show-log ns-auth
Name: ns-auth
Location: none
Remote: unix://
Architecture: x86_64
Created: 2018/06/15 21:10 UTC
Status: Stopped
Type: container
Profiles: br255

Log:

root@nuc2:~#

The host (nuc2) is running Ubuntu 18.04 and LXD 4.11 from the snap:

root@nuc2:~# snap list
Name    Version   Rev    Tracking       Publisher   Notes
core    16-2.49   10859  latest/stable  canonical✓  core
core18  20210128  1988   latest/stable  canonical✓  base
lxd     4.11      19566  latest/stable  canonical✓  -

The container (ns-auth) was Ubuntu 16.04, and I'd just done a "do-release-upgrade" to update it to 18.04. I had just rebooted it from inside the container, but it didn't restart, and now I can't restart it from the host either.

zfs properties:

root@nuc2:~# zfs get all zfs/lxd/containers/ns-auth
NAME                        PROPERTY              VALUE                                                              SOURCE
zfs/lxd/containers/ns-auth  type                  filesystem                                                         -
zfs/lxd/containers/ns-auth  creation              Fri Jun 15 22:10 2018                                              -
zfs/lxd/containers/ns-auth  used                  1.71G                                                              -
zfs/lxd/containers/ns-auth  available             99.5G                                                              -
zfs/lxd/containers/ns-auth  referenced            809M                                                               -
zfs/lxd/containers/ns-auth  compressratio         1.77x                                                              -
zfs/lxd/containers/ns-auth  mounted               no                                                                 -
zfs/lxd/containers/ns-auth  quota                 none                                                               default
zfs/lxd/containers/ns-auth  reservation           none                                                               default
zfs/lxd/containers/ns-auth  recordsize            128K                                                               default
zfs/lxd/containers/ns-auth  mountpoint            /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth  local
zfs/lxd/containers/ns-auth  sharenfs              off                                                                default
zfs/lxd/containers/ns-auth  checksum              on                                                                 default
zfs/lxd/containers/ns-auth  compression           lz4                                                                inherited from zfs
zfs/lxd/containers/ns-auth  atime                 on                                                                 default
zfs/lxd/containers/ns-auth  devices               on                                                                 default
zfs/lxd/containers/ns-auth  exec                  on                                                                 default
zfs/lxd/containers/ns-auth  setuid                on                                                                 default
zfs/lxd/containers/ns-auth  readonly              off                                                                default
zfs/lxd/containers/ns-auth  zoned                 off                                                                default
zfs/lxd/containers/ns-auth  snapdir               hidden                                                             default
zfs/lxd/containers/ns-auth  aclinherit            restricted                                                         default
zfs/lxd/containers/ns-auth  createtxg             852601                                                             -
zfs/lxd/containers/ns-auth  canmount              noauto                                                             local
zfs/lxd/containers/ns-auth  xattr                 on                                                                 default
zfs/lxd/containers/ns-auth  copies                1                                                                  default
zfs/lxd/containers/ns-auth  version               5                                                                  -
zfs/lxd/containers/ns-auth  utf8only              off                                                                -
zfs/lxd/containers/ns-auth  normalization         none                                                               -
zfs/lxd/containers/ns-auth  casesensitivity       sensitive                                                          -
zfs/lxd/containers/ns-auth  vscan                 off                                                                default
zfs/lxd/containers/ns-auth  nbmand                off                                                                default
zfs/lxd/containers/ns-auth  sharesmb              off                                                                default
zfs/lxd/containers/ns-auth  refquota              none                                                               default
zfs/lxd/containers/ns-auth  refreservation        none                                                               default
zfs/lxd/containers/ns-auth  guid                  18270794170817550884                                               -
zfs/lxd/containers/ns-auth  primarycache          all                                                                default
zfs/lxd/containers/ns-auth  secondarycache        all                                                                default
zfs/lxd/containers/ns-auth  usedbysnapshots       939M                                                               -
zfs/lxd/containers/ns-auth  usedbydataset         809M                                                               -
zfs/lxd/containers/ns-auth  usedbychildren        0B                                                                 -
zfs/lxd/containers/ns-auth  usedbyrefreservation  0B                                                                 -
zfs/lxd/containers/ns-auth  logbias               latency                                                            default
zfs/lxd/containers/ns-auth  dedup                 off                                                                default
zfs/lxd/containers/ns-auth  mlslabel              none                                                               default
zfs/lxd/containers/ns-auth  sync                  standard                                                           default
zfs/lxd/containers/ns-auth  dnodesize             legacy                                                             default
zfs/lxd/containers/ns-auth  refcompressratio      1.89x                                                              -
zfs/lxd/containers/ns-auth  written               733M                                                               -
zfs/lxd/containers/ns-auth  logicalused           2.79G                                                              -
zfs/lxd/containers/ns-auth  logicalreferenced     1.38G                                                              -
zfs/lxd/containers/ns-auth  volmode               default                                                            default
zfs/lxd/containers/ns-auth  filesystem_limit      none                                                               default
zfs/lxd/containers/ns-auth  snapshot_limit        none                                                               default
zfs/lxd/containers/ns-auth  filesystem_count      none                                                               default
zfs/lxd/containers/ns-auth  snapshot_count        none                                                               default
zfs/lxd/containers/ns-auth  snapdev               hidden                                                             default
zfs/lxd/containers/ns-auth  acltype               off                                                                default
zfs/lxd/containers/ns-auth  context               none                                                               default
zfs/lxd/containers/ns-auth  fscontext             none                                                               default
zfs/lxd/containers/ns-auth  defcontext            none                                                               default
zfs/lxd/containers/ns-auth  rootcontext           none                                                               default
zfs/lxd/containers/ns-auth  relatime              off                                                                default
zfs/lxd/containers/ns-auth  redundant_metadata    all                                                                default
zfs/lxd/containers/ns-auth  overlay               off                                                                default

(note that mounted is no). I can't find any more useful LXD logs:

root@nuc2:~# ls -l  /var/snap/lxd/common/lxd/logs/ns-auth/
total 6
-rw-r--r-- 1 root root    0 Mar  6 20:05 forkexec.log
-rw-r----- 1 root root 2178 Dec 27 08:23 lxc.conf
-rw-r----- 1 root root    0 Mar  7 10:48 lxc.log
-rw-r----- 1 root root    0 Mar  7 10:48 lxc.log.old
root@nuc2:~# tail -6 /var/snap/lxd/common/lxd/logs/lxd.log
t=2021-03-07T09:08:01+0000 lvl=info msg="Pruning expired instance backups"
t=2021-03-07T09:08:01+0000 lvl=info msg="Done pruning expired instance backups"
t=2021-03-07T09:59:13+0000 lvl=warn msg="Detected poll(POLLNVAL) event."
t=2021-03-07T10:08:01+0000 lvl=info msg="Pruning expired instance backups"
t=2021-03-07T10:08:01+0000 lvl=info msg="Done pruning expired instance backups"
t=2021-03-07T10:31:26+0000 lvl=warn msg="Detected poll(POLLNVAL) event."
root@nuc2:~#

An strace of the lxd process (13251) and its descendants, with strace -s1024 -f -p 13251 2>ert, shows:

...
[pid 16917] execve("/snap/lxd/current/zfs-0.7/bin/zfs", ["zfs", "mount", "zfs/lxd/containers/ns-auth"], 0xc000b22a00 /* 39 vars */ <unfinished ...>
...
[pid 16917] openat(AT_FDCWD, "/proc/self/mounts", O_RDONLY) = 4
[pid 16917] openat(AT_FDCWD, "/etc/dfs/sharetab", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 16917] openat(AT_FDCWD, "/dev/zfs", O_RDWR) = 5
...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...
[pid 16917] read(4, ...\nzfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/", 1024) = 1024
[pid 16917] read(4, "default/containers/ns-auth zfs rw,xattr,noacl 0 0\n...
[pid 16917] read(4, "...", 1024) = 807
[pid 16917] read(4, "", 1024)           = 0
[pid 16917] write(2, "cannot mount 'zfs/lxd/containers/ns-auth': filesystem already mounted\n", 70) = 70

And indeed, I can see that while ns-auth is not mounted on the host, it is in /proc/<pid>/mounts of the lxd process:

root@nuc2:~# wc -l /proc/mounts; grep ns-auth /proc/mounts
49 /proc/mounts
root@nuc2:~# wc -l /proc/13251/mounts; grep ns-auth /proc/13251/mounts
113 /proc/13251/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:~#

I can see this with nsenter too:

root@nuc2:~# nsenter -t 13251 grep ns-auth /proc/mounts
root@nuc2:~# nsenter -t 13251 -m grep ns-auth /proc/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:~#

However, I'm out of my depth here. I can't do the unmount:

root@nuc2:~# nsenter -t 13251 -m umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth: no mount point specified.
root@nuc2:~# nsenter -t 13251 -m ls /var/snap/lxd/common/shmounts/storage-pools/
ls: cannot access '/var/snap/lxd/common/shmounts/storage-pools/': No such file or directory
root@nuc2:~# nsenter -t 13251 -m ls /var/snap/lxd/common/shmounts/
instances  lxcfs
root@nuc2:~# nsenter -t 13251 -m zfs get all zfs/lxd/containers/ns-auth | grep mount
nsenter: failed to execute zfs: No such file or directory
root@nuc2:~# nsenter -t 13251 -m /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth | grep mount
/snap/lxd/current/zfs-0.7/bin/zfs: error while loading shared libraries: libnvpair.so.1: cannot open shared object file: No such file or directory

Is there anything else you want me to check before rebooting?
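
One more comparison that may help others in the same spot: diff the host's mount table against the LXD snap's mount namespace (13251 is the daemon PID found above):

    grep containers /proc/self/mounts | sort > /tmp/host-mounts
    grep containers /proc/13251/mounts | sort > /tmp/lxd-mounts
    diff /tmp/host-mounts /tmp/lxd-mounts    # stale entries appear only on the lxd side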

candlerb commented 3 years ago

BTW there's nothing unusual about the config of this container.

root@nuc2:~# lxc config show -e ns-auth
architecture: x86_64
config:
  volatile.base_image: 8220e89e33e6f62b56cb451cfed61574074416a66a6e7c61ff574d95572e6661
  volatile.eth0.hwaddr: 00:16:3e:27:fe:a9
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
  volatile.uuid: 43b39bcd-17a8-447f-83a4-dd6f0aeda98c
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br255
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- br255
stateful: false
description: ""
root@nuc2:~#
candlerb commented 3 years ago

Ha, got the zfs binary to work (needed to copy LD_LIBRARY_PATH from /proc/13251/environ):

root@nuc2:~# nsenter -t 13251 -a
mesg: ttyname failed: No such device
root@nuc2:/# export LD_LIBRARY_PATH=/snap/lxd/current/zfs-0.7/lib/:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/lxd/19566/lib:/snap/lxd/19566/lib/x86_64-linux-gnu:/snap/lxd/19566/lib/x86_64-linux-gnu/ceph:/snap/lxd/19566/zfs-0.6/lib:/snap/lxd/19566/zfs-2.0/lib:/snap/lxd/19566/lib:/snap/lxd/19566/lib/x86_64-linux-gnu:/snap/lxd/current/lib:/snap/lxd/current/lib/x86_64-linux-gnu:/snap/lxd/current/lib/x86_64-linux-gnu/ceph
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth | grep mount
zfs/lxd/containers/ns-auth  mounted               yes                                                                -
zfs/lxd/containers/ns-auth  mountpoint            /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth  local
zfs/lxd/containers/ns-auth  canmount              noauto                                                             local
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get all zfs/lxd/containers/ns-auth
NAME                        PROPERTY              VALUE                                                              SOURCE
zfs/lxd/containers/ns-auth  type                  filesystem                                                         -
zfs/lxd/containers/ns-auth  creation              Fri Jun 15 22:10 2018                                              -
zfs/lxd/containers/ns-auth  used                  1.71G                                                              -
zfs/lxd/containers/ns-auth  available             99.4G                                                              -
zfs/lxd/containers/ns-auth  referenced            809M                                                               -
zfs/lxd/containers/ns-auth  compressratio         1.77x                                                              -
zfs/lxd/containers/ns-auth  mounted               yes                                                                -
zfs/lxd/containers/ns-auth  quota                 none                                                               default
zfs/lxd/containers/ns-auth  reservation           none                                                               default
zfs/lxd/containers/ns-auth  recordsize            128K                                                               default
zfs/lxd/containers/ns-auth  mountpoint            /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth  local
zfs/lxd/containers/ns-auth  sharenfs              off                                                                default
zfs/lxd/containers/ns-auth  checksum              on                                                                 default
zfs/lxd/containers/ns-auth  compression           lz4                                                                inherited from zfs
zfs/lxd/containers/ns-auth  atime                 on                                                                 default
zfs/lxd/containers/ns-auth  devices               on                                                                 default
zfs/lxd/containers/ns-auth  exec                  on                                                                 default
zfs/lxd/containers/ns-auth  setuid                on                                                                 default
zfs/lxd/containers/ns-auth  readonly              off                                                                default
zfs/lxd/containers/ns-auth  zoned                 off                                                                default
zfs/lxd/containers/ns-auth  snapdir               hidden                                                             default
zfs/lxd/containers/ns-auth  aclinherit            restricted                                                         default
zfs/lxd/containers/ns-auth  createtxg             852601                                                             -
zfs/lxd/containers/ns-auth  canmount              noauto                                                             local
zfs/lxd/containers/ns-auth  xattr                 on                                                                 default
zfs/lxd/containers/ns-auth  copies                1                                                                  default
zfs/lxd/containers/ns-auth  version               5                                                                  -
zfs/lxd/containers/ns-auth  utf8only              off                                                                -
zfs/lxd/containers/ns-auth  normalization         none                                                               -
zfs/lxd/containers/ns-auth  casesensitivity       sensitive                                                          -
zfs/lxd/containers/ns-auth  vscan                 off                                                                default
zfs/lxd/containers/ns-auth  nbmand                off                                                                default
zfs/lxd/containers/ns-auth  sharesmb              off                                                                default
zfs/lxd/containers/ns-auth  refquota              none                                                               default
zfs/lxd/containers/ns-auth  refreservation        none                                                               default
zfs/lxd/containers/ns-auth  guid                  18270794170817550884                                               -
zfs/lxd/containers/ns-auth  primarycache          all                                                                default
zfs/lxd/containers/ns-auth  secondarycache        all                                                                default
zfs/lxd/containers/ns-auth  usedbysnapshots       939M                                                               -
zfs/lxd/containers/ns-auth  usedbydataset         809M                                                               -
zfs/lxd/containers/ns-auth  usedbychildren        0B                                                                 -
zfs/lxd/containers/ns-auth  usedbyrefreservation  0B                                                                 -
zfs/lxd/containers/ns-auth  logbias               latency                                                            default
zfs/lxd/containers/ns-auth  dedup                 off                                                                default
zfs/lxd/containers/ns-auth  mlslabel              none                                                               default
zfs/lxd/containers/ns-auth  sync                  standard                                                           default
zfs/lxd/containers/ns-auth  dnodesize             legacy                                                             default
zfs/lxd/containers/ns-auth  refcompressratio      1.89x                                                              -
zfs/lxd/containers/ns-auth  written               733M                                                               -
zfs/lxd/containers/ns-auth  logicalused           2.79G                                                              -
zfs/lxd/containers/ns-auth  logicalreferenced     1.38G                                                              -
zfs/lxd/containers/ns-auth  volmode               default                                                            default
zfs/lxd/containers/ns-auth  filesystem_limit      none                                                               default
zfs/lxd/containers/ns-auth  snapshot_limit        none                                                               default
zfs/lxd/containers/ns-auth  filesystem_count      none                                                               default
zfs/lxd/containers/ns-auth  snapshot_count        none                                                               default
zfs/lxd/containers/ns-auth  snapdev               hidden                                                             default
zfs/lxd/containers/ns-auth  acltype               off                                                                default
zfs/lxd/containers/ns-auth  context               none                                                               default
zfs/lxd/containers/ns-auth  fscontext             none                                                               default
zfs/lxd/containers/ns-auth  defcontext            none                                                               default
zfs/lxd/containers/ns-auth  rootcontext           none                                                               default
zfs/lxd/containers/ns-auth  relatime              off                                                                default
zfs/lxd/containers/ns-auth  redundant_metadata    all                                                                default
zfs/lxd/containers/ns-auth  overlay               off                                                                default

However I still can't unmount it:

root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount zfs/lxd/containers/ns-auth
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': umount failed
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth
cannot unmount '/var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth': not a mountpoint
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs get mountpoint zfs/lxd/containers/ns-auth
NAME                        PROPERTY    VALUE                                                              SOURCE
zfs/lxd/containers/ns-auth  mountpoint  /var/snap/lxd/common/lxd/storage-pools/default/containers/ns-auth  local

/proc/mounts says it's mounted somewhere else - but that doesn't work either.

root@nuc2:/# grep ns-auth /proc/mounts
zfs/lxd/containers/ns-auth /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth zfs rw,xattr,noacl 0 0
root@nuc2:/# ls /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
ls: cannot access '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': No such file or directory
root@nuc2:/# /snap/lxd/current/zfs-0.7/bin/zfs unmount /var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/ns-auth': No such file or directory
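One cross-check that helps here is comparing the host's mount table with the one inside the snap's mount namespace, since the two can disagree (a sketch, run from a fresh host shell; 13251 is the LXD daemon PID as above):

grep ns-auth /proc/self/mounts                      # host's view
nsenter -t 13251 -m grep ns-auth /proc/self/mounts  # view inside the snap's mount namespace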
candlerb commented 3 years ago

I've restarted the server now. Sorry.

candlerb commented 3 years ago

I have this problem again. I am now running lxd 4.18 from snap under Ubuntu 18.04.5, kernel linux-image-generic-hwe-18.04 (5.4.0.81.91~18.04.73), zfsutils-linux 0.7.5-1ubuntu16.12, and my default pool is zfs.

I have a number of containers, including one called "netbox" and another called "netbox3". I simply wanted to rename the container "netbox" (which was running fine) to "netbox2".

When I attempt this it fails:

root@nuc2:~# lxc rename netbox netbox2
Error: Renaming of running container not allowed
root@nuc2:~# lxc stop netbox
root@nuc2:~# lxc rename netbox netbox2
Error: Rename instance: Failed to run: zfs rename zfs/lxd/containers/netbox zfs/lxd/containers/netbox2: umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox': umount failed

And the container is unchanged:

root@nuc2:~# lxc list netbox
+---------+---------+---------------------+-------------------------------+-----------+-----------+
|  NAME   |  STATE  |        IPV4         |             IPV6              |   TYPE    | SNAPSHOTS |
+---------+---------+---------------------+-------------------------------+-----------+-----------+
| netbox  | STOPPED |                     |                               | CONTAINER | 0         |
+---------+---------+---------------------+-------------------------------+-----------+-----------+
| netbox3 | RUNNING | 10.12.255.50 (eth0) | 2a01:5d00:1000:8ff::50 (eth0) | CONTAINER | 0         |
+---------+---------+---------------------+-------------------------------+-----------+-----------+

root@nuc2:~# lxc storage list
+---------+--------+--------------------------------+-------------+---------+
|  NAME   | DRIVER |             SOURCE             | DESCRIPTION | USED BY |
+---------+--------+--------------------------------+-------------+---------+
| default | zfs    | zfs/lxd                        |             | 18      |
+---------+--------+--------------------------------+-------------+---------+
| plain   | dir    | /var/lib/snapd/hostfs/data/lxd |             | 0       |
+---------+--------+--------------------------------+-------------+---------+

root@nuc2:~# zfs list -r zfs/lxd | grep netbox
zfs/lxd/containers/netbox                                                                1.40G  92.1G  1.20G  /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox
zfs/lxd/containers/netbox3                                                               1.98G  92.1G  1.87G  /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3

But more importantly, I now can't even restart it, and the log is empty.

root@nuc2:~# lxc start netbox
Error: Failed preparing container for start: Failed to run: zfs mount zfs/lxd/containers/netbox: cannot mount 'zfs/lxd/containers/netbox': filesystem already mounted
Try `lxc info --show-log netbox` for more info
root@nuc2:~# lxc info --show-log netbox
Name: netbox
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2020/04/13 17:11 BST
Last Used: 2021/07/21 10:44 BST

Log:

root@nuc2:~# zfs get all zfs/lxd/containers/netbox | grep mount
zfs/lxd/containers/netbox  mounted               no                                                                                                -
zfs/lxd/containers/netbox  mountpoint            /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox                                  local
zfs/lxd/containers/netbox  canmount              noauto                                                                                            local
root@nuc2:~#

However, if I create a fresh container for testing, it's happy:

root@nuc2:~# lxc init ubuntu:18.04 test123
Creating test123
root@nuc2:~# zfs list -r zfs/lxd | grep test
zfs/lxd/containers/test123                                                                196K  92.1G   453M  /var/snap/lxd/common/lxd/storage-pools/default/containers/test123
root@nuc2:~# lxc rename test123 test456
root@nuc2:~# lxc start test456
root@nuc2:~# lxc stop test456
root@nuc2:~# lxc rename test456 test789
root@nuc2:~# lxc delete test789
root@nuc2:~#

root@nuc2:~# lxc init ubuntu:18.04 blah
Creating blah
root@nuc2:~# lxc init ubuntu:20.04 blah3
Creating blah3
root@nuc2:~# lxc rename blah blah2
root@nuc2:~# lxc delete blah2 blah3
root@nuc2:~#

Once again, I can see that the lxd process itself has this filesystem in its mounts:

root@nuc2:~# ps auxwww | grep lxd
...
root     11576  0.0  0.0   2616   352 ?        Ss   Sep06   0:00 /bin/sh /snap/lxd/21468/commands/daemon.start
root     11739  0.1  0.6 1924312 50564 ?       Sl   Sep06   5:44 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root@nuc2:~# grep netbox /proc/11739/mounts
zfs/lxd/containers/netbox /var/snap/lxd/common/shmounts/storage-pools/default/containers/netbox zfs rw,xattr,noacl 0 0
zfs/lxd/containers/netbox3 /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3 zfs rw,xattr,noacl 0 0
root@nuc2:~#

(EDIT: interesting that one is /var/snap/lxd/common/shmounts/... and one is /var/snap/lxd/common/lxd/.... I observe that the directory /var/snap/lxd/common/shmounts exists but is empty. I also see that /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox and /var/snap/lxd/common/lxd/storage-pools/default/containers/netbox3 both exist, but both appear to be empty)

I am happy to leave this container in a broken state for a day or two, if you can give me other commands to poke it with.

tomponline commented 3 years ago

I've reopened this as there still seem to be some issues with ZFS when the snap package is refreshed. Did you reboot the machine after day 2, and did that fix the issue again?

candlerb commented 3 years ago

It may or may not be related, but soon afterwards this machine started to report some errors with the zpool on the SSD:

The number of checksum errors associated with a ZFS device
exceeded acceptable levels. ZFS has marked the device as
degraded.

 impact: Fault tolerance of the pool may be compromised.
    eid: 18658
  class: statechange
  state: DEGRADED

This is a single-device vdev (no redundancy).

Although it still appeared to be functioning, I took the precaution of replacing the SSD, and rebuilding the machine with Ubuntu 20.04 while I was at it, so I'm afraid that means the error state is now lost.

tomponline commented 3 years ago

Ah OK, that sounds suspect. I'll close again, but if you see it recur without any ZFS errors then let us know. Thanks

renky commented 2 years ago

I know this issue is closed, but I'd like to add some information, since I faced the same issue.

In my case the error message was:

>lxc start mycontainer
Error: Failed preparing container for start: Failed to run: zfs mount zpool1/lxd/containers/mycontainer: cannot mount 'zpool1/lxd/containers/mycontainer': filesystem already mounted
Try `lxc info --show-log mycontainer` for more info

After this I thought: OK, if it is mounted, I'll unmount it:

>zfs umount zpool1/lxd/containers/mycontainer
cannot unmount 'zpool1/lxd/containers/mycontainer': not currently mounted

Hmm, that's really suspect. In a strange state of mind I thought: OK, then I'll mount it and help LXD...

>zfs mount zpool1/lxd/containers/mycontainer

And it worked - after that I could restart the container without rebooting the server.

I cannot explain it, but it solved my issue. Since then I have hit this situation a second time, and the workaround worked again, so maybe this helps anybody who faces the same issue. Although this cannot be the long-term solution, it helped me out.

LXD seems to think that this filesystem is already mounted, but zfs says it isn't - and if you mount it manually with zfs directly, LXD accepts that and starts up.
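A minimal sketch of that workaround, using the dataset name from above (checking `zfs get mounted` first, so you only mount when ZFS and LXD actually disagree):

DATASET=zpool1/lxd/containers/mycontainer
# LXD claims "filesystem already mounted" while ZFS reports it isn't;
# mounting by hand brings the two views back into agreement.
if [ "$(zfs get -H -o value mounted "$DATASET")" = "no" ]; then
    zfs mount "$DATASET"
fi
lxc start mycontainer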

tomponline commented 2 years ago

We have switched to using the normal mount command and a ZFS mountpoint=none setting recently after we saw that using zfs mount was sometimes causing the volume to be mounted in the host's mount namespace rather than the snap's namespace (even though we were running the command from the snap's mount namespace).

Using the legacy tooling and telling ZFS not to use its own mountpoint seemed to help when one of our devs was experiencing the same problem when using an existing ZFS pool with LXD as a sub-dataset.

https://github.com/lxc/lxd/pull/9349
https://github.com/lxc/lxd/pull/9353

This seems like a ZFS bug somehow, but hopefully these changes will work around it.
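Roughly, the new approach looks like this sketch (not LXD's exact code; the paths are illustrative, and `-o zfsutil` is the option `zfs mount` itself passes to mount.zfs):

# ZFS no longer auto-manages the mountpoint...
zfs set mountpoint=none canmount=noauto pool/lxd/containers/c1
# ...and mounting/unmounting goes through the normal mount tooling instead of `zfs mount`
mount -t zfs -o zfsutil pool/lxd/containers/c1 /var/snap/lxd/common/lxd/storage-pools/default/containers/c1
umount /var/snap/lxd/common/lxd/storage-pools/default/containers/c1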

v3ss0n commented 2 years ago

I am using 4.20 and I still encountered this problem. The only way to solve it is to reboot the entire system. I am using the snap build of LXD.

Can we reopen this?

candlerb commented 2 years ago

Can you try:

zfs list -o name,mountpoint,canmount,mounted

If the top-level dataset in the lxd pool or any of the individual container datasets have a mountpoint set, then unset it like this:

zfs set mountpoint=none canmount=noauto foo/bar

See this discussion.
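If many datasets are affected, a loop like this sketch saves typing (`foo/containers` is a placeholder for your containers dataset; passing two properties to one `zfs set` needs zfs 0.8+, so on older releases run two separate set commands):

for ds in $(zfs list -H -o name -r foo/containers); do
    zfs set mountpoint=none canmount=noauto "$ds"
done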

v3ss0n commented 2 years ago
NAME                           MOUNTPOINT  CANMOUNT  MOUNTED
default                        none        on        no
default/containers             none        on        no
default/containers/CCOMMERCE1  none        noauto    no

so

zfs set mountpoint=none canmount=noauto default/containers ?

candlerb commented 2 years ago

Since mountpoint is already none, I doubt it will make a difference, but you can try it.

If the problem remains, then I suggest you start from scratch: show exactly what you see on your system, what commands you type, what errors you see, and what prompts the error to occur (you already said that rebooting the system clears the problem).

v3ss0n commented 2 years ago

First, I was restoring a container from a snapshot (LXD had been updated via snap):

lxc restore magento-dev clean

Then it couldn't start:

lxc start magento-dev gives that error.

tomponline commented 2 years ago

@v3ss0n shall we pick a single place to discuss this rather than spread it over two threads?

Would you like to proceed here or over at https://discuss.linuxcontainers.org/t/time-to-fix-this-once-and-for-all-failed-preparing-container-for-start-failed-to-run-zfs-set-mountpoint-none-canmount-noauto/12662?

v3ss0n commented 2 years ago

Yes, I am there now.

tomponline commented 2 years ago

So I would suggest removing any mount points on your ZFS datasets, as from https://discuss.linuxcontainers.org/t/time-to-fix-this-once-and-for-all-failed-preparing-container-for-start-failed-to-run-zfs-set-mountpoint-none-canmount-noauto/12662/4?u=tomp I can see some of your datasets have mount points which may be causing ZFS issues.

tomponline commented 2 years ago

A dataset is the value in the "NAME" column from zfs list output.

candlerb commented 2 years ago

Related issue: https://github.com/lxc/lxd-pkg-snap/issues/61

(Seems to be triggered by updates of lxd snap and/or core snap)

v3ss0n commented 2 years ago

Seems so; mine was updated a week ago, I think. I should create a script to snapshot and restore a few hundred times and run it overnight. Maybe I will try this weekend.

StormYudi commented 2 years ago

The same problem happened in LXD 4.22 with ZFS; I can't find the reason why it happened:

root@rainyun:~# lxc start bt611252
Error: Failed to run: zfs set mountpoint=none canmount=noauto data/containers/bt611252: umount: /var/snap/lxd/common/shmounts/storage-pools/data/containers/bt611252: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/data/containers/bt611252': umount failed
Try `lxc info --show-log bt611252` for more info

cilex-ft commented 2 years ago

Same here, LXD 4.22 via snap on Ubuntu 20.04.3 LTS. After editing a profile, some containers (unrelated to this profile) can neither reboot nor start.

# lxc start e0028
Error: Failed to run: zfs set mountpoint=none canmount=noauto pool1/containers/e0028: umount: /var/snap/lxd/common/shmounts/storage-pools/pool1/containers/e0028: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/pool1/containers/e0028': umount failed

Editing any profile also displays errors:

# lxc profile edit large

 - Project: default, Instance: t0618: Failed to write backup file: Failed to run: zfs set mountpoint=none canmount=noauto pool1/containers/t0618: umount: /var/snap/lxd/common/shmounts/storage-pools/pool1/containers/t0618: no mount point specified.
cannot unmount '/var/snap/lxd/common/shmounts/storage-pools/pool1/containers/t0618': umount failed

8 containers using this profile are listed in the error - but 21 use this profile in total, so 13 of them are not affected. Those can be stopped and restarted normally.

The zfs list MOUNTPOINT column is "none" for almost all containers, even those which (so far) behave correctly.

We tried this advice from TomP to remove the mountpoint for containers which had a mountpoint defined, to no avail:

sudo zfs set mountpoint=none canmount=noauto pool1/containers/container_with_mountpoint

Right now we have dozens of production containers affected. Restarting the host is not an option, as it would stop containers which are still working.

We would really appreciate it if you could provide us with commands or a script to "repair" stuck containers - and fix the root cause of the problem if you can, of course...

cilex-ft commented 2 years ago

@tomponline Tom, should we open another issue, or is it OK if we continue on this one (as @stgraber wrote last year above)?

tomponline commented 2 years ago

I've reopened this one, although I don't know that there is much we can do until the snapd or ZFS bug (whichever is causing it) is resolved.

What we are really missing is a reliable reproducer.

stgraber commented 2 years ago

Closing as it's not a LXD issue, we have a packaging bug open to track this instead. https://github.com/lxc/lxd-pkg-snap/issues/61

cilex-ft commented 2 years ago

Some more context: it seems to be related to snap upgrading LXD, which triggers:

Feb 01 17:50:20 h-h04 lxd.daemon[1645395]: Failed to mount new mntns: Invalid argument

journalctl -u snap.lxd.daemon -S 2022-02-01

(...)
Feb 01 17:50:05 h-h04 systemd[1]: Stopping Service for snap application lxd.daemon...
Feb 01 17:50:05 h-h04 lxd.daemon[1644835]: => Stop reason is: snap refresh
Feb 01 17:50:05 h-h04 lxd.daemon[1644835]: => Stopping LXD
Feb 01 17:50:07 h-h04 lxd.daemon[3927214]: => LXD exited cleanly
Feb 01 17:50:07 h-h04 lxd.daemon[1644835]: ==> Stopped LXD
Feb 01 17:50:07 h-h04 systemd[1]: snap.lxd.daemon.service: Succeeded.
Feb 01 17:50:07 h-h04 systemd[1]: Stopped Service for snap application lxd.daemon.
Feb 01 17:50:20 h-h04 systemd[1]: Started Service for snap application lxd.daemon.
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: => Preparing the system (22306)
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Loading snap configuration
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mntns symlink (mnt:[4026536416])
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up persistent shmounts path
Feb 01 17:50:20 h-h04 lxd.daemon[1645395]: Failed to mount new mntns: Invalid argument
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Failed to setup shmounts, continuing without
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Making LXD shmounts use the persistent path
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Making LXCFS use the persistent path
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Setting up kmod wrapper
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing /boot
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /run
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing /run/bin
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /etc
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Preparing a clean copy of /usr/share/misc
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up ceph configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up LVM configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up OVN configuration
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Rotating logs
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Setting up ZFS (0.8)
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Escaping the systemd cgroups
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ====> Detected cgroup V1
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Escaping the systemd process resource limits
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: ==> Disabling shiftfs on this kernel (auto)
Feb 01 17:50:21 h-h04 lxd.daemon[1645329]: => Starting LXCFS
(...)

We had the same problem on some previous refreshes, which triggered the same damn error (but we didn't open a ticket at that time):

xyz@h-h04:/var/log# journalctl -u snap.lxd.daemon | grep "Failed to setup shmounts"

Jun 17 02:52:00 h-h04 lxd.daemon[4075130]: ====> Failed to setup shmounts, continuing without
Aug 10 02:42:11 h-h04 lxd.daemon[3755743]: ====> Failed to setup shmounts, continuing without
Feb 01 17:50:20 h-h04 lxd.daemon[1645329]: ====> Failed to setup shmounts, continuing without
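To line those failures up with refresh times, something like this sketch does the job (`snap changes` lists recent snapd operations with timestamps):

snap changes | head
journalctl -u snap.lxd.daemon | grep -E "Stop reason is: snap refresh|Failed to setup shmounts"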

For comparison, here is a refresh which triggered no error on "Setting up mntns symlink":

Feb 03 02:05:05 h-h04 systemd[1]: Stopping Service for snap application lxd.daemon...
Feb 03 02:05:05 h-h04 lxd.daemon[2655427]: => Stop reason is: snap refresh
Feb 03 02:05:05 h-h04 lxd.daemon[2655427]: => Stopping LXD
Feb 03 02:05:07 h-h04 lxd.daemon[1645329]: => LXD exited cleanly
Feb 03 02:05:07 h-h04 lxd.daemon[2655427]: ==> Stopped LXD
Feb 03 02:05:07 h-h04 systemd[1]: snap.lxd.daemon.service: Succeeded.
Feb 03 02:05:07 h-h04 systemd[1]: Stopped Service for snap application lxd.daemon.
Feb 03 02:05:17 h-h04 systemd[1]: Started Service for snap application lxd.daemon.
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: => Preparing the system (22340)
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Loading snap configuration
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Setting up mntns symlink (mnt:[4026536416])
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Setting up kmod wrapper
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing /boot
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /run
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing /run/bin
Feb 03 02:05:18 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /etc
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Preparing a clean copy of /usr/share/misc
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up ceph configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up LVM configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up OVN configuration
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Rotating logs
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Setting up ZFS (0.8)
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Escaping the systemd cgroups
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ====> Detected cgroup V1
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Escaping the systemd process resource limits
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: ==> Disabling shiftfs on this kernel (auto)
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: => Re-using existing LXCFS
Feb 03 02:05:19 h-h04 lxd.daemon[2655929]: => Starting LXD

We're not going to reboot the server soon, so we'll be happy to provide logs if it can help.

@tomponline, is there any known solution to repair "broken" containers? Is zfs set mountpoint=none canmount=noauto pool1/containers/container_with_mountpoint supposed to do any good?

tomponline commented 2 years ago

Yes, it seems to occur on snap refresh, but it's not clear what's causing it.

stgraber commented 2 years ago

I'd recommend posting that comment in the other issue. There are some ugly fixes to restore some functionality, but the only fix which fixes it all is a reboot.

We really desperately need a reproducer, i.e. a clear set of steps which cause the problem to show up. Once we have that, it should take just a few hours or days for us to update the logic and fix whatever is wrong, but so far we've never had anything other than "it happens after a few months", and looking at the damage doesn't help us find the cause...

My best guess currently is that it's caused by a particular sequence of core20 and lxd snap refreshes. LXD refreshes on their own never cause this, but the fact that it only hits those who don't regularly update and reboot their systems makes me think it has to do with core20 itself updating potentially 2-3 times before things get screwed up on the next lxd refresh, but that has so far been impossible to confirm.

cilex-ft commented 2 years ago

@stgraber do you consider this https://github.com/lxc/lxd-pkg-snap/issues/61#issuecomment-674092760 an "ugly fix"? It worked well on our server, fixed all containers, and avoided a reboot... so we were thinking about adopting it as our official life-saver, and even automating it if the umount error pops up in the logs... Rebooting is a much worse "fix"!
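The automation we have in mind is nothing more than this kind of sketch (hypothetical; it only detects the condition, and the actual repair would follow the lxd-pkg-snap comment linked above):

# flag a broken shmounts setup after a recent refresh
journalctl -u snap.lxd.daemon --since "1 hour ago" \
    | grep -q "Failed to setup shmounts" \
    && echo "LXD shmounts setup failed after refresh; containers may fail to start or stop"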

stgraber commented 2 years ago

I consider it an ugly fix because, as a result of doing that, you won't be able to pass new devices into those containers until they're restarted, and things like file transfers may also be affected in some cases.