canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Error: Failed container creation: Create container from image: Image create: Unpack failed, Failed to run: unsquashfs #5449

Closed: falstaff1288 closed this issue 5 years ago

falstaff1288 commented 5 years ago

Required information

Issue description

After updating my Arch Linux system, whenever I attempt to create a new container from an image that wasn't already downloaded/cached in my list of images, I get the following error:

$ lxc launch ubuntu:xenial test1 -c security.nesting=true
Creating test1
Error: Failed container creation: Create container from image: Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7/rootfs -n /var/lib/lxd/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7.rootfs: FATAL ERROR:Data queue size is too large.  FATAL ERROR:Data queue size is too large

Manually unpacking the same image, however, completes without error:

[=========================================================================================================================================================================-] 10928/10928 100%

created 8186 files
created 944 directories
created 1474 symlinks
created 7 devices
created 0 fifos

$ ls -l test/
total 80
drwxr-xr-x  2 root root 4096 27 jun  2018 bin
drwxr-xr-x  2 root root 4096 30 mai  2016 boot
drwxr-xr-x  3 root root 4096 27 jun  2018 dev
drwxr-xr-x 35 root root 4096 27 jun  2018 etc
drwxr-xr-x  2 root root 4096 30 mai  2016 home
drwxr-xr-x  7 root root 4096 22 jun  2012 lib
drwxr-xr-x  2 root root 4096 27 jun  2018 lib64
drwxr-xr-x  2 root root 4096 27 jun  2018 media
drwxr-xr-x  2 root root 4096 30 mai  2016 mnt
drwxr-xr-x  2 root root 4096 27 jun  2018 opt
drwxr-xr-x  2 root root 4096 30 mai  2016 proc
drwx------  2 root root 4096 27 jun  2018 root
drwxr-xr-x  2 root root 4096 27 jun  2018 run
drwxr-xr-x  2 root root 4096 27 jun  2018 sbin
drwxr-xr-x  2 root root 4096 10 jun  2012 selinux
drwxr-xr-x  2 root root 4096 27 jun  2018 srv
drwxr-xr-x  2 root root 4096 14 jui  2013 sys
drwxrwxrwt  2 root root 4096 27 jun  2018 tmp
drwxr-xr-x 10 root root 4096 27 jun  2018 usr
drwxr-xr-x 11 root root 4096 27 jun  2018 var


Information to attach

 - [X] Any relevant kernel output (`dmesg`)
See below
 - [X] Container log (`lxc info NAME --show-log`)
Can't create containers
 - [X] Container configuration (`lxc config show NAME --expanded`)
Same as above
 - [X] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
Output from the command journalctl -u lxd

-- Logs begin at Mon 2018-09-10 05:42:33 EDT. --
jan 31 22:36:06 nas.maison.lan lxd[1422]: t=2019-01-31T22:36:06-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:36:35 nas.maison.lan lxd[1422]: t=2019-01-31T22:36:35-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:36:51 nas.maison.lan lxd[1422]: t=2019-01-31T22:36:51-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:44:40 nas.maison.lan lxd[1422]: t=2019-01-31T22:44:40-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:49:43 nas.maison.lan lxd[1422]: t=2019-01-31T22:49:43-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:56:30 nas.maison.lan lxd[1422]: t=2019-01-31T22:56:30-0500 lvl=eror msg="Failed to create LVM storage volume for container \"iscsi-target\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad/rootfs -n /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 22:56:45 nas.maison.lan lxd[1422]: t=2019-01-31T22:56:45-0500 lvl=warn msg="Unable to update backup.yaml at this time" name=fitting-chigger rootfs=/var/lib/lxd/containers/fitting-chigger/rootfs
jan 31 22:57:16 nas.maison.lan lxd[1422]: No such file or directory - Failed to receive file descriptor
jan 31 22:59:49 nas.maison.lan lxd[1422]: t=2019-01-31T22:59:49-0500 lvl=eror msg="Failed to create LVM storage volume for container \"test1\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7/rootfs -n /var/lib/lxd/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"
jan 31 23:00:21 nas.maison.lan lxd[1422]: t=2019-01-31T23:00:21-0500 lvl=eror msg="Failed to create LVM storage volume for container \"test1\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7/rootfs -n /var/lib/lxd/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large"

 - [X] Output of the client with --debug

lxc launch -v --debug ubuntu:xenial test1 -c security.nesting=true DBUG[01-31|23:15:14] Connecting to a local LXD over a Unix socket DBUG[01-31|23:15:14] Sending request to LXD method=GET url=http://unix.socket/1.0 etag= DBUG[01-31|23:15:14] Got response struct from LXD DBUG[01-31|23:15:14] { "config": {}, "api_extensions": [ "storage_zfs_remove_snapshots", "container_host_shutdown_timeout", "container_stop_priority", "container_syscall_filtering", "auth_pki", "container_last_used_at", "etag", "patch", "usb_devices", "https_allowed_credentials", "image_compression_algorithm", "directory_manipulation", "container_cpu_time", "storage_zfs_use_refquota", "storage_lvm_mount_options", "network", "profile_usedby", "container_push", "container_exec_recording", "certificate_update", "container_exec_signal_handling", "gpu_devices", "container_image_properties", "migration_progress", "id_map", "network_firewall_filtering", "network_routes", "storage", "file_delete", "file_append", "network_dhcp_expiry", "storage_lvm_vg_rename", "storage_lvm_thinpool_rename", "network_vlan", "image_create_aliases", "container_stateless_copy", "container_only_migration", "storage_zfs_clone_copy", "unix_device_rename", "storage_lvm_use_thinpool", "storage_rsync_bwlimit", "network_vxlan_interface", "storage_btrfs_mount_options", "entity_description", "image_force_refresh", "storage_lvm_lv_resizing", "id_map_base", "file_symlinks", "container_push_target", "network_vlan_physical", "storage_images_delete", "container_edit_metadata", "container_snapshot_stateful_migration", "storage_driver_ceph", "storage_ceph_user_name", "resource_limits", "storage_volatile_initial_source", "storage_ceph_force_osd_reuse", "storage_block_filesystem_btrfs", "resources", "kernel_limits", "storage_api_volume_rename", "macaroon_authentication", "network_sriov", "console", "restrict_devlxd", "migration_pre_copy", "infiniband", "maas_network", "devlxd_events", "proxy", "network_dhcp_gateway", "file_get_symlink", "network_leases", "unix_device_hotplug", "storage_api_local_volume_handling", "operation_description", "clustering", "event_lifecycle", "storage_api_remote_volume_handling", "nvidia_runtime", "container_mount_propagation", "container_backup", "devlxd_images", "container_local_cross_pool_handling", "proxy_unix", "proxy_udp", "clustering_join", "proxy_tcp_udp_multi_port_handling", "network_state", "proxy_unix_dac_properties", "container_protection_delete", "unix_priv_drop", "pprof_http", "proxy_haproxy_protocol", "network_hwaddr", "proxy_nat", "network_nat_order", "container_full", "candid_authentication", "backup_compression", "candid_config", "nvidia_runtime_config", "storage_api_volume_snapshots", "storage_unmapped", "projects", "candid_config_key", "network_vxlan_ttl", "container_incremental_copy", "usb_optional_vendorid", "snapshot_scheduling", "container_copy_project", "clustering_server_address", "clustering_image_replication", "container_protection_shift" ], "api_status": "stable", "api_version": "1.0", "auth": "trusted", "public": false, "auth_methods": [ "tls" ], "environment": { "addresses": [], "architectures": [ "x86_64", "i686" ], "certificate": "-----BEGIN CERTIFICATE-----\n.../A=\n-----END CERTIFICATE-----\n", "certificate_fingerprint": "", "driver": "lxc", "driver_version": "3.1.0", "kernel": "Linux", "kernel_architecture": "x86_64", "kernel_version": "4.20.5-arch1-1-ARCH", "server": "lxd", "server_pid": 1422, "server_version": "3.8", "storage": "lvm", "storage_version": "2.02.183(2) (2018-12-07) / 1.02.154 (2018-12-07) / 
4.39.0 / ./configure --prefix=/usr --sbindir=/usr/bin --sysconfdir=/etc --localstatedir=/var --enable-applib --enable-cmdlib --enable-dmeventd --enable-lvmetad --enable-lvmpolld --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync --enable-use-lvmetad --with-cache=internal --with-default-dm-run-dir=/run --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-systemdsystemunitdir=/usr/lib/systemd/system --with-thin=internal --with-udev-prefix=/usr --enable-udev-systemd-background-jobs", "server_clustered": false, "server_name": "nas.maison.lan", "project": "default" } } Creating test1 DBUG[01-31|23:15:14] Connecting to a remote simplestreams server DBUG[01-31|23:15:14] Connected to the websocket DBUG[01-31|23:15:14] Sending request to LXD method=POST url=http://unix.socket/1.0/containers etag= DBUG[01-31|23:15:14] { "architecture": "", "config": { "security.nesting": "true" }, "devices": {}, "ephemeral": false, "profiles": null, "stateful": false, "description": "", "name": "test1", "source": { "type": "image", "certificate": "", "alias": "xenial", "server": "https://cloud-images.ubuntu.com/releases", "protocol": "simplestreams", "mode": "pull" }, "instance_type": "" } DBUG[01-31|23:15:14] Got operation from LXD DBUG[01-31|23:15:14] { "id": "26181d2c-9f67-43c5-9e1f-0b2cfc4a2c78", "class": "task", "description": "Creating container", "created_at": "2019-01-31T23:15:14.375383514-05:00", "updated_at": "2019-01-31T23:15:14.375383514-05:00", "status": "Running", "status_code": 103, "resources": { "containers": [ "/1.0/containers/test1" ] }, "metadata": null, "may_cancel": false, "err": "" } DBUG[01-31|23:15:14] Sending request to LXD method=GET url=http://unix.socket/1.0/operations/26181d2c-9f67-43c5-9e1f-0b2cfc4a2c78 etag= DBUG[01-31|23:15:14] Got response struct from LXD DBUG[01-31|23:15:14] { "id": "26181d2c-9f67-43c5-9e1f-0b2cfc4a2c78", "class": "task", "description": "Creating container", "created_at": "2019-01-31T23:15:14.375383514-05:00", "updated_at": "2019-01-31T23:15:14.375383514-05:00", "status": "Running", "status_code": 103, "resources": { "containers": [ "/1.0/containers/test1" ] }, "metadata": null, "may_cancel": false, "err": "" } Error: Failed container creation: Create container from image: Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7/rootfs -n /var/lib/lxd/images/b579861d72c87f006a28ba6031e3fbee0688c1f26302d86ae8db08cf2d0b63f7.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large


 - [X] Output of the daemon with --debug (alternatively output of `lxc monitor` while reproducing the issue)
The output above from the command journalctl -u lxd should be sufficient. If not, please let me know.

Please advise and thanks in advance for your help.
falstaff1288 commented 5 years ago

I had no issue manually creating a thin LV in LXDThinPool. I do get some warnings, but the thin data pool usage is below 8% and around 2% for the metadata pool:

$ sudo lvcreate -n ThinLVTest -V 1G --thinpool vgssd1/LXDThinPool
  WARNING: Sum of all thin volume sizes (181,00 GiB) exceeds the size of thin pool vgssd1/LXDThinPool and the size of whole volume group (<100,00 GiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "ThinLVTest" created.

$ lsblk /dev/sdb2
NAME                                                                                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdb2                                                                                   8:18   0  100G  0 part 
├─vgssd1-LXDThinPool_tmeta                                                           254:4    0    1G  0 lvm  
│ └─vgssd1-LXDThinPool-tpool                                                         254:6    0   98G  0 lvm  
│   ├─vgssd1-LXDThinPool                                                             254:7    0   98G  0 lvm  
│   ├─vgssd1-containers_gitea                                                        254:8    0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/gitea
│   ├─vgssd1-containers_bind--snap1                                                  254:9    0   10G  1 lvm  
│   ├─vgssd1-containers_gitea--snap0                                                 254:10   0   10G  1 lvm  
│   ├─vgssd1-containers_gitea--snap1                                                 254:11   0   10G  1 lvm  
│   ├─vgssd1-containers_bind                                                         254:12   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/bind
│   ├─vgssd1-containers_gitea--snap2                                                 254:13   0   10G  1 lvm  
│   ├─vgssd1-containers_bind--snap0                                                  254:14   0   10G  1 lvm  
│   ├─vgssd1-containers_docker                                                       254:15   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/docker
│   ├─vgssd1-containers_gitea--snap3                                                 254:16   0   10G  1 lvm  
│   ├─vgssd1-containers_docker--snap0                                                254:17   0   10G  1 lvm  
│   ├─vgssd1-containers_docker--snap1                                                254:18   0   10G  1 lvm  
│   ├─vgssd1-containers_ansible----test                                              254:19   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/ansible-test
│   ├─vgssd1-images_49fbcd263660ebe9026fb3cc9ae6dba461814bc257401efe87adb19a6d37ffeb 254:20   0   10G  0 lvm  
│   ├─vgssd1-containers_prometheus                                                   254:21   0   10G  0 lvm  
│   ├─vgssd1-images_3862bc84db95e069c6938d879b6ab0d00854265bf231fc85ef220f27d847c6b6 254:22   0   10G  0 lvm  
│   ├─vgssd1-images_dd2dcba03b23a9ceac220e49c972fe4270cabe1a30eefec5d8387d7ee2e87c12 254:23   0   10G  0 lvm  
│   ├─vgssd1-images_091ea27f573bf334d4e65305e6a5acdeeccfbbc38ce77afb439d5a160fd9d5fc 254:24   0   10G  0 lvm  
│   ├─vgssd1-images_bfb2c59415c520100db4b6231fe3386ee84f2c98c5d1bda00f734226084e834a 254:25   0   10G  0 lvm  
│   └─vgssd1-ThinLVTest                                                              254:31   0    1G  0 lvm  
└─vgssd1-LXDThinPool_tdata                                                           254:5    0   98G  0 lvm  
  └─vgssd1-LXDThinPool-tpool                                                         254:6    0   98G  0 lvm  
    ├─vgssd1-LXDThinPool                                                             254:7    0   98G  0 lvm  
    ├─vgssd1-containers_gitea                                                        254:8    0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/gitea
    ├─vgssd1-containers_bind--snap1                                                  254:9    0   10G  1 lvm  
    ├─vgssd1-containers_gitea--snap0                                                 254:10   0   10G  1 lvm  
    ├─vgssd1-containers_gitea--snap1                                                 254:11   0   10G  1 lvm  
    ├─vgssd1-containers_bind                                                         254:12   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/bind
    ├─vgssd1-containers_gitea--snap2                                                 254:13   0   10G  1 lvm  
    ├─vgssd1-containers_bind--snap0                                                  254:14   0   10G  1 lvm  
    ├─vgssd1-containers_docker                                                       254:15   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/docker
    ├─vgssd1-containers_gitea--snap3                                                 254:16   0   10G  1 lvm  
    ├─vgssd1-containers_docker--snap0                                                254:17   0   10G  1 lvm  
    ├─vgssd1-containers_docker--snap1                                                254:18   0   10G  1 lvm  
    ├─vgssd1-containers_ansible----test                                              254:19   0   10G  0 lvm  /var/lib/lxd/storage-pools/LVM/containers/ansible-test
    ├─vgssd1-images_49fbcd263660ebe9026fb3cc9ae6dba461814bc257401efe87adb19a6d37ffeb 254:20   0   10G  0 lvm  
    ├─vgssd1-containers_prometheus                                                   254:21   0   10G  0 lvm  
    ├─vgssd1-images_3862bc84db95e069c6938d879b6ab0d00854265bf231fc85ef220f27d847c6b6 254:22   0   10G  0 lvm  
    ├─vgssd1-images_dd2dcba03b23a9ceac220e49c972fe4270cabe1a30eefec5d8387d7ee2e87c12 254:23   0   10G  0 lvm  
    ├─vgssd1-images_091ea27f573bf334d4e65305e6a5acdeeccfbbc38ce77afb439d5a160fd9d5fc 254:24   0   10G  0 lvm  
    ├─vgssd1-images_bfb2c59415c520100db4b6231fe3386ee84f2c98c5d1bda00f734226084e834a 254:25   0   10G  0 lvm  
    └─vgssd1-ThinLVTest                                                              254:31   0    1G  0 lvm  

$ sudo lvs
  LV                                                                      VG        Attr       LSize   Pool        Origin                                                                  Data%  Meta%  Move Log Cpy%Sync Convert
[...]
  LXDThinPool                                                             vgssd1    twi-aotz-- <98,00g                                                                                     7,52   2,04                            
  ThinLVTest                                                              vgssd1    Vwi-a-tz--   1,00g LXDThinPool                                                                         0,00                                   
  containers_ansible--test                                                vgssd1    Vwi-aotz--  10,00g LXDThinPool                                                                         11,13                                  
  containers_bind                                                         vgssd1    Vwi-aotz--  10,00g LXDThinPool containers_bind-snap1                                                   2,72                                   
  containers_bind-snap0                                                   vgssd1    Vri-a-tz--  10,00g LXDThinPool                                                                         2,72                                   
  containers_bind-snap1                                                   vgssd1    Vri-a-tz--  10,00g LXDThinPool                                                                         2,72                                   
  containers_docker                                                       vgssd1    Vwi-aotz--  10,00g LXDThinPool                                                                         13,85                                  
  containers_docker-snap0                                                 vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_docker                                                       13,59                                  
  containers_docker-snap1                                                 vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_docker                                                       13,63                                  
  containers_gitea                                                        vgssd1    Vwi-aotz--  10,00g LXDThinPool                                                                         9,45                                   
  containers_gitea-snap0                                                  vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_gitea                                                        9,38                                   
  containers_gitea-snap1                                                  vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_gitea                                                        9,62                                   
  containers_gitea-snap2                                                  vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_gitea                                                        9,43                                   
  containers_gitea-snap3                                                  vgssd1    Vri-a-tz--  10,00g LXDThinPool containers_gitea                                                        9,44                                   
  containers_prometheus                                                   vgssd1    Vwi-a-tz--  10,00g LXDThinPool images_49fbcd263660ebe9026fb3cc9ae6dba461814bc257401efe87adb19a6d37ffeb 16,00                                  
  images_091ea27f573bf334d4e65305e6a5acdeeccfbbc38ce77afb439d5a160fd9d5fc vgssd1    Vwi-a-tz--  10,00g LXDThinPool                                                                         2,31                                   
  images_3862bc84db95e069c6938d879b6ab0d00854265bf231fc85ef220f27d847c6b6 vgssd1    Vwi-a-tz--  10,00g LXDThinPool                                                                         6,43                                   
  images_49fbcd263660ebe9026fb3cc9ae6dba461814bc257401efe87adb19a6d37ffeb vgssd1    Vwi-a-tz--  10,00g LXDThinPool                                                                         12,50                                  
  images_bfb2c59415c520100db4b6231fe3386ee84f2c98c5d1bda00f734226084e834a vgssd1    Vwi-a-tz--  10,00g LXDThinPool                                                                         8,80                                   
  images_dd2dcba03b23a9ceac220e49c972fe4270cabe1a30eefec5d8387d7ee2e87c12 vgssd1    Vwi-a-tz--  10,00g LXDThinPool                                                                         6,22                           
stgraber commented 5 years ago

When manually unpacking, were you unpacking on a thin LV of 10GB on the same VG as LXD?

There's really not much else to what we do to unpack squashfs images, so it sounds like LVM is somehow causing write failures.

falstaff1288 commented 5 years ago

Just tested unpacking an image in a thin LV created on the same VG as LXD. It worked without any issues.

$ lvcreate -n ThinLVTest1 -V 10G --thinpool vgssd1/LXDThinPool
[sudo] password for admin: 
  WARNING: Sum of all thin volume sizes (180,00 GiB) exceeds the size of thin pool vgssd1/LXDThinPool and the size of whole volume group (<100,00 GiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "ThinLVTest1" created.

$ mkfs.ext4 /dev/vgssd1/ThinLVTest1 
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 4a9de6d2-7c99-451f-836d-b9b6d8a71bfe
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

$ mount /dev/mapper/vgssd1-ThinLVTest1 /mnt/

$ unsquashfs -d /mnt/test /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs
Parallel unsquashfs: Using 4 processors
9676 inodes (10928 blocks) to write

[=========================================================================================================================================================================-] 10928/10928 100%

created 8186 files
created 944 directories
created 1474 symlinks
created 7 devices
created 0 fifos

$ ls -l /mnt/test/
total 80
drwxr-xr-x  2 root root 4096 27 jun  2018 bin
drwxr-xr-x  2 root root 4096 30 mai  2016 boot
drwxr-xr-x  3 root root 4096 27 jun  2018 dev
drwxr-xr-x 35 root root 4096 27 jun  2018 etc
drwxr-xr-x  2 root root 4096 30 mai  2016 home
drwxr-xr-x  7 root root 4096 22 jun  2012 lib
drwxr-xr-x  2 root root 4096 27 jun  2018 lib64
drwxr-xr-x  2 root root 4096 27 jun  2018 media
drwxr-xr-x  2 root root 4096 30 mai  2016 mnt
drwxr-xr-x  2 root root 4096 27 jun  2018 opt
drwxr-xr-x  2 root root 4096 30 mai  2016 proc
drwx------  2 root root 4096 27 jun  2018 root
drwxr-xr-x  2 root root 4096 27 jun  2018 run
drwxr-xr-x  2 root root 4096 27 jun  2018 sbin
drwxr-xr-x  2 root root 4096 10 jun  2012 selinux
drwxr-xr-x  2 root root 4096 27 jun  2018 srv
drwxr-xr-x  2 root root 4096 14 jui  2013 sys
drwxrwxrwt  2 root root 4096 27 jun  2018 tmp
drwxr-xr-x 10 root root 4096 27 jun  2018 usr
drwxr-xr-x 11 root root 4096 27 jun  2018 var

[root@nas mnt]# lvs
  LV                                                                      VG        Attr       LSize   Pool        Origin                Data%  Meta%  Move Log Cpy%Sync Convert
[...]                                                                         
  LXDThinPool                                                             vgssd1    twi-aotz-- <98,00g                                   7,37   2,03                            
  ThinLVTest1                                                             vgssd1    Vwi-aotz--  10,00g LXDThinPool                       2,24

Will keep investigating on my end.

stgraber commented 5 years ago

That's pretty odd as that's effectively what LXD would do itself... I wonder what ends up being different then.

falstaff1288 commented 5 years ago

Additional verbose output from journalctl when attempting to create a container from an image whose logs I didn't share in my previous posts:

fév 02 14:41:20 nas kernel: EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: discard
fév 02 14:41:20 nas systemd[1120]: var-lib-lxd-storage\x2dpools-LVM-images-d7838712001c535542b0a51a2721f797cd998eb25d4d3de4b2f23ffefa34c219.mount: Succeeded.
fév 02 14:41:20 nas systemd[1]: var-lib-lxd-storage\x2dpools-LVM-images-d7838712001c535542b0a51a2721f797cd998eb25d4d3de4b2f23ffefa34c219.mount: Succeeded.
fév 02 14:41:21 nas.maison.lan lxd[1050]: t=2019-02-02T14:41:21-0500 lvl=eror msg="Failed to create LVM storage volume for container \"test\" on storage pool \"LVM\": Image create: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/LVM/images/d7838712001c535542b0a51a2721f797cd998eb25d4d3de4b2f23ffefa34c219/rootfs -n /var/lib/lxd/images/d7838712001c535542b0a51a2721f797cd998eb25d4d3de4b2f23ffefa34c219.rootfs: FATAL ERROR:Data queue size is too large.  FATAL ERROR:Data queue size is too large"
falstaff1288 commented 5 years ago

Might have found a lead. With unsquashfs, the data queue size is set to 256 MB by default:

$ unsquashfs 
SYNTAX: unsquashfs [options] filesystem [directories or files to extract]
    -v[ersion]      print version, licence and copyright information
[...]
    -da[ta-queue] <size>    Set data queue to <size> Mbytes.  Default 256
                Mbytes 
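
One way I can probe this is to run unsquashfs by hand against the cached image while forcing an explicit data queue size (a rough sketch; the scratch directory is arbitrary and the image fingerprint is reused from the logs above):

$ unsquashfs -da 128 -f -d /tmp/unsquash-test /var/lib/lxd/images/df7155e205ba18e619d72ec7e8bff9d558d693a94d512cbb6ad56b09955d92ad.rootfs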

The queue size (nr_requests) for all the dm block devices in my system is 128:

$ cat /sys/block/dm-*/queue/nr_requests
128
128
128
128
128
128
128
128
128
128
128
128
128
128
128
128
128
128
128

Has the max queue depth value changed for the I/O scheduler since the last system updates? I'm not sure.

Right now, I am looking for a way to set this value for the dm block devices to 256 at boot time and see if that resolves the issue. I'm also wondering if there is a way to change the arguments LXD uses when executing the unsquashfs command. Any advice on the latter option would be welcome.
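
For the boot-time part, one option would be a udev rule that sets the attribute on every dm device (a sketch; the rule file name and the value 256 are my own assumptions):

$ echo 'ACTION=="add|change", KERNEL=="dm-*", ATTR{queue/nr_requests}="256"' | sudo tee /etc/udev/rules.d/60-dm-nr-requests.rules
$ sudo udevadm control --reload
$ sudo udevadm trigger --subsystem-match=block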

Refs: https://www.redhat.com/archives/linux-lvm/2004-February/msg00115.html http://yoshinorimatsunobu.blogspot.com/2009/04/linux-io-scheduler-queue-size-and.html

stgraber commented 5 years ago

Do you know if your distro updated squashfs-tools recently? Could there have been a change in there that may explain this change of behavior?

It's still odd that you're only seeing this through LXD. I wonder if systemd is maybe placing some kind of process limits on LXD which are then inherited by unsquashfs, which would explain why you're not seeing this from a regular shell with the same arguments.

stgraber commented 5 years ago

Comparing the output of `cat /proc/$(pidof lxd)/limits` with that of `cat /proc/self/limits` may give us some hints as to what's going on.

falstaff1288 commented 5 years ago

Output:

$ pidof lxd
19047

$ cat /proc/19047/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1073741816           1073741816           files     
Max locked memory         67108864             67108864             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63652                63652                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us  

$ cat /proc/self/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             63652                63652                processes 
Max open files            1024                 524288               files     
Max locked memory         67108864             67108864             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63652                63652                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us  
falstaff1288 commented 5 years ago

I don't know how different the back-end process for creating containers is with LXC, but I had no issue creating one:

$ lxc-create -t download -n test1
Setting up the GPG keyring
Downloading the image index

---
DIST    RELEASE ARCH    VARIANT BUILD
---
alpine  3.4 amd64   default 20180627_17:50
alpine  3.4 armhf   default 20180627_17:50
alpine  3.4 i386    default 20180627_17:50
alpine  3.5 amd64   default 20190203_13:01
alpine  3.5 arm64   default 20190203_13:04
alpine  3.5 armhf   default 20190203_13:04
alpine  3.5 i386    default 20190203_13:01
alpine  3.6 amd64   default 20190203_13:01
alpine  3.6 arm64   default 20190202_13:02
alpine  3.6 armhf   default 20190203_13:04
alpine  3.6 i386    default 20190203_13:01
alpine  3.7 amd64   default 20190203_13:01
alpine  3.7 arm64   default 20190203_13:02
alpine  3.7 armhf   default 20190203_13:04
alpine  3.7 i386    default 20190203_13:08
alpine  3.8 amd64   default 20190203_13:01
alpine  3.8 arm64   default 20190203_13:02
alpine  3.8 armhf   default 20190202_13:03
alpine  3.8 i386    default 20190203_13:01
alpine  3.8 ppc64el default 20190203_13:02
alpine  3.8 s390x   default 20190203_13:02
alpine  edge    amd64   default 20190203_13:01
alpine  edge    arm64   default 20190203_13:03
alpine  edge    armhf   default 20190203_13:03
alpine  edge    i386    default 20190203_13:08
alpine  edge    ppc64el default 20190203_13:02
alpine  edge    s390x   default 20190203_13:02
archlinux   current amd64   default 20190203_01:27
centos  6   amd64   default 20190203_08:13
centos  6   i386    default 20190203_08:13
centos  7   amd64   default 20190203_08:13
centos  7   arm64   default 20190203_08:15
centos  7   armhf   default 20190203_08:16
centos  7   i386    default 20190203_08:20
centos  7   ppc64el default 20190203_08:14
debian  buster  amd64   default 20190203_05:25
debian  buster  arm64   default 20190203_05:28
debian  buster  armel   default 20190203_05:28
debian  buster  armhf   default 20190203_05:28
debian  buster  i386    default 20190203_05:25
debian  buster  ppc64el default 20190203_05:26
debian  buster  s390x   default 20190203_05:26
debian  jessie  amd64   default 20190203_05:25
debian  jessie  arm64   default 20180626_05:25
debian  jessie  armel   default 20190203_05:27
debian  jessie  armhf   default 20190203_05:42
debian  jessie  i386    default 20190203_05:32
debian  jessie  powerpc default 20180626_05:25
debian  jessie  ppc64el default 20180626_05:25
debian  jessie  s390x   default 20180626_05:25
debian  sid amd64   default 20190203_05:25
debian  sid arm64   default 20190203_05:25
debian  sid armel   default 20190203_05:28
debian  sid armhf   default 20190203_05:29
debian  sid i386    default 20190203_05:25
debian  sid powerpc default 20180708_05:25
debian  sid ppc64el default 20190203_05:26
debian  sid s390x   default 20190203_05:26
debian  stretch amd64   default 20190203_05:25
debian  stretch arm64   default 20190203_05:29
debian  stretch armel   default 20190203_05:28
debian  stretch armhf   default 20190203_05:28
debian  stretch i386    default 20190203_05:32
debian  stretch ppc64el default 20190203_05:26
debian  stretch s390x   default 20190203_05:26
debian  wheezy  amd64   default 20180627_05:24
debian  wheezy  armel   default 20180627_05:27
debian  wheezy  armhf   default 20180627_05:26
debian  wheezy  i386    default 20180627_05:25
debian  wheezy  powerpc default 20180627_05:25
debian  wheezy  s390x   default 20180627_05:25
fedora  26  amd64   default 20181102_01:27
fedora  26  i386    default 20181102_01:27
fedora  27  amd64   default 20190203_01:27
fedora  27  i386    default 20190203_01:27
fedora  28  amd64   default 20190203_01:27
fedora  28  i386    default 20190203_01:27
fedora  29  amd64   default 20190203_01:27
fedora  29  i386    default 20190203_01:27
gentoo  current amd64   default 20190202_14:12
gentoo  current i386    default 20190202_14:12
opensuse    15.0    amd64   default 20190203_00:53
opensuse    42.3    amd64   default 20190203_00:53
oracle  6   amd64   default 20190203_11:40
oracle  6   i386    default 20190203_11:40
oracle  7   amd64   default 20190203_11:40
plamo   5.x amd64   default 20180816_21:36
plamo   5.x i386    default 20180816_21:36
plamo   6.x amd64   default 20190202_21:36
plamo   6.x i386    default 20190202_21:36
plamo   7.x amd64   default 20190202_21:36
sabayon current amd64   default 20190203_07:38
ubuntu  bionic  amd64   default 20190203_08:14
ubuntu  bionic  arm64   default 20190203_08:16
ubuntu  bionic  armhf   default 20190203_08:16
ubuntu  bionic  i386    default 20190203_08:14
ubuntu  bionic  ppc64el default 20190203_08:14
ubuntu  bionic  s390x   default 20190203_08:14
ubuntu  cosmic  amd64   default 20190203_08:13
ubuntu  cosmic  arm64   default 20190203_08:16
ubuntu  cosmic  armhf   default 20190203_08:16
ubuntu  cosmic  i386    default 20190203_08:22
ubuntu  cosmic  ppc64el default 20190203_08:15
ubuntu  cosmic  s390x   default 20190203_08:14
ubuntu  disco   amd64   default 20190203_08:14
ubuntu  disco   arm64   default 20190203_08:14
ubuntu  disco   armhf   default 20190203_08:16
ubuntu  disco   i386    default 20190203_08:13
ubuntu  disco   ppc64el default 20190203_08:14
ubuntu  disco   s390x   default 20190203_08:14
ubuntu  trusty  amd64   default 20190203_08:13
ubuntu  trusty  arm64   default 20190203_08:14
ubuntu  trusty  armhf   default 20190203_08:16
ubuntu  trusty  i386    default 20190203_08:22
ubuntu  trusty  powerpc default 20180824_07:43
ubuntu  trusty  ppc64el default 20190203_08:15
ubuntu  xenial  amd64   default 20190203_08:14
ubuntu  xenial  arm64   default 20190203_08:14
ubuntu  xenial  armhf   default 20190203_08:16
ubuntu  xenial  i386    default 20190203_08:14
ubuntu  xenial  powerpc default 20180824_07:44
ubuntu  xenial  ppc64el default 20190203_08:14
ubuntu  xenial  s390x   default 20190203_08:14
---

Distribution: 
archlinux 
Release: 
current
Architecture: 
amd64

Using image from local cache
Unpacking the rootfs

---
You just created an ArchLinux container (release=current, arch=amd64, variant=default)

For security reason, container images ship without user accounts
and without a root password.

Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.

$ lxc-start -n test1 -F
systemd 240 running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Arch Linux!

Set hostname to <test1>.
/usr/lib/systemd/system/auditd.service:12: PIDFile= references path below legacy directory /var/run/, updating /var/run/auditd.pid → /run/auditd.pid; please update the unit file accordingly.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Listening on LVM2 metadata daemon socket.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Created slice system-container\x2dgetty.slice.
[  OK  ] Reached target Swap.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Reached target Slices.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on Journal Socket.
         Mounting Temporary Directory (/tmp)...
         Mounting Huge Pages File System...
         Starting Apply Kernel Variables...
         Mounting POSIX Message Queue File System...
         Starting Journal Service...
         Starting Remount Root and Kernel File Systems...
         Starting Monitoring of LVM2 mirrors, snaps…tc. using dmeventd or progress polling...
[  OK  ] Listening on Network Service Netlink Socket.
[  OK  ] Listening on LVM2 poll daemon socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Mounted Temporary Directory (/tmp).
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Create System Users...
[  OK  ] Started LVM2 metadata daemon.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Create System Users.
         Starting Network Service...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Network Service.
[  OK  ] Started Monitoring of LVM2 mirrors, snapsh… etc. using dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Rebuild Journal Catalog...
         Starting Create Volatile Files and Directories...
         Starting Rebuild Dynamic Linker Cache...
[  OK  ] Started Rebuild Journal Catalog.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Name Resolution...
[  OK  ] Started Rebuild Dynamic Linker Cache.
         Starting Update is Completed...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily rotation of log files.
[  OK  ] Started Daily verification of password and group files.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Login Service...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Login Service.
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Reached target Network.
         Starting Permit User Sessions...
[  OK  ] Started Permit User Sessions.
[  OK  ] Started Getty on lxc/tty4.
[  OK  ] Started Container Getty on /dev/pts/1.
[  OK  ] Started Getty on lxc/tty6.
[  OK  ] Started Getty on lxc/tty3.
[  OK  ] Started Getty on lxc/tty2.
[  OK  ] Started Container Getty on /dev/pts/2.
[  OK  ] Started Container Getty on /dev/pts/3.
[  OK  ] Started Getty on tty1.
[  OK  ] Started Getty on lxc/tty1.
[  OK  ] Started Console Getty.
[  OK  ] Started Container Getty on /dev/pts/0.
[  OK  ] Started Getty on lxc/tty5.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.

Arch Linux 4.19.19-1-lts (console)

test1 login: 
stgraber commented 5 years ago

LXC doesn't use squashfs, so the creation process is pretty different.

stgraber commented 5 years ago

I don't suppose you can set up such a system in a way that we could access it to take a look?

I'm kinda running out of ideas at this point, so being able to reproduce it reliably or having access to an affected system would help quite a bit.

falstaff1288 commented 5 years ago

To answer a question you asked earlier, here is the history for the squashfs packages on the server:

$ cat /var/log/pacman.log | grep squas 
[2018-06-07 15:32] [PACMAN] Running 'pacman --noconfirm --asdeps -S -- lxc squashfs-tools'
[2018-06-07 15:33] [PACMAN] Running 'pacman --noconfirm --asdeps -S -- lxc squashfs-tools'
[2018-06-07 15:33] [ALPM] installed squashfs-tools (4.3-5)
[2018-10-04 17:17] [PACMAN] Running 'pacman -S squashfs'
[2018-10-04 17:21] [PACMAN] Running 'pacman -U --noconfirm --config /etc/pacman.conf -- /home/admin/.cache/yay/squashfuse-git/squashfuse-git-340.0b48352-1-x86_64.pkg.tar.xz'
[2018-10-04 17:21] [ALPM] installed squashfuse-git (340.0b48352-1)
[2018-12-25 09:33] [ALPM] upgraded squashfs-tools (4.3-5 -> 4.3-8)
[2019-02-03 00:24] [PACMAN] Running 'pacman -U --noconfirm --config /etc/pacman.conf -- /home/admin/.cache/yay/squashfuse-git/squashfuse-git-340.0b48352-1-x86_64.pkg.tar.xz'
[2019-02-03 00:24] [ALPM] reinstalled squashfuse-git (340.0b48352-1)
[2019-02-03 09:23] [PACMAN] Running 'pacman -U --config /etc/pacman.conf -- /home/admin/.cache/yay/squashfs-tools-git/squashfs-tools-git-1808.6e242dc-1-x86_64.pkg.tar.xz'
[2019-02-03 09:23] [ALPM] removed squashfs-tools (4.3-8)
[2019-02-03 09:23] [ALPM] installed squashfs-tools-git (1808.6e242dc-1)
[2019-02-03 09:38] [PACMAN] Running 'pacman -U http://ftp5.gwdg.de/pub/linux/archlinux/community/os/x86_64//squashfs-tools-4.3-8-x86_64.pkg.tar.xz'
[2019-02-03 09:38] [ALPM] removed squashfs-tools-git (1808.6e242dc-1)
[2019-02-03 09:38] [ALPM] installed squashfs-tools (4.3-8)
stgraber commented 5 years ago

@monstermunchkin is this something you can try on your Arch system easily enough? Basically create an LVM storage pool and try to throw an ubuntu:xenial container at it, to see if that works. Also make sure you have squashfs-tools installed, as LXD would fall back to tar.xz if it's not available.
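
Something along these lines should be enough to test it (pool name and size are arbitrary; a loop-backed pool is fine for this):

$ lxc storage create lvmtest lvm size=20GB
$ lxc launch ubuntu:xenial repro-test -s lvmtest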

monstermunchkin commented 5 years ago

I cannot confirm this error on my Arch system. Creating a Xenial container works fine for me.

stgraber commented 5 years ago

@monstermunchkin on an LVM storage pool too?

falstaff1288 commented 5 years ago

I installed LXD from the AUR on my laptop running an up-to-date ArchLinux (ext4/dir backend only) and everything works fine.

The question for me is how to isolate the issue on my server. Again, I get the same issue whether I use LVM or dir as a storage backend, which is weird.

monstermunchkin commented 5 years ago

@stgraber Yes, with LVM.

stgraber commented 5 years ago

@falstaff1288 can you confirm that your two systems are running the same version of squashfs-tools and kernel? You could compare LXD and LXC versions too, but that really shouldn't matter... Might be worth comparing systemd version too maybe?
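
Something like this, run on both machines, would make the comparison quick (assuming the Arch package names):

$ pacman -Q squashfs-tools lxd lxc systemd
$ uname -r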

falstaff1288 commented 5 years ago

$ lxc list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

$ lxc storage show default
config:
  source: /var/lib/lxd/storage-pools/default
  volatile.initial_source: /var/lib/lxd/storage-pools/default
description: ""
name: default
driver: btrfs
used_by:

$ lxc image list
+-------+--------------+--------+-----------------------------------+--------+----------+-----------------------------+
| ALIAS | FINGERPRINT  | PUBLIC | DESCRIPTION                       | ARCH   | SIZE     | UPLOAD DATE                 |
+-------+--------------+--------+-----------------------------------+--------+----------+-----------------------------+
|       | 5331a5e28e6b | no     | Alpine 3.8 amd64 (20190204_13:01) | x86_64 | 2.34MB   | Feb 5, 2019 at 1:06pm (UTC) |
+-------+--------------+--------+-----------------------------------+--------+----------+-----------------------------+
|       | ee33f784ee1f | no     | Centos 7 amd64 (20190205_07:09)   | x86_64 | 124.86MB | Feb 5, 2019 at 1:11pm (UTC) |
+-------+--------------+--------+-----------------------------------+--------+----------+-----------------------------+

$ btrfs filesystem usage /var/lib/lxd
Overall:
    Device size:                 100.00GiB
    Device allocated:              2.02GiB
    Device unallocated:           97.98GiB
    Device missing:                  0.00B
    Used:                        130.25MiB
    Free (estimated):             98.86GiB      (min: 98.86GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:1.01GiB, Used:129.88MiB
   /dev/sdb2       1.01GiB

Metadata,single: Size:1.01GiB, Used:368.00KiB
   /dev/sdb2       1.01GiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/sdb2       4.00MiB

Unallocated:
   /dev/sdb2      97.98GiB

$ btrfs subvolume show /var/lib/lxd
/
    Name: 
    UUID:               39609623-c2c9-4d3d-8922-e574c5355cca
    Parent UUID:        -
    Received UUID:      -
    Creation time:      2019-02-05 08:04:18 -0500
    Subvolume ID:       5
    Generation:         18
    Gen at creation:    0
    Parent ID:          0
    Top level ID:       0
    Flags:              -
    Snapshot(s):

$ btrfs subvolume list /var/lib/lxd
ID 257 gen 13 top level 5 path storage-pools/default
ID 258 gen 9 top level 257 path storage-pools/default/containers
ID 259 gen 10 top level 257 path storage-pools/default/containers-snapshots
ID 260 gen 18 top level 257 path storage-pools/default/images
ID 261 gen 12 top level 257 path storage-pools/default/custom
ID 262 gen 13 top level 257 path storage-pools/default/custom-snapshots


- Yesterday, I gave incomplete test results about trying LXD on my laptop. It worked, but it didn't use LVM as a storage back-end; it's a bare ext4 partition with the dir driver. I just wanted to clarify this detail, although it is somewhat irrelevant since I am getting the same error on my server no matter which backend is used.

$ lsblk -f
NAME   FSTYPE LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1 ext2         d01fe79c-189e-4003-9e47-a1ba2bffbd96    424M    11% /boot
├─sda2 ext4         ddb8edcc-cd67-4944-96cd-74a64ce70932   30.4G    33% /
├─sda3 ext4         99d4186b-6aba-4dcb-9e0c-52fe5910d164  132.8G     4% /home
└─sda4

$ sudo lxc storage show default
config:
  source: /var/lib/lxd/storage-pools/default
description: ""
name: default
driver: dir
used_by:



I'll continue researching in the hope of isolating the issue.
timlepes commented 5 years ago

I am getting the same unsquashfs errors using the lxc create command. I'm running Arch with the 4.20.6-zen1-1-zen kernel. Both lxc and lxd are at version 3.9, and storage is btrfs at version 4.19.1. Let me know if you want any diagnostics from me as well.

I first noticed problems yesterday or the day before, when I went to delete several containers I had made a few weeks ago while playing around with the networking options. The lxdbr0 containers deleted fine, but the macvlan ones took FOREVER to remove... I actually thought the command hung, so I killed the terminal after a while, but was able to see that the containers had actually been removed (using lxc list). So after deleting the half dozen containers I no longer needed, I went to create a new container and got the error message; searching the net landed me here.

timlepes commented 5 years ago

Here is some basic info in case it is useful. Let me know if more is needed.

[tlepes@mothership ~]$ sudo lxc launch ubuntu:18.04 monica-server
[sudo] password for tlepes: 
Creating monica-server
Error: Failed container creation: Create container from image: Failed to create image volume: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/default/images/b7c4dbea897f09f29474c8597c511b57c3b9c0d6f98dc42f257c64e76fea8c92_tmp/rootfs -n /var/lib/lxd/images/b7c4dbea897f09f29474c8597c511b57c3b9c0d6f98dc42f257c64e76fea8c92.rootfs: FATAL ERROR:Data queue size is too large. FATAL ERROR:Data queue size is too large

[tlepes@mothership ~]$ lxc info
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:

stgraber commented 5 years ago

@timlepes @falstaff1288 the best bet for us to figure this out at this point is to either have reliable reproduction steps on a clean ArchLinux install, be given a VM image that has the issue happening inside it, or be given access to an affected system.

The only hits for this error are this issue and the squashfs source code, and looking at the code, it's pretty unclear what's going on, so we need to be able to reproduce this issue.

falstaff1288 commented 5 years ago

I'll try to reproduce the issue in a VM.

stgraber commented 5 years ago

@falstaff1288 any luck with that?

falstaff1288 commented 5 years ago

I've been trying to reproduce my infrastructure in a VM through KVM using LVM + macvlan, since this is the setup @timlepes has as well. I've figured out the LVM part and am still working on the macvlan one.
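
For the macvlan side, what I'm trying is roughly this (a sketch; the parent interface name eth0 is an assumption for the VM):

$ lxc profile create macvlan
$ lxc profile device add macvlan eth0 nic nictype=macvlan parent=eth0
$ lxc launch ubuntu:xenial mv-test -p default -p macvlan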

Will keep you posted with more details.

falstaff1288 commented 5 years ago

I have not been able to reproduce the issue in a VM; my attempts to configure a macvlan bridge have been unsuccessful. I have not been able to fix the issue on my server either. Running out of ideas.

stgraber commented 5 years ago

Hmm, anyone else running into this and who has an idea of what the issue may be or can give us access to look around and track this down?

timlepes commented 5 years ago

Sorry to be unresponsive; my roommate passed away unexpectedly so the past couple weeks have been upside-down. I should have some time this weekend to spend on this. I'll try reconfiguring on my system since there have been package updates to lxc/lxd. If that still doesn't work I'll try reproducing in a VM for you to examine. Thanks.

Cheers,

Timothy LePés


blair-0 commented 5 years ago

The same problem occurred after upgrading Arch Linux. I checked the squashfs-tools code and found some information; is this useful for you to troubleshoot?

if (max_files != -1) {
    if(add_overflow(data_buffer_size, max_files) ||
            add_overflow(data_buffer_size, max_files * 2))
        EXIT_UNSQUASH("Data queue size is too large\n");

if(shift_overflow(data_buffer_size, 20 - block_log))
    EXIT_UNSQUASH("Data queue size is too large\n");
else
    data_buffer_size <<= 20 - block_log;

url: https://sourceforge.net/p/squashfs/code/ci/master/tree/squashfs-tools/unsquashfs.c#l2749

stgraber commented 5 years ago

Not really; it especially doesn't explain why running the same command outside of LXD would succeed...

stgraber commented 5 years ago

I did read that part of the squashfs code but since it doesn't look like squashfs was changed just before this issue started, it seems unlikely to be the source.

github-usr-name commented 5 years ago

Just encountered the same:

root@cygnus:~# lxc launch images:alpine/3.9
Creating the container
Error: Failed container creation: Create container from image: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/default/images/991367651/rootfs -n /var/lib/lxd/images/40dd95add1657a574cc076982ce5f1e0bb3a44522b91eeb1b365460e7cd04f2c.rootfs: FATAL ERROR:Data queue size is too large

System setup is comparable to OP [Arch Linux, similar upgrade date & lxc info output]

monstermunchkin commented 5 years ago

How are you running LXD? Are you using the AUR package? If so, please try running LXD from the command line (sudo lxd --group lxd --debug), instead of using the systemd service. Does this change anything?
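
In other words, something like this (assuming the service unit is simply named lxd):

$ sudo systemctl stop lxd
$ sudo lxd --group lxd --debug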

github-usr-name commented 5 years ago

@monstermunchkin Thanks for the help :). Yes, I'm using the AUR package. Interestingly enough, with LXD running through the CLI as above, things are working properly. Output from systemctl cat lxd:


[Unit]
Description=REST API, command line tool and OpenStack integration plugin for LXC.
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/lxd --group lxd
ExecStop=/usr/bin/lxd shutdown
KillMode=process
LimitNOFILE=1048576
LimitNPROC=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/lxd.service.d/override.conf
[Service]
LimitNOFILE=infinity
LimitNPROC=infinity
TasksMax=infinity
github-usr-name commented 5 years ago

and... after removing the override, LXD is behaving properly again :)

monstermunchkin commented 5 years ago

@falstaff1288 @timlepes does this solve your problem as well?

falstaff1288 commented 5 years ago

@monstermunchkin I confirm that removing /etc/systemd/system/lxd.service.d/override.conf fixed the issue with unsquashfs. Good find @github-usr-name. So is it the parameter LimitNOFILE=infinity that causes the issue?

github-usr-name commented 5 years ago

@falstaff1288 Based on empirical observation, yes: adding an override with just LimitNOFILE=infinity caused the issue to recur. I suspect an upstream bug in squashfs around line 2315 where it's checking for an overflow; if the value of LimitNOFILE is exactly equal to INT_MAX, then it looks as if the behaviour will be triggered.
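
For anyone who still wants the bumped limits, a drop-in with a large but finite value should avoid the problem (a sketch; 1048576 simply mirrors the LimitNOFILE already set in the packaged unit above):

$ sudo tee /etc/systemd/system/lxd.service.d/override.conf <<'EOF'
[Service]
LimitNOFILE=1048576
LimitNPROC=infinity
TasksMax=infinity
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart lxd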

stgraber commented 5 years ago

Oh, that's interesting. I'm not sure why squashfs would care about any of those prlimit bumps; it might be interesting to figure out which one it is and file a bug against squashfs indeed.

Closing as this is clearly not a LXD bug then :)

Wizlonex commented 3 years ago

> Hmm, anyone else running into this and who has an idea of what the issue may be or can give us access to look around and track this down?

Hi Stephane:

So I also ran into this issue when copying a container from one machine to the other. And I also got it to work, so I am posting what worked for me and, if of any value, I can post additional information:

Firstly, the container was originally created in 2017 under ubuntu 16.04 (host & container) and LXD version 2.x (the stable version that came with ubuntu 16.04 - sorry, I can't recall exactly but I suspect you can). I have successfully moved my container to different machines as my network hardware has changed. Never a problem until now.

I am now running ubuntu 20.04 and LXD 4.10 on the host machines. I have previously copied the container from my original host machines, and now to machines running 20.04.

Recently, I copied my container from one machine to a newer one (both running 20.04) using the --refresh option (because I didn't want to stop it and this option makes it easy). It copied and started without issue and worked as expected. I was going to update the container, so I issued 'lxc snapshot cname' and got this error:

Error: Failed creating instance snapshot record "SysAdmin/snap16": Failed initialising instance: Failed loading storage pool: No such object

I tried stopping the container. Snapshot still did not work. I tried a new fresh copy (again using --refresh) and ended up with the same snapshot error.

So then I stopped my original container briefly and copied without the --refresh option. Lo and behold, snapshot is working.

  1. I am posting this information here in case others find it useful as/when/if they hit this error. :)
  2. I also have the original files, and, with very clear instructions, I can paste additional information here if it can help nail down what is happening, in case it is an LXD issue.

V/R

Andrew