Can you show:
df -h
df -i
zfs list -t all
Ah, and lxc storage show default too.
[root@tinix:~]# lxc storage show default
config:
source: tinix-main/LXD
volatile.initial_source: tinix-main/LXD
zfs.pool_name: tinix-main/LXD
description: ""
name: default
driver: zfs
used_by:
- /1.0/profiles/default
status: Created
locations:
- none
[root@tinix:~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 198M 0 198M 0% /dev
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 988M 4.1M 984M 1% /run
tmpfs 2.0G 384K 2.0G 1% /run/wrappers
tinix-main/SYS/nixos-1 19G 7.7G 12G 41% /
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/sda2 488M 48M 405M 11% /boot
tmpfs 395M 0 395M 0% /run/user/0
tmpfs 395M 0 395M 0% /run/user/1000
tmpfs 100K 0 100K 0% /var/lib/lxd/shmounts
tmpfs 100K 0 100K 0% /var/lib/lxd/devlxd
[root@tinix:~]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 503435 402 503033 1% /dev
tmpfs 505604 1 505603 1% /dev/shm
tmpfs 505604 969 504635 1% /run
tmpfs 505604 35 505569 1% /run/wrappers
tinix-main/SYS/nixos-1 23681654 444630 23237024 2% /
tmpfs 505604 18 505586 1% /sys/fs/cgroup
/dev/sda2 32768 340 32428 2% /boot
tmpfs 505604 4 505600 1% /run/user/0
tmpfs 505604 4 505600 1% /run/user/1000
tmpfs 505604 1 505603 1% /var/lib/lxd/shmounts
tmpfs 505604 2 505602 1% /var/lib/lxd/devlxd
[root@tinix:~]# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
tinix-cold 876K 61.5G 96K none
tinix-main 7.69G 11.1G 192K none
tinix-main/LXD 1.31M 11.1G 192K none
tinix-main/LXD/containers 192K 11.1G 192K none
tinix-main/LXD/custom 192K 11.1G 192K none
tinix-main/LXD/custom-snapshots 192K 11.1G 192K none
tinix-main/LXD/deleted 192K 11.1G 192K none
tinix-main/LXD/images 192K 11.1G 192K none
tinix-main/LXD/snapshots 192K 11.1G 192K none
tinix-main/SYS 7.66G 11.1G 192K none
tinix-main/SYS/nixos-1 7.66G 11.1G 7.66G /
Not sure how easy it is for you to rebuild LXD with a custom patch, but if it's easy, then the following should help:
diff --git a/shared/archive_linux.go b/shared/archive_linux.go
index 7bd0ff438..cf6d0a0f4 100644
--- a/shared/archive_linux.go
+++ b/shared/archive_linux.go
@@ -87,9 +87,9 @@ func Unpack(file string, path string, blockBackend bool, runningInUserns bool, t
// Check if we're running out of space
if int64(fs.Bfree) < int64(2*fs.Bsize) {
if blockBackend {
- return fmt.Errorf("Unable to unpack image, run out of disk space (consider increasing your pool's volume.size)")
+ logger.Errorf("Unable to unpack image, run out of disk space (consider increasing your pool's volume.size)")
} else {
- return fmt.Errorf("Unable to unpack image, run out of disk space")
+ logger.Errorf("Unable to unpack image, run out of disk space")
}
}
That will expose the error as it's coming out of the unpacker.
With this patch, the error message I get is:
Error: Failed instance creation: Create container from image: Unpack failed, Failed to run: unsquashfs -f -d /var/lib/lxd/storage-pools/default/images/872160240/rootfs -n /var/lib/lxd/images/4998508e7172a87489f2eb1c1c871cab280b7b96db850f0c486da599631491de.rootfs: FATAL ERROR:write_file: failed to create file /var/lib/lxd/storage-pools/default/images/872160240/rootfs/usr/lib/x86_64-linux-gnu/gconv/EBCDIC-AT-DE.so, because Too many open files.
Here are the current limits used:
[root@tinix:~]# cat /proc/$(pidof lxd)/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size unlimited unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 15732 15732 processes
Max open files 1024 524288 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 15732 15732 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
It does indeed seem that 3.18 opens more files (~969) than 3.13 (~650), and this can become a problem when the steps from the Production Setup documentation have not yet been applied.
I suppose the only thing that really needs fixing is the confusing error message; users should perhaps also be warned that the nofile limit now has to be raised even to create a single container (at least in some configurations). Otherwise, this issue can be closed.
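For anyone who hits this before doing the production tuning, the simplest workaround is to raise the daemon's soft nofile limit. A minimal sketch, assuming LXD runs under a systemd unit named lxd.service (the drop-in path below is hypothetical; on NixOS the same value would be set through the service's systemd options instead):

# /etc/systemd/system/lxd.service.d/limits.conf
[Service]
LimitNOFILE=1048576

followed by systemctl daemon-reload and systemctl restart lxd to apply it.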
This is most likely a squashfs bug rather than an LXD one, though. It sounds like unsquashfs is keeping a lot of files open as it's uncompressing, which seems wrong.
I'll take a look at the out of space logic though as it sure shouldn't have triggered when there's plenty of space left...
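One possible reason the check fires here: it compares a free block count (Bfree) against a byte value (2*Bsize), and ZFS tends to report a large Bsize via statfs, so the comparison can come out true even with gigabytes free. Comparing bytes to bytes avoids that. Below is a minimal standalone sketch of the idea, not LXD's actual code; the path and threshold are placeholders:

package main

import (
	"fmt"
	"syscall"
)

// enoughSpace reports whether the filesystem backing path has at least
// min bytes available to unprivileged users.
func enoughSpace(path string, min uint64) (bool, error) {
	var fs syscall.Statfs_t
	if err := syscall.Statfs(path, &fs); err != nil {
		return false, err
	}
	// Bavail * Bsize gives available bytes, which keeps the comparison
	// in a single unit instead of blocks vs block size.
	avail := fs.Bavail * uint64(fs.Bsize)
	return avail >= min, nil
}

func main() {
	// Placeholder path and threshold, for illustration only.
	ok, err := enoughSpace("/var/lib/lxd", 100<<20) // require 100 MiB
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("enough space:", ok)
}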
I can most probably confirm that this is a problem with unsquashfs. With squashfs-tools 4.4 I can't create a Debian container (I have not tested any others). When squashfs-tools is downgraded to 4.3, it works without a hitch.
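As a quick check (assuming the unsquashfs binary on the PATH is the one LXD invokes), the installed squashfs-tools release can be confirmed with:

unsquashfs -version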
Issue description
When using ZFS with LXD 3.18, I cannot create new containers (Unable to unpack image, run out of disk space), although I have plenty of disk space available.
Steps to reproduce
1. Create a new ZFS dataset (zfs create rpool/test)
2. Run lxd init, using the created dataset as the new storage pool
3. Run lxc launch debian/9
The same steps work with LXD 3.13.
Information to attach
- dmesg: No relevant info
- lxc info NAME --show-log: No container created
- lxc config show NAME --expanded: Default profile untouched
- lxc monitor while reproducing the issue: https://gist.github.com/pstch/9f8835f2d00c3554579069c4e13acf8a