My lxd.log is also completely spammed with "Failed to get disk stats". Not sure if related.
Most probably this error comes from this place in the kernel: https://github.com/torvalds/linux/blob/f10b439638e2482a89a1a402941207f6d8791ff8/fs/btrfs/qgroup.c#L1602 while ioctl is called from this place: https://github.com/kdave/btrfs-progs/blob/441d01556873385d55fd4940f50ee7ae1fcfb13d/cmds/qgroup.c#L1762
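For what it's worth, the same EEXIST can be provoked directly with btrfs-progs, outside of LXD; a minimal sketch, assuming a quota-enabled filesystem mounted at an illustrative /mnt/pool:
# creating the same qgroup twice makes the kernel return -EEXIST,
# which btrfs-progs reports as "File exists"
sudo btrfs quota enable /mnt/pool
sudo btrfs qgroup create 0/500 /mnt/pool
sudo btrfs qgroup create 0/500 /mnt/pool   # ERROR: unable to create quota group: File exists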
Please show the output of btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f and btrfs subvolume show /var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f.
It seems like they don't exist. Very strange.
niklas@tank ~
❯ sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f
ERROR: cannot access '/var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f': No such file or directory
niklas@tank ~
❯ sudo btrfs subvolume show /var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f
ERROR: cannot find real path for '/var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f': No such file or directory
# I tried to make a new container at this point just to double check. Same issue though.
niklas@tank ~
❯ lxc launch images:archlinux testcontainer
Creating testcontainer
Error: Failed instance creation: Failed creating instance from image: Failed to run: btrfs qgroup create 0/105380 /var/lib/lxd/storage-pools/default/images/85e1313aac5ec2fe801377b72e06fb1843421a67e58921f9239bca8c56711537: exit status 1 (ERROR: unable to create quota group: File exists)
niklas@tank ~
❯ sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default/images/85e1313aac5ec2fe801377b72e06fb1843421a67e58921f9239bca8c56711537
ERROR: cannot access '/var/lib/lxd/storage-pools/default/images/85e1313aac5ec2fe801377b72e06fb1843421a67e58921f9239bca8c56711537': No such file or directory
niklas@tank ~
❯ sudo btrfs subvolume show /var/lib/lxd/storage-pools/default/images/85e1313aac5ec2fe801377b72e06fb1843421a67e58921f9239bca8c56711537
ERROR: cannot find real path for '/var/lib/lxd/storage-pools/default/images/85e1313aac5ec2fe801377b72e06fb1843421a67e58921f9239bca8c56711537': No such file or directory
Can you see something in ls -la /var/lib/lxd/storage-pools/default/images?
It's empty.
niklas@tank ~
❯ sudo ls -la /var/lib/lxd/storage-pools/default/images
total 0
drwx--x--x 1 root root 0 9 dec 12.52 .
drwxr-xr-x 1 root root 214 26 sep 11.25 ..
Ah, so that means the subvolume was deleted on the error path after the error occurred.
I guess so. I should mention that I have around 10 containers running without issue, though. The difference is that they were created before this error started occurring, which I think happened after upgrading to btrfs-progs 6.0.1.
Hmm, are these containers on the same node? Which storage backend are you using? It's strange that you have an empty /var/lib/lxd/storage-pools/default/images in this case.
Same node. I'm using the btrfs storage backend. The duplicate names are snapshots.
niklas@tank ~
❯ lxc storage info default
info:
description: ""
driver: btrfs
name: default
space used: 129.58GiB
total space: 464.69GiB
used by:
instances:
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- apps
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- compiler
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- minecraft
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- mqtt
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- plex
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- samba
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- webserver
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
- youtube
profiles:
- default
Please show btrfs subvolume list, cat /proc/1/mountinfo, and lsblk.
btrfs subvolume list (I apologize in advance for the amount; each docker container is one subvolume, and it gets multiplied in every snapshot)
cat /proc/1/mountinfo
22 29 0:20 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
23 29 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sys rw
24 29 0:5 / /dev rw,nosuid,relatime shared:2 - devtmpfs dev rw,size=8098836k,nr_inodes=2024709,mode=755,inode64
25 29 0:22 / /run rw,nosuid,nodev,relatime shared:12 - tmpfs run rw,mode=755,inode64
26 23 0:23 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:7 - efivarfs efivarfs rw
29 1 0:25 /ROOT / rw,noatime shared:1 - btrfs /dev/mapper/root rw,compress=zstd:3,ssd,space_cache,user_subvol_rm_allowed,subvolid=256,subvol=/ROOT
27 23 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
28 24 0:24 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,inode64
30 24 0:28 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
31 23 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw
32 23 0:30 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:10 - pstore pstore rw
33 23 0:31 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:11 - bpf bpf rw,mode=700
34 22 0:32 / /proc/sys/fs/binfmt_misc rw,relatime shared:13 - autofs systemd-1 rw,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15926
35 24 0:19 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:14 - mqueue mqueue rw
36 24 0:33 / /dev/hugepages rw,relatime shared:15 - hugetlbfs hugetlbfs rw,pagesize=2M
37 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:16 - debugfs debugfs rw
38 23 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:17 - tracefs tracefs rw
58 25 0:34 / /run/credentials/systemd-sysctl.service ro,nosuid,nodev,noexec,relatime shared:18 - ramfs ramfs rw,mode=700
39 23 0:35 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:19 - configfs configfs rw
40 23 0:36 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:20 - fusectl fusectl rw
64 25 0:76 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime shared:21 - ramfs ramfs rw,mode=700
66 25 0:77 / /run/credentials/systemd-tmpfiles-setup-dev.service ro,nosuid,nodev,noexec,relatime shared:22 - ramfs ramfs rw,mode=700
91 29 0:25 /CACHE /mnt/cache rw,noatime shared:45 - btrfs /dev/mapper/root rw,compress=zstd:3,ssd,space_cache,user_subvol_rm_allowed,subvolid=3479,subvol=/CACHE
88 29 0:79 / /tmp rw,nosuid,nodev shared:47 - tmpfs tmpfs rw,nr_inodes=1048576,inode64
96 29 259:1 / /boot rw,relatime shared:49 - vfat /dev/nvme0n1p1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro
107 29 0:84 / /mnt/cache1 rw,relatime shared:55 - btrfs /dev/mapper/nvmecache rw,compress=zstd:3,ssd,space_cache=v2,subvolid=5,subvol=/
155 25 0:93 / /run/credentials/systemd-tmpfiles-setup.service ro,nosuid,nodev,noexec,relatime shared:65 - ramfs ramfs rw,mode=700
343 29 0:101 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:85 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
361 37 0:12 / /sys/kernel/debug/tracing rw,nosuid,nodev,noexec,relatime shared:101 - tracefs tracefs rw
452 29 0:103 / /var/lib/lxd/shmounts rw,relatime shared:222 - tmpfs tmpfs rw,size=100k,mode=711,inode64
463 29 0:104 / /var/lib/lxd/devlxd rw,relatime shared:246 - tmpfs tmpfs rw,size=100k,mode=755,inode64
474 29 0:25 /ROOT/var/lib/lxd/storage-pools/default /var/lib/lxd/storage-pools/default rw,noatime shared:1 - btrfs /dev/mapper/root rw,compress=zstd:3,ssd,space_cache,user_subvol_rm_allowed,subvolid=295,subvol=/ROOT/var/lib/lxd/storage-pools/default
531 34 0:125 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:257 - binfmt_misc binfmt_misc rw
1645 25 0:958 / /run/user/1000 rw,nosuid,nodev,relatime shared:223 - tmpfs tmpfs rw,size=1622528k,nr_inodes=405632,mode=700,uid=1000,gid=1000,inode64
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 465.8G 0 disk
└─ssdcache 254:2 0 465.7G 0 crypt
zram0 253:0 0 4G 0 disk [SWAP]
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 512M 0 part /boot
├─nvme0n1p2 259:2 0 464.7G 0 part
│ └─root 254:0 0 464.7G 0 crypt /var/lib/lxd/storage-pools/default
│ /mnt/cache
│ /
└─nvme0n1p3 259:3 0 466.3G 0 part
└─nvmecache 254:3 0 466.3G 0 crypt /mnt/cache1
Likely the error comes from this place: https://github.com/lxc/lxd/blob/0e129cfcdf2b04c5f6143de9c79e82b8d65648a2/lxd/storage/backend_lxd.go#L3106 because the volume path contains "image" as a prefix, so it has drivers.VolumeTypeImage.
We can try to perform btrfs quota rescan /var/lib/lxd/storage-pools/default/images, or even more globally, btrfs quota rescan /var/lib/lxd/storage-pools/default.
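A minimal sketch of those rescans, using the pool paths from this thread; the -w (wait) and -s (status) flags are standard btrfs-progs options:
# rescan qgroup accounting for the images directory, then check the whole pool
sudo btrfs quota rescan -w /var/lib/lxd/storage-pools/default/images   # -w blocks until the rescan finishes
sudo btrfs quota rescan -s /var/lib/lxd/storage-pools/default          # -s reports whether a rescan is still running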
@broizter have you tried to rescan quotas?
> @broizter have you tried to rescan quotas?
Sorry for the late reply! I ran both commands that you mentioned above and they finished. Unfortunately, it didn't change anything.
Then I think we should try downgrading btrfs-progs to 6.0, per the user experience from https://discuss.linuxcontainers.org/t/lxd-btrfs-archlinux-failed-instance-creation/15633
If the issue disappears, we should report this regression to the btrfs developers.
Suspicious commits in btrfs-progs: https://github.com/kdave/btrfs-progs/commit/dac73d6e2c68c7fb6955fb1e2121e35289e0ab61
... and this: https://github.com/kdave/btrfs-progs/commit/f486f0f01eb2afcca17e5acb1200e54347e948c8 https://github.com/kdave/btrfs-progs/commit/69b0d7756dd76c0a4a7304165a3d76de0e5170ad
These commits change the output format of btrfs qgroup show. I think it may break our code in func (d *btrfs) getQGroup(path string) (string, int64, error): https://github.com/lxc/lxd/blob/master/lxd/storage/drivers/driver_btrfs_utils.go#L250
For instance, qgroupid becomes Qgroupid.
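As a hedged illustration (not the actual LXD fix): data rows always begin with a numeric qgroup id such as 0/257, so a parser keyed on that shape, rather than on the header text, tolerates both the old qgroupid and the new Qgroupid spelling:
# print qgroup id and exclusive usage, skipping header and separator rows of either style
sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default | awk '$1 ~ /^[0-9]+\/[0-9]+$/ {print $1, $3}'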
These commits are from v6.0.1
cc @tomponline @monstermunchkin
Downgrading to 6.0 made it possible to create containers again. /var/lib/lxd/storage-pools/default/images was still empty but after creating the new container there is now one directory there. It seems like the directories for containers created earlier are missing though.
@broizter thank you for your report and your help with the experiments. I think we have found a root cause for this problem.
Facing the same issue on openSUSE Tumbleweed. LXD version: 5.9, btrfs-progs: 6.0.2.
Is there any update on this?
I'm a little bit scared of downgrading btrfs-progs, as someone above mentioned data loss... (it also seems impossible to downgrade on openSUSE TW). At the same time, I need to create new containers.
This is a serious bug.
Thanks @mihalicyn, will take a look, as it looks like upstream changed something in their tooling.
I have the same problem:
[werwolf@work] ~
❯ cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20221217"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20221217"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20221217"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"
[werwolf@work] ~
❯ rpm -qa | grep -i btrfspro
btrfsprogs-6.0.2-370.5.x86_64
btrfsprogs-udev-rules-6.0.2-370.5.noarch
[werwolf@work] ~
❯ rpm -qa | grep -i lxd
lxd-bash-completion-5.9-1.1.noarch
lxd-5.9-1.1.x86_64
FYI @tomponline, it was discussed to use a case-insensitive comparison for Qgroupid, but https://github.com/kdave/btrfs-progs/commit/69b0d7756dd76c0a4a7304165a3d76de0e5170ad also changed the "---" string to a single "-".
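So a robust skip of the ruler line probably has to match its shape rather than a literal string; a small sketch of that idea (again an illustration, not the actual LXD code):
# the separator row consists only of dashes and spaces in both old and new output,
# so drop any such line instead of looking for a literal "---" or "-"
sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default | grep -vE '^[- ]+$'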
Issue still exists on LXD 5.10 and btrfs-progs 6.1.2
lxc launch images:archlinux testcontainer
Creating testcontainer
Error: Failed instance creation: Failed creating instance from image: Failed to run: btrfs qgroup create 0/118103 /var/lib/lxd/storage-pools/default/images/b8a25949c295d0da6950277ebc240f867baa162e1c47ad007208a148e88e6489: exit status 1 (ERROR: unable to create quota group: File exists)
How strange, I tested 5.9 as broken on Alpine and the fix worked there. Perhaps it's changed again!
Can you provide full reproducer steps from a fresh Arch install, including installing and setting up LXD? When I tried that before, following the Arch docs, it complained about missing btrfs sources while installing dependencies, which is why I switched to Alpine to test instead.
I just did a fresh arch install on a VM and was unable to reproduce the issue, although I did some more testing and noted something interesting.
On my machine with the old arch install, it's possible to "fix" the issue by using an .img file instead of an existing path with the BTRFS backend, but on the fresh arch install both methods work.
Output from my machine with the old arch install:
root@tank ~
❯ lxc storage create newbtrfs btrfs
Storage pool newbtrfs created
root@tank ~
❯ lxc storage ls
+----------+--------+------------------------------------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+----------+--------+------------------------------------+-------------+---------+---------+
| default | btrfs | /var/lib/lxd/storage-pools/default | | 128 | CREATED |
+----------+--------+------------------------------------+-------------+---------+---------+
| newbtrfs | btrfs | /var/lib/lxd/disks/newbtrfs.img | | 0 | CREATED |
+----------+--------+------------------------------------+-------------+---------+---------+
root@tank ~
❯ lxc launch images:archlinux testcontainer --storage newbtrfs
Creating testcontainer
Starting testcontainer
(container started without issue)
root@tank ~
❯ lxc stop testcontainer
root@tank ~
❯ lxc rm testcontainer
root@tank ~
❯ lxc storage rm newbtrfs
Storage pool newbtrfs deleted
root@tank ~
❯ lxc storage create newbtrfs btrfs source=/var/lib/lxd/storage-pools/newbtrfs
Storage pool newbtrfs created
root@tank ~
❯ lxc storage ls
+----------+--------+-------------------------------------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+----------+--------+-------------------------------------+-------------+---------+---------+
| default | btrfs | /var/lib/lxd/storage-pools/default | | 128 | CREATED |
+----------+--------+-------------------------------------+-------------+---------+---------+
| newbtrfs | btrfs | /var/lib/lxd/storage-pools/newbtrfs | | 0 | CREATED |
+----------+--------+-------------------------------------+-------------+---------+---------+
root@tank ~
❯ lxc launch images:archlinux testcontainer --storage newbtrfs
Creating testcontainer
Error: Failed instance creation: Failed creating instance from image: Failed to run: btrfs qgroup create 0/118390 /var/lib/lxd/storage-pools/newbtrfs/images/e02e5aaa2325820ef2d88607de0af1d2aaf41d767fbc6b61ab1e52876e5b97b8: exit status 1 (ERROR: unable to create quota group: File exists)
Also, "/var/lib/lxd/storage-pools/default/images/" is empty even though I have at least one image. Not sure if that's of any interest.
root@tank ~
❯ lxc image ls
+-------+--------------+--------+------------------------------------------+--------------+-----------+----------+-------------------------------+
| ALIAS | FINGERPRINT | PUBLIC | DESCRIPTION | ARCHITECTURE | TYPE | SIZE | UPLOAD DATE |
+-------+--------------+--------+------------------------------------------+--------------+-----------+----------+-------------------------------+
| | 5f46674de4b6 | no | Archlinux current amd64 (20230115_04:19) | x86_64 | CONTAINER | 177.56MB | Jan 15, 2023 at 11:23pm (UTC) |
+-------+--------------+--------+------------------------------------------+--------------+-----------+----------+-------------------------------+
root@tank ~
❯ ls -la /var/lib/lxd/storage-pools/default/images
total 0
drwx--x--x 1 root root 0 16 jan 00.19 ./
drwxr-xr-x 1 root root 214 26 sep 11.25 ../
root@tank ~
❯ lxc storage ls
+---------+--------+------------------------------------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+---------+--------+------------------------------------+-------------+---------+---------+
| default | btrfs | /var/lib/lxd/storage-pools/default | | 128 | CREATED |
+---------+--------+------------------------------------+-------------+---------+---------+
OK good, so it works on new storage pools then, at least I'm not going mad :)
On the affected system, can you show the output of:
sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/newbtrfs
Please can you also show lxc storage show newbtrfs?
Also, are you saying that /var/lib/lxd/storage-pools is a single BTRFS device shared with the host OS?
/var/lib/lxd/storage-pools is a BTRFS device shared with the host OS, yes. If you use BTRFS on your root partition you get this option during lxd init: "Would you like to create a new btrfs subvolume under /var/lib/lxd? (yes/no) [default=yes]:" so that's what I'm using.
root@tank ~
❯ btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/newbtrfs
Qgroupid Referenced Exclusive Max exclusive Path
-------- ---------- --------- ------------- ----
0/119063 16384 16384 none ROOT/var/lib/lxd/storage-pools/newbtrfs
root@tank ~
❯ lxc storage show newbtrfs
config:
source: /var/lib/lxd/storage-pools/newbtrfs
volatile.initial_source: /var/lib/lxd/storage-pools/newbtrfs
description: ""
name: newbtrfs
driver: btrfs
used_by: []
status: Created
locations:
- none
root@tank ~
❯ btrfs subvolume list /
ID 256 gen 1432480 top level 5 path ROOT
ID 292 gen 18 top level 256 path var/lib/portables
ID 293 gen 19 top level 256 path var/lib/machines
ID 295 gen 1431489 top level 256 path var/lib/lxd/storage-pools/default
ID 119063 gen 1432480 top level 256 path var/lib/lxd/storage-pools/newbtrfs
(etc etc)
Can you include in btrfs subvolume list / the offending image volume /var/lib/lxd/storage-pools/newbtrfs/images/e02e5aaa2325820ef2d88607de0af1d2aaf41d767fbc6b61ab1e52876e5b97b8, if it still exists?
It does not exist.
root@tank ~
❯ btrfs subvolume list / | grep newbtrfs
ID 119063 gen 1432496 top level 256 path var/lib/lxd/storage-pools/newbtrfs
root@tank ~
❯ ls -la /var/lib/lxd/storage-pools/newbtrfs/images
total 0
drwx--x--x 1 root root 0 16 jan 11.19 ./
drwxr-xr-x 1 root root 214 16 jan 11.10 ../
Here's how it looks on the fresh arch install where everything is functional.
root@archlinux ~# btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default
ERROR: can't list qgroups: quotas not enabled
root@archlinux ~# btrfs subvolume list / | grep default
ID 267 gen 110 top level 256 path var/lib/lxd/storage-pools/default
ID 269 gen 115 top level 267 path var/lib/lxd/storage-pools/default/containers/testcontainer
ID 271 gen 110 top level 267 path var/lib/lxd/storage-pools/default/images/a160d9f01130bfbd2e29eae8596c52fc8dc75b3219178884eee41fb86c301804
root@archlinux ~# ls -la /var/lib/lxd/storage-pools/default/images/
total 0
drwx--x--x 1 root root 128 Jan 16 11:14 ./
drwxr-xr-x 1 root root 214 Jan 14 12:27 ../
drwx--x--x 1 root root 56 Jan 16 11:13 a160d9f01130bfbd2e29eae8596c52fc8dc75b3219178884eee41fb86c301804/
Interesting that it gives me "ERROR: can't list qgroups: quotas not enabled" on the machine where everything works correctly.
Any news about this issue? I don't mean to push, but this is a pretty serious problem that has reduced the ability to use lxd to zero, short of rebuilding the entire cluster.
Can you clarify what is working and what isn't, with reproducer steps for a fresh system for the not working scenario?
Oh has the cause not been found yet? I guess I should rebuild my LXD storage then since that seems to be a workaround. Not being able to create new containers for many months now has been a bit annoying.
The cause was considered fixed in https://github.com/lxc/lxd/pull/11252, but it's not clear whether it's fully resolved, or in which situations it is still broken.
It was caused by an upstream change to the BTRFS tooling, and LXD has had to update its parsing of the command output to accommodate both old and new versions.
I understand! It's a very strange issue. I was unable to reproduce it with a fresh OS install. For now I will migrate over to using an ".img"-based BTRFS backend instead of an "existing path" one, so that I can create containers again.
> Can you clarify what is working and what isn't, with reproducer steps for a fresh system for the not working scenario?
I can't reproduce the problem on new installations, but it affected all existing ones, and it's completely unclear to me how it can be fixed. Below I will give everything that I think could be useful; if you need something else, ask me and I will add it.
My home lab & test server:
[werwolf@power] ~
❯ cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20230128"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20230128"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20230128"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"
[werwolf@power] ~
❯ inxi -Fxxx
System:
Host: power Kernel: 6.1.8-1-default arch: x86_64 bits: 64 compiler: gcc
v: 12.2.1 Desktop: N/A wm: KWin dm: SDDM Distro: openSUSE Tumbleweed
20230128
Machine:
Type: Server System: FUJITSU product: PRIMERGY TX150 S7 v: GS01
serial: <superuser required> Chassis: type: 17 v: TX150S7FS
serial: <superuser required>
Mobo: FUJITSU model: D2759 v: S26361-D2759-A13 WGS04 GS02
serial: <superuser required> BIOS: FUJITSU // Phoenix
v: 6.00 Rev. 1.21.2759.A1 date: 07/11/2018
CPU:
Info: quad core model: Intel Xeon X3430 bits: 64 type: MCP
smt: <unsupported> arch: Nehalem rev: 5 cache: L1: 256 KiB L2: 1024 KiB
L3: 8 MiB
Speed (MHz): avg: 2527 min/max: 1197/2395 boost: enabled cores: 1: 2527
2: 2527 3: 2527 4: 2527 bogomips: 19150
Flags: ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: Matrox Systems MGA G200e [Pilot] ServerEngines
vendor: Fujitsu Solutions driver: mgag200 v: kernel pcie: speed: 2.5 GT/s
lanes: 1 ports: active: VGA-1 empty: none bus-ID: 14:00.0
chip-ID: 102b:0522 class-ID: 0300
Display: unspecified server: X.Org v: 21.1.6 with: Xwayland v: 22.1.7
driver: X: loaded: N/A unloaded: nvidia gpu: mgag200 note: X driver n/a
display-ID: localhost:10.0 screens: 1
Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.00x11.22")
s-diag: 1055mm (41.54")
Monitor-1: DisplayPort-0 pos: right res: 1920x1080 hz: 60 dpi: 93
size: 527x296mm (20.75x11.65") diag: 604mm (23.8") modes: N/A
Monitor-2: HDMI-A-0 pos: primary,left res: 1920x1080 hz: 60 dpi: 93
size: 527x296mm (20.75x11.65") diag: 604mm (23.8") modes: N/A
API: OpenGL v: 4.5 Mesa 22.3.4 renderer: llvmpipe (LLVM 15.0.7 128 bits)
direct render: Yes
Audio:
Message: No device data found.
Sound Server-1: PulseAudio v: 16.1 running: no
Sound Server-2: PipeWire v: 0.3.65 running: no
Network:
Device-1: Intel 82571EB/82571GB Gigabit Ethernet driver: e1000e v: kernel
pcie: speed: 2.5 GT/s lanes: 4 port: 3000 bus-ID: 11:00.0 chip-ID: 8086:10bc
class-ID: 0200
IF: eth0 state: up speed: 100 Mbps duplex: full mac: 00:15:17:e8:e2:39
Device-2: Intel 82571EB/82571GB Gigabit Ethernet driver: e1000e v: kernel
pcie: speed: 2.5 GT/s lanes: 4 port: 3020 bus-ID: 11:00.1 chip-ID: 8086:10bc
class-ID: 0200
IF: eth1 state: down mac: 00:15:17:e8:e2:38
Device-3: Intel 82571EB/82571GB Gigabit Ethernet driver: e1000e v: kernel
pcie: speed: 2.5 GT/s lanes: 4 port: 4000 bus-ID: 12:00.0 chip-ID: 8086:10bc
class-ID: 0200
IF: eth2 state: down mac: 00:15:17:e8:e2:3b
Device-4: Intel 82571EB/82571GB Gigabit Ethernet driver: e1000e v: kernel
pcie: speed: 2.5 GT/s lanes: 4 port: 4020 bus-ID: 12:00.1 chip-ID: 8086:10bc
class-ID: 0200
IF: eth3 state: up speed: 1000 Mbps duplex: full mac: 00:15:17:e8:e2:3a
Device-5: Intel 82574L Gigabit Network vendor: Fujitsu Solutions
driver: e1000e v: kernel pcie: speed: 2.5 GT/s lanes: 1 port: 5000
bus-ID: 13:00.0 chip-ID: 8086:10d3 class-ID: 0200
IF: ens0 state: up speed: 1000 Mbps duplex: full mac: 00:19:99:b8:5b:f4
IF-ID-1: br0 state: up speed: 10000 Mbps duplex: unknown
mac: 82:38:4c:7f:80:db
IF-ID-2: veth120540aa state: up speed: 10000 Mbps duplex: full
mac: b6:49:74:6c:24:e6
IF-ID-3: wg0 state: unknown speed: N/A duplex: N/A mac: N/A
IF-ID-4: ygg0 state: unknown speed: 10 Mbps duplex: full mac: N/A
IF-ID-5: zt0 state: unknown speed: 10 Mbps duplex: full
mac: ce:37:b4:f9:bb:93
Drives:
Local Storage: total: 13.2 TiB used: 12.54 TiB (95.0%)
ID-1: /dev/sda vendor: Western Digital model: WUH721414ALE604
size: 12.73 TiB speed: 3.0 Gb/s type: HDD rpm: 7200 serial: QGKDSLUT
rev: W110 scheme: GPT
ID-2: /dev/sdb vendor: Micron model: 1100 MTFDDAK512TBN size: 476.94 GiB
speed: 3.0 Gb/s type: SSD serial: 163814327E3A rev: U001 scheme: GPT
Partition:
ID-1: / size: 440 GiB used: 280.28 GiB (63.7%) fs: btrfs dev: /dev/sdb2
ID-2: /home size: 12.73 TiB used: 12.27 TiB (96.3%) fs: btrfs
dev: /dev/sda1
Swap:
ID-1: swap-1 type: partition size: 35.9 GiB used: 900.5 MiB (2.4%)
priority: -2 dev: /dev/sdb3
Sensors:
System Temperatures: cpu: 60.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info:
Processes: 275 Uptime: 2h 1m wakeups: 3 Memory: 31.34 GiB
used: 9.46 GiB (30.2%) Init: systemd v: 252 target: multi-user (3)
default: multi-user Compilers: gcc: 12.2.1 alt: 11/12/13 Packages: pm: rpm
pkgs: N/A note: see --rpm Shell: Zsh v: 5.9 running-in: sshd (SSH)
inxi: 3.3.23
[werwolf@power] ~
❯ rpm -qa | grep -E 'btrfs|lxd'
lxd-5.9-2.1.x86_64
btrfsprogs-udev-rules-6.1.3-375.1.noarch
libbtrfs0-6.1.3-375.1.x86_64
lxd-bash-completion-5.9-2.1.noarch
libbd_btrfs2-2.28-1.1.x86_64
libudisks2-0_btrfs-2.9.4-6.1.x86_64
btrfsprogs-6.1.3-375.1.x86_64
btrfsmaintenance-0.5-67.64.noarch
[werwolf@power] ~
❯ lxc launch images:almalinux/8 oo
Creating oo
Error: Failed instance creation: Failed creating instance from image: Failed to run: btrfs qgroup create 0/3619 /var/lib/lxd/storage-pools/local/images/bb49e749b7a31656a0c1e4c6ac4e407a2d600bbb0f0beb79ad02fe4eb5fd0253: exit status 1 (ERROR: unable to create quota group: File exists)
[werwolf@power] ~
❯ sudo ls -lah /var/lib/lxd/storage-pools/local/images/
[sudo] password for root:
total 0
drwx--x--x 1 root root 0 Feb 3 16:20 .
drwxr-xr-x 1 root root 214 Jan 19 04:02 ..
[werwolf@power] ~
❯ sudo btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/
Qgroupid Referenced Exclusive Max exclusive Path
-------- ---------- --------- ------------- ----
0/258 260327317504 3302572032 none @/.snapshots/1/snapshot
Same experience here. Affects existing installations, but can't reproduce on fresh installs. The "fix" is to create a new BTRFS storage backend and migrate to that one instead. You have to omit the "source=" argument when creating the new backend, otherwise you will face the same issue. It needs to create an ".img" file.
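In command form, the workaround described above is roughly this (pool and instance names are illustrative):
lxc storage create imgpool btrfs                     # no source= given, so LXD creates a loop-backed .img under /var/lib/lxd/disks
lxc launch images:archlinux test --storage imgpool   # new instances then land on the working pool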
> You have to omit the "source=" argument when creating the new backend
Do you still see the same issue without using source inside a fresh machine (a VM, for instance)?
No, I'm unable to reproduce the issue on a fresh machine. Using "source" works without issue on fresh machines, but on existing installs it causes errors when launching containers for example.
Right, makes sense, thanks. So it looks like the BTRFS quotas have gotten into a mess on the existing systems. I'll see if we can fix that somehow.
I mentioned it in an earlier post, but the difference between my existing install and a fresh one is that btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/default lists the quota groups on the broken existing install, but if you run the same command on the fresh install where everything works, it instead gives this error: ERROR: can't list qgroups: quotas not enabled.
So basically: existing broken install = quotas are enabled; fresh working install = quotas are not enabled.
Yeah I saw that but it doesn't really make any sense to me. Are quotas ever working on the "fixed" systems?
Not sure, how do I test that?
Set a low quota and try to fill it up.
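A minimal sketch of such a test, assuming the newbtrfs pool path from earlier in this thread (note that enabling quotas on an affected pool could re-trigger the original bug):
sudo btrfs quota enable /var/lib/lxd/storage-pools/newbtrfs
sudo btrfs subvolume create /var/lib/lxd/storage-pools/newbtrfs/quota-test
sudo btrfs qgroup limit 10M /var/lib/lxd/storage-pools/newbtrfs/quota-test
# writing past the limit should eventually fail with "Disk quota exceeded" if quotas enforce
sudo dd if=/dev/zero of=/var/lib/lxd/storage-pools/newbtrfs/quota-test/fill bs=1M count=20
sudo btrfs subvolume delete /var/lib/lxd/storage-pools/newbtrfs/quota-test   # clean up afterwards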
I will test this and come back when I have some more time.
An additional note, and I'm sorry if this is confusing. On my broken existing install I created a new storage backend without using "source", so it instead creates an .img loop device, and when I run btrfs qgroup show -e -f --raw /var/lib/lxd/storage-pools/newbtrfs/ it also gives me ERROR: can't list qgroups: quotas not enabled, even though everything now works perfectly, at least as far as I can tell.
So the common thing between the fresh working install (where both "source" and ".img" work) and the broken install (where only ".img" works) is that the output shows quotas are not enabled on the working storage backends.
TL;DR: if BTRFS says quotas are enabled, you get errors; if BTRFS says quotas are not enabled, everything works fine. On fresh installs quotas are not enabled, but on existing installs they are.
Are BTRFS quotas supposed to be enabled by LXD? It almost seems like there is a bug that prevents them from being enabled on newly created storage pools.
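Given the pattern described above, a quick heuristic check of which state a pool is in:
# qgroup output means quotas are enabled (pool likely affected);
# "ERROR: can't list qgroups: quotas not enabled" means the pool is in the working state
sudo btrfs qgroup show -f /var/lib/lxd/storage-pools/default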
Same problem, openSUSE Tumbleweed. I've disabled quota on the subvolume with the existing storage pool using btrfs quota disable /var/lib/lxd/storage-pools/default and I am able to create containers.
Thanks for the tip, that does indeed "fix" the issue. I will use that as a workaround for now instead of migrating all my containers to a new storage pool.
Required information
Issue description
Trying to create new containers fails with this message:
Error: Failed instance creation: Failed creating instance from image: Failed to run: btrfs qgroup create 0/104960 /var/lib/lxd/storage-pools/default/images/f161c60ebcfe5806986bcfef748df7cf23bf7eb39eb5b7c130d0e5aa5371522f: exit status 1 (ERROR: unable to create quota group: File exists)
My guess is that it started after upgrading btrfs-progs to 6.0.1. Upgrading to the latest version (6.0.2) did not fix the issue.
Steps to reproduce
Information to attach
lxc monitor output https://pastebin.com/g6PxBYcb
Let me know if there's any other information you want me to attach. Same issue as this post, I guess, although I don't think downgrading btrfs-progs is a good fix: https://discuss.linuxcontainers.org/t/lxd-btrfs-archlinux-failed-instance-creation/15633