canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Dir quota issue, non-snap build #6333

Closed: s3rj1k closed this issue 5 years ago.

s3rj1k commented 5 years ago

Project quotas on the Dir storage backend, backed by an ext4 filesystem on a software RAID (mdadm) device, work incorrectly when this configuration is used with a non-snap build of LXD.

To reproduce:

The same issue can be reproduced on Ubuntu 18.04.3 with kernel 5.0.x in a similar configuration.

Any ideas how to fix this?

stgraber commented 5 years ago

I'll need to play with it a bit, but this isn't going to be an LXD bug: repquota shows that we set the quota properly, and df shows that the kernel applied it, so any issue with accounting or enforcement after that point would be a kernel bug.
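For context on the df side: df gets its numbers from statfs(2). In recent ext4 kernels, when the queried directory carries the project-inherit flag and project quota limits are enabled, ext4 reports the project's limit and usage instead of the whole filesystem's, which is how a quota shows up in df. A minimal C sketch of the same query; the helper name df_bytes is hypothetical:

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Report the total size and the space available to unprivileged users for
 * the filesystem containing `path` (df's "Size" and "Avail" columns).
 * Returns 0 on success, -1 if statvfs() fails. */
static int df_bytes(const char *path, uint64_t *total, uint64_t *avail)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;

    /* f_blocks and f_bavail are counted in units of f_frsize bytes. */
    *total = (uint64_t)st.f_blocks * st.f_frsize;
    *avail = (uint64_t)st.f_bavail * st.f_frsize;
    return 0;
}
```

Pointed at a directory inside a quota'd dir-backend container, this should reflect the project limit rather than the disk size, assuming the conditions above hold.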

s3rj1k commented 5 years ago

@stgraber The LXD snap works with this disk layout and configuration.

stgraber commented 5 years ago

And a container without a quota doesn't hit that error?

stgraber commented 5 years ago

"File too large" sounds like you may be hitting a prlimit/ulimit rather than a quota issue. That would explain why you see different behavior between the snap and a manual build: one is started through systemd and our wrapper while the other isn't, possibly leading to different limits being applied.
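The mechanism suggested here works like this: when a write would extend a file past the process's RLIMIT_FSIZE (ulimit -f, or LimitFSIZE= in a systemd unit), the kernel raises SIGXFSZ, and if that signal is ignored the write fails with EFBIG, whose message is "File too large". A small C sketch that demonstrates this under an artificially low limit; the helper name and the /tmp scratch path are illustrative:

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Write `total` bytes to a scratch file while the soft RLIMIT_FSIZE is set to
 * `limit` bytes. Returns 0 if everything was written, otherwise the errno of
 * the first failing call (EFBIG is "File too large"). The previous limit and
 * SIGXFSZ disposition are restored before returning. */
static int write_with_fsize_limit(rlim_t limit, size_t total)
{
    struct rlimit old, capped;
    char path[] = "/tmp/fsize-demo-XXXXXX";
    char chunk[1024];
    size_t written = 0;
    int err = 0;

    if (getrlimit(RLIMIT_FSIZE, &old) != 0)
        return errno;

    int fd = mkstemp(path);
    if (fd < 0)
        return errno;
    unlink(path); /* the file goes away once fd is closed */

    /* By default, exceeding the limit delivers SIGXFSZ, which kills the
     * process; ignoring it makes write() fail with EFBIG instead. */
    void (*old_handler)(int) = signal(SIGXFSZ, SIG_IGN);

    capped.rlim_cur = limit;       /* soft limit: what writes are checked against */
    capped.rlim_max = old.rlim_max;
    if (setrlimit(RLIMIT_FSIZE, &capped) != 0) {
        err = errno;
        goto out;
    }

    memset(chunk, 0, sizeof chunk);
    while (written < total) {
        size_t n = total - written < sizeof chunk ? total - written : sizeof chunk;
        ssize_t r = write(fd, chunk, n);
        if (r < 0) {
            err = errno;           /* EFBIG once the file offset reaches the limit */
            break;
        }
        written += (size_t)r;
    }

    setrlimit(RLIMIT_FSIZE, &old);
out:
    signal(SIGXFSZ, old_handler);
    close(fd);
    return err;
}
```

With a 4 KiB limit, an 8 KiB write sequence stops at 4 KiB with EFBIG; with a generous limit it completes. As the thread goes on to show, raising the limits did not help in this case, so this is only the hypothesis being tested, not the eventual cause.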

s3rj1k commented 5 years ago

Inside the container:

root@test:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7730
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I am actually using a systemd unit to start LXD:

[Unit]
Description=LXD - main daemon
After=network-online.target openvswitch-switch.service lxcfs.service lxd.socket
Requires=network-online.target lxcfs.service lxd.socket

[Service]
EnvironmentFile=-/etc/environment
# ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load
ExecStart=/usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log
ExecStartPost=/usr/bin/lxd waitready --timeout=600
KillMode=process
TimeoutStartSec=600s
TimeoutStopSec=30s
Restart=on-failure
LimitNOFILE=infinity
LimitNPROC=infinity
LimitFSIZE=infinity
TasksMax=infinity

[Install]
Also=lxd-containers.service lxd.socket

I added LimitFSIZE=infinity and am still seeing the error.

s3rj1k commented 5 years ago

Actually, yes: even without a quota I still get this error. WTF?

Setting the following:

LimitAS=infinity
LimitCORE=infinity
LimitCPU=infinity
LimitDATA=infinity
LimitFSIZE=infinity
LimitLOCKS=infinity
LimitMEMLOCK=infinity
LimitMSGQUEUE=infinity
LimitNICE=infinity
LimitNOFILE=infinity
LimitNPROC=infinity
LimitRSS=infinity
LimitRTPRIO=infinity
LimitRTTIME=infinity
LimitSIGPENDING=infinity
LimitSTACK=infinity
TasksMax=infinity

does not help.

I also disabled AppArmor and am still seeing the error.

s3rj1k commented 5 years ago

@stgraber this is shiftfs-related. Removing /lib/modules/5.3.0-19-generic/kernel/fs/shiftfs.ko fixed the large-file error above.

Now the question is how to disable shiftfs properly; blacklisting the module did not help the last time I tried.

Also there are no shiftfs settings in LXD itself.

s3rj1k commented 5 years ago

@stgraber I can confirm that quota works as expected without shiftfs.

Why on earth is this enabled by default on all kernels and in LXD, while in the snap it is disabled?

stgraber commented 5 years ago

LXD uses shiftfs whenever it's available. However, because of kernel issues that are still being worked on, the snap carries a patch that adds a knob to control it: shiftfs is disabled by default there, and we'll eventually enable it when running on a known-good kernel.
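Whether shiftfs is "available" in the sense above can be approximated from userspace by checking /proc/filesystems. This is a hedged sketch, not LXD's actual detection logic: it only sees filesystems that are currently registered, so a shiftfs module present on disk but not yet loaded won't show up, and a deleted shiftfs.ko (as in the workaround above) can never appear. The function names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Scan a stream in /proc/filesystems format ("nodev\t<name>" or "\t<name>"
 * per line) and report whether `fstype` is listed. */
static bool fs_type_in_stream(FILE *f, const char *fstype)
{
    char line[256];

    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';
        /* The filesystem name is the field after the last tab. */
        const char *name = strrchr(line, '\t');
        name = name ? name + 1 : line;
        if (strcmp(name, fstype) == 0)
            return true;
    }
    return false;
}

/* Report whether the running kernel currently has `fstype` registered. */
static bool fs_type_registered(const char *fstype)
{
    FILE *f = fopen("/proc/filesystems", "r");
    if (!f)
        return false;
    bool found = fs_type_in_stream(f, fstype);
    fclose(f);
    return found;
}
```

Usage: fs_type_registered("shiftfs") returns true only while the shiftfs module is loaded.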

So it sounds like this issue has nothing to do with project quotas, but rather with the handling of a particular filesystem operation. We'll need @brauner to look into this one and see whether it's something we've already fixed in shiftfs or something that needs extra fixing.

stgraber commented 5 years ago

Closing, as there isn't anything for us to do in LXD, but I'll put it on @brauner's to-do list to sort this out at the shiftfs level.

brauner commented 5 years ago
diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 55bb32b611f2..49b7777dde22 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -2045,6 +2045,7 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data,
                err = -EINVAL;
                goto out_put_path;
        }
+       sb->s_maxbytes = MAX_LFS_FILESIZE;

        inode = new_inode(sb);
        if (!inode) {

This should fix it.
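Why a missing s_maxbytes produces "File too large": in mainline fs/super.c a new superblock starts with s_maxbytes = MAX_NON_LFS (2^31 - 1 bytes), and vfs_fallocate() returns -EFBIG when offset + len exceeds the superblock's s_maxbytes. Below is a simplified userspace model of that check, not kernel code; the constants mirror the 64-bit kernel definitions and the function name is made up:

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors of the kernel constants on a 64-bit build. */
#define MAX_NON_LFS      (((int64_t)1 << 31) - 1) /* default s_maxbytes for a new superblock */
#define MAX_LFS_FILESIZE INT64_MAX                /* value the patch assigns */

/* Simplified model of vfs_fallocate()'s size check against s_maxbytes.
 * Returns 0 if the range is allowed, -EFBIG otherwise. Assumes
 * non-negative offset and len (the kernel rejects those earlier). */
static int model_fallocate_check(int64_t offset, int64_t len, int64_t s_maxbytes)
{
    /* Reject ranges whose end would overflow, instead of relying on
     * signed wraparound the way the kernel's (offset + len) < 0 test does. */
    if (len > INT64_MAX - offset)
        return -EFBIG;
    if (offset + len > s_maxbytes)
        return -EFBIG;
    return 0;
}
```

With MAX_NON_LFS, a 5 GiB allocation such as brauner's fallocate test below is rejected; with MAX_LFS_FILESIZE it passes, which matches the before/after behavior reported in this thread.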

stgraber commented 5 years ago

@brauner thanks, that was quick :) Can you get a test kernel for @s3rj1k to validate that his setup works fine? And send the fix through the usual Ubuntu channels :)

One of these days we'll actually have shiftfs behave the way we want it!

s3rj1k commented 5 years ago

@stgraber @brauner Thanks )

s3rj1k commented 5 years ago

I'm willing to test this as soon as I get a test kernel. I'm also hoping the fix will land in both LTS and non-LTS Ubuntu releases.

brauner commented 5 years ago

I think having teething problems like this is pretty normal for something like shiftfs. I'm actually increasingly happy with it, since it has also allowed us to find bugs in other filesystems. I'm pretty sure our overlayfs changes should be upstreamed, but there are only so many hours in a day. :)

brauner commented 5 years ago

@s3rj1k building a kernel atm.

brauner commented 5 years ago

With my fix:

Script started on 2019-10-22 20:37:32+0000
root@b3:~# fallocate --length 5g aaaa
root@b3:~# du -sh ./aaaa 
5.0G    ./aaaa
root@b3:~# rm aaaa 
root@b3:~# exit
Script done on 2019-10-22 20:37:50+0000

Here is a kernel for you to test: https://drive.google.com/open?id=15pLG3FAE52h6njRCfzgkZnG3tLen4ct0

brauner commented 5 years ago

@s3rj1k ^^

s3rj1k commented 5 years ago

@brauner the kernel fixes the "File too large" error but breaks quota enforcement (screenshot: 2019-10-23_00-02-55).

As the screenshot shows, I can create a file larger than the available quota.

With the same kernel but without shiftfs, the quota works fine.

brauner commented 5 years ago

@stgraber, any idea about the quota stuff for the underlay?

brauner commented 5 years ago

I can reproduce this without shiftfs as well, though.

s3rj1k commented 5 years ago

@brauner did you run tune2fs -O project,quota -Q prjquota /dev/sda1 on the filesystem before mounting it with prjquota?

Quotas on ext4 only work when they have been enabled with tune2fs and the filesystem is mounted with the proper mount option.
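The mount-option half of that requirement can be checked by looking the mount point up in /proc/self/mounts. A hedged C sketch using glibc's getmntent() and hasmntopt(); it assumes the kernel lists prjquota among the options it reports for the mount, it does not check the tune2fs feature flags stored in the ext4 superblock, and the function name is hypothetical:

```c
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `mountpoint`, as listed in a mounts table such as
 * /proc/self/mounts, carries the prjquota option. */
static bool mounted_with_prjquota(const char *mounts_path, const char *mountpoint)
{
    FILE *f = setmntent(mounts_path, "r");
    if (!f)
        return false;

    bool found = false;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* Keep scanning: the last matching entry is the topmost mount. */
        if (strcmp(m->mnt_dir, mountpoint) == 0)
            found = hasmntopt(m, "prjquota") != NULL;
    }
    endmntent(f);
    return found;
}
```

Usage: mounted_with_prjquota("/proc/self/mounts", "/var/lib/lxd") on the host, substituting wherever the dir storage pool actually lives.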

brauner commented 5 years ago

No, I don't think so. @stgraber knows the quota stuff better so I'll let him comment first.

brauner commented 5 years ago

Might be as simple as:

commit 69a28dcd8c22b3afb71ba867837f508e14424910 (HEAD -> shiftfs_fallocate, origin/shiftfs_fallocate)
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Wed Oct 23 00:16:17 2019 +0200

    shiftfs: drop CAP_SYS_RESOURCE to avoid overriding disk quotas

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 49b7777dde22..81a73b87c395 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -2046,6 +2046,8 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data,
                goto out_put_path;
        }
        sb->s_maxbytes = MAX_LFS_FILESIZE;
+       /* Don't override disk quota limits or use reserved space. */
+       cap_lower(sbinfo->creator_cred->cap_effective, CAP_SYS_RESOURCE);

        inode = new_inode(sb);
        if (!inode) {
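The reasoning behind this second patch: shiftfs accesses the underlying filesystem with the mount creator's credentials (which is why the patch edits sbinfo->creator_cred), and a task holding CAP_SYS_RESOURCE is allowed to exceed disk quota hard limits and, per the patch comment, use reserved space. Dropping the capability from those credentials restores enforcement. A simplified userspace model of that decision; the capability set is treated as a single 64-bit mask, and cap_lower_model, may_ignore_hardlimit and allocation_allowed are illustrative names, not kernel functions:

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_SYS_RESOURCE_BIT 24 /* value of CAP_SYS_RESOURCE in <linux/capability.h> */

/* Userspace stand-in for cap_lower(): clear one capability bit. */
static uint64_t cap_lower_model(uint64_t cap_effective, int cap)
{
    return cap_effective & ~(UINT64_C(1) << cap);
}

/* Model of the quota hard-limit bypass: CAP_SYS_RESOURCE overrides it. */
static bool may_ignore_hardlimit(uint64_t cap_effective)
{
    return (cap_effective >> CAP_SYS_RESOURCE_BIT) & 1;
}

/* Whether allocating `want` more blocks fits under `hard_limit`
 * (0 meaning "no limit"), for a writer with the given capabilities. */
static bool allocation_allowed(uint64_t cap_effective, uint64_t used,
                               uint64_t want, uint64_t hard_limit)
{
    if (hard_limit == 0 || may_ignore_hardlimit(cap_effective))
        return true;
    return used + want <= hard_limit;
}
```

With a full root capability mask the over-quota allocation is allowed; after cap_lower_model(mask, CAP_SYS_RESOURCE_BIT) it is refused, which is the behavior change s3rj1k confirms below.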

s3rj1k commented 5 years ago

@brauner I can check it in my setup later today (in the morning, CEST) if you prepare an updated kernel.

brauner commented 5 years ago

@s3rj1k sure, will link to a new kernel here.

stgraber commented 5 years ago

@brauner yeah, enabling project quotas needs some effort, the tooling for it is rather awful :) LXD dir containers on a filesystem that's had the tune2fs property set and is mounted with the prjquota option will normally do the right thing then.

brauner commented 5 years ago

My Ubuntu VM doesn't support it, and I got pretty mad when I realized that adding prjquota as a mount option in /etc/fstab forced me to reboot into rescue mode and remount my rootfs read-write just so I could edit my fstab again...

brauner commented 5 years ago

@s3rj1k https://drive.google.com/open?id=1hZFPw7cqCQL1ptCHQINgb2NBi4A5rTp2

s3rj1k commented 5 years ago

@brauner Confirming that the latest kernel works correctly with quotas and shiftfs. Yay))

When is this fix expected to be publicly available? (I'd like to double-check it on a public kernel.)

brauner commented 5 years ago

Excellent.

In a couple of weeks. I'll give you a link to the launchpad bug here.

brauner commented 5 years ago

@s3rj1k here are the Launchpad bugs to track:

Once a kernel is proposed, it'll be mentioned in these bugs. You can subscribe to them to get notified when that happens.

s3rj1k commented 5 years ago

@brauner Thanks, hoping to see this fix soon )

s3rj1k commented 5 years ago

@stgraber A small, unrelated question: how can I programmatically match a container to its project quota ID to get usage statistics?

stgraber commented 5 years ago

The ID is 10000 + the container's ID, as you'd find it in lxd sql global "SELECT id, name FROM instances;".
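Put as code, the mapping described here is a fixed offset. A sketch under that assumption: it reflects LXD's dir backend as described in this thread, other LXD versions may derive project IDs differently, and the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Offset described in this thread for LXD dir-backend project quotas. */
#define LXD_PROJECT_ID_BASE 10000u

/* Project quota ID for an instance ID taken from the instances table. */
static uint32_t project_id_for_instance(uint32_t instance_id)
{
    return LXD_PROJECT_ID_BASE + instance_id;
}

/* Inverse mapping, e.g. for labelling repquota output. Returns -1 for
 * project IDs below the offset, which cannot belong to an instance. */
static int64_t instance_id_for_project(uint32_t project_id)
{
    if (project_id < LXD_PROJECT_ID_BASE)
        return -1;
    return (int64_t)(project_id - LXD_PROJECT_ID_BASE);
}
```

Joining instance_id_for_project() results against the id/name pairs from the SQL query above turns the project IDs that repquota reports into container names.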

Note that the used space is also reported through lxc info NAME.

s3rj1k commented 5 years ago

@stgraber Thanks. Sadly, lxc info NAME does not report correct usage. Sometimes the "Disk usage:" section disappears from the lxc info NAME output; other times it takes a long time for the value reported by lxc info NAME to catch up with the actual disk usage.

[image]

[image]

s3rj1k commented 5 years ago

@stgraber Should I open a separate issue for the lxc info NAME bug?

stgraber commented 5 years ago

Yeah, that'd be good to track down, I would have expected data to be returned just fine for dir backend.