canonical/lxd

I/O disk limits on mdadm RAID virtual devices #3515

Closed: ogai closed this issue 7 years ago.

ogai commented 7 years ago

With two disks in an mdadm RAID 1 array, a configuration very common on server hosters like hetzner.de, limiting the disk I/O of a container has no effect.

Suggested fix: add extra code to track down the backing devices of mdadm RAID arrays.

This issue has been discussed on https://discuss.linuxcontainers.org/t/limiting-disk-io-on-lxd-containers/261/7
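
For reference, the limits in question are set per device with lxc config, as in the commands below (container name c1 matches the reproduction that follows), and the array members that a fix would need to resolve are exposed by the kernel under sysfs:

lxc config device set c1 root limits.read 1MB
lxc config device set c1 root limits.write 1MB
ls /sys/block/md0/slaves/    # lists the md0 member devices (vda2 and vda3 in the setup below)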

stgraber commented 7 years ago

So actually, I just tried this here and things appear to work with md0 as the backing device. My setup:

~ # time dd if=/dev/zero of=test.img bs=4M count=10 conv=fsync
10+0 records in
10+0 records out
real    0m 40.36s
user    0m 0.00s
sys 0m 0.04s

So that's giving me exactly my 1MB/s of write speed (40MB written in just over 40s) using current LXD 2.15.

root@vm02:~# lxc config show c1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Alpine edge amd64 (20170710_01:44)
  image.os: Alpine
  image.release: edge
  image.serial: "20170710_01:44"
  volatile.base_image: 5834296d2606bd4f6f5a58ab3bc40928aa4e853b800ab5f7fd143bd8fc162380
  volatile.eth0.hwaddr: 00:16:3e:e0:99:87
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  root:
    limits.read: 1MB
    limits.write: 1MB
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
root@vm02:~# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 vda3[1] vda2[0]
      14639104 blocks super 1.2 [2/2] [UU]
root@vm02:~# cat /var/log/lxd/c1/lxc.conf 
lxc.cap.drop = sys_time sys_module sys_rawio
lxc.mount.auto = proc:rw sys:rw
lxc.autodev = 1
lxc.pts = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.logfile = /var/log/lxd/c1/lxc.log
lxc.loglevel = trace
lxc.arch = linux64
lxc.hook.pre-start = /usr/bin/lxd callhook /var/lib/lxd 1 start
lxc.hook.post-stop = /usr/bin/lxd callhook /var/lib/lxd 1 stop
lxc.tty = 0
lxc.utsname = c1
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.aa_profile = lxd-c1_</var/lib/lxd>//&:lxd-c1_<var-lib-lxd>:
lxc.seccomp = /var/lib/lxd/security/seccomp/c1
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc.cgroup.blkio.throttle.read_bps_device = 9:0 1048576
lxc.cgroup.blkio.throttle.write_bps_device = 9:0 1048576
lxc.network.0.type = veth
lxc.network.0.flags = up
lxc.network.0.link = lxdbr0
lxc.network.0.hwaddr = 00:16:3e:e0:99:87
lxc.network.0.name = eth0
lxc.rootfs.backend = dir
lxc.rootfs = /var/lib/lxd/containers/c1/rootfs
lxc.mount.entry = /var/lib/lxd/shmounts/c1 dev/.lxd-mounts none bind,create=dir 0 0
root@vm02:~# cat /sys/fs/cgroup/blkio/lxc/c1/blkio.throttle.read_bps_device
9:0 1048576
root@vm02:~# cat /sys/fs/cgroup/blkio/lxc/c1/blkio.throttle.write_bps_device
9:0 1048576
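
The 9:0 in those throttle entries is the major:minor pair of the throttled block device; md devices use major 9, so this maps to /dev/md0, which can be double-checked with:

ls -l /dev/md0    # "brw-rw---- ... 9, 0 ..." confirms major 9, minor 0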
stgraber commented 7 years ago

I suspect you're not using the directory storage backend, but instead something more advanced which adds another layer of indirection between the container storage and the underlying block storage.

root@vm02:~# lxc storage list
+---------+-------------+--------+------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |               SOURCE               | USED BY |
+---------+-------------+--------+------------------------------------+---------+
| default |             | dir    | /var/lib/lxd/storage-pools/default | 2       |
+---------+-------------+--------+------------------------------------+---------+
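
A quick way to spot such extra layers is to walk the block device stack beneath the storage path; a sketch, using device names that appear later in this thread:

df /var/lib/lxd        # shows which device backs the LXD directory
lsblk -s /dev/dm-0     # -s walks the stack downwards: dm-0 -> md0 -> vda2/vda3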
stgraber commented 7 years ago

Oh right, you have LVM on top of your RAID1 and then an LV used as dir storage for LXD.

stgraber commented 7 years ago

So that's indeed an extra layer of indirection which prevents the current RAID detection code from triggering; let me reproduce with that.

stgraber commented 7 years ago

Updated my setup to match yours. /var/lib/lxd is now on ext4 on an LV (/dev/mapper/test-lxd). I've set up my container with the same speed limit of 1MB/s read and write.

This resulted in a blkio limit being set up for the LV, and throttling appears to work just fine:

~ # time dd if=/dev/zero of=test.img bs=4M count=10 conv=fsync
10+0 records in
10+0 records out
real    0m 40.07s
user    0m 0.00s
sys 0m 0.04s
root@vm02:~# cat /var/log/lxd/c1/lxc.conf 
lxc.cap.drop = sys_time sys_module sys_rawio
lxc.mount.auto = proc:rw sys:rw
lxc.autodev = 1
lxc.pts = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.logfile = /var/log/lxd/c1/lxc.log
lxc.loglevel = trace
lxc.arch = linux64
lxc.hook.pre-start = /usr/bin/lxd callhook /var/lib/lxd 1 start
lxc.hook.post-stop = /usr/bin/lxd callhook /var/lib/lxd 1 stop
lxc.tty = 0
lxc.utsname = c1
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.aa_profile = lxd-c1_</var/lib/lxd>//&:lxd-c1_<var-lib-lxd>:
lxc.seccomp = /var/lib/lxd/security/seccomp/c1
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc.cgroup.blkio.throttle.read_bps_device = 253:0 1048576
lxc.cgroup.blkio.throttle.write_bps_device = 253:0 1048576
lxc.network.0.type = veth
lxc.network.0.flags = up
lxc.network.0.link = lxdbr0
lxc.network.0.hwaddr = 00:16:3e:e5:99:d1
lxc.network.0.name = eth0
lxc.rootfs.backend = dir
lxc.rootfs = /var/lib/lxd/containers/c1/rootfs
lxc.mount.entry = /var/lib/lxd/shmounts/c1 dev/.lxd-mounts none bind,create=dir 0 0
root@vm02:~# cat /sys/fs/cgroup/blkio/lxc/c1/blkio.throttle.write_bps_device
253:0 1048576
root@vm02:~# cat /sys/fs/cgroup/blkio/lxc/c1/blkio.throttle.read_bps_device
253:0 1048576
root@vm02:~# ls -lh /dev/mapper/
total 0
crw------- 1 root root 10, 236 Jul 10 19:11 control
lrwxrwxrwx 1 root root       7 Jul 10 19:55 test-lxd -> ../dm-0
root@vm02:~# ls -lh /dev/dm-0
brw-rw---- 1 root disk 253, 0 Jul 10 19:55 /dev/dm-0
stgraber commented 7 years ago

Tried with the same flags you listed in the discuss post:

root@c2:~# dd if=/dev/zero of=/root/testfile bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 10.0082 s, 1.0 MB/s
stgraber commented 7 years ago

What version of LXD are you using?
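
For reference, the server version can be checked on the host with something like:

lxc info | grep server_version    # server_version also appears in the full lxc info output later in this thread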

ogai commented 7 years ago

@stgraber I was using the one available in the default repositories of Ubuntu 16.04, which was 2.0.9. After upgrading to 2.15 with

sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable 
sudo apt-get update
sudo apt-get install lxd

the limit works on my setup, without even needing to restart either the host or the guest.

stgraber commented 7 years ago

Ah, interesting. I'll do some tests with LXD 2.0.10 then, maybe we forgot to backport a commit related to this.

stgraber commented 7 years ago

Finally getting back to this; setting up a test environment with LXD 2.0.10.

stgraber commented 7 years ago

So it looks like whatever the issue was, it got fixed in 2.0.10, because I'm unable to reproduce it here.

stgraber@castiana:~$ rssh ubuntu@lantea.maas.mtl
Warning: Permanently added 'lantea.maas.mtl' (ECDSA) to the list of known hosts.
root@lantea:~# systemctl stop lxd lxd.socket
root@lantea:~# apt update
Hit:1 http://us.archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://us.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Fetched 306 kB in 0s (546 kB/s)                              
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.

root@lantea:~# cat /proc/partitions 
major minor  #blocks  name

   8        0  976762584 sda
   8       32 2930266584 sdc
   8       48 2930266584 sdd
   8       16  976762584 sdb
  11        0    1048575 sr0
   8       80  117220824 sdf
   8       81  117219783 sdf1
   8       64  117220824 sde

root@lantea:~# mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: size set to 2930135488K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

root@lantea:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created
root@lantea:~# vgcreate test /dev/md0
  Volume group "test" successfully created
root@lantea:~# lvcreate -n lxd -l 100%FREE test
  Logical volume "lxd" created.

root@lantea:~# mkfs.ext4 /dev/mapper/test-lxd 
mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 732532736 4k blocks and 183140352 inodes
Filesystem UUID: 2b6aad10-7dfc-4075-9763-46e43992b70b
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done       

root@lantea:~# mount /dev/mapper/test-lxd /var/lib/lxd
root@lantea:~# systemctl start lxd.socket

root@lantea:~# lxc info
Generating a client certificate. This may take a minute...
If this is your first time using LXD, you should also run: sudo lxd init
To start your first container, try: lxc launch ubuntu:16.04

config: {}
api_extensions:
- id_map
api_status: stable
api_version: "1.0"
auth: trusted
public: false
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFZDCCA0ygAwIBAgIQEAfxBx1tq3jocy9HQG0pRTANBgkqhkiG9w0BAQsFADA0
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRQwEgYDVQQDDAtyb290QGxh
    bnRlYTAeFw0xNzA3MjUwMzU5NDBaFw0yNzA3MjMwMzU5NDBaMDQxHDAaBgNVBAoT
    E2xpbnV4Y29udGFpbmVycy5vcmcxFDASBgNVBAMMC3Jvb3RAbGFudGVhMIICIjAN
    BgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAw70gyXmdoL0WRgfGcU1PqVNFZ/Nn
    x1u78GEM+yEwSY7d+WS0mUp5AU8olZyEM6ZgNnhgmzO3zajb1dHZVA5y7nnjYulQ
    kn2JgveY0BYzMpU46i/tZvpImDc4BcE3A6xq9WSK2xJCo8M7Kfu0wEuW1rgqB+og
    A+BiWtzpP+QDiUdQeuEqKq2QtKVS1GCR9JAa/gUit47mC62KYyUqmww0P6Ucumeh
    XIfcM+ZCOPf75nHv2j2azjOxKYUpMX7CkiYvyiA0FTr9P9rGQWncju5z676W0NBU
    8hpJ0A9rH/qxJbSDg57rC4oOXbof29k2EmC28SJZ+fW/m4Xd99yfXg3eN7rf72Ra
    gg58/0JXOCN5RS6OKwtoofDEWR+uMUitZPjO0RQzgvagjqV89NwwKk4ErMpZk4+B
    ZjpHzP9v4uaqImUHNQljztEQ6rLfUqueaT2tJ45jPNiedwDVaJtvxH2n4jCID1GA
    83DT/FutEEG/CwKQb0lLf43QZ04OLTgrs7vQej39/xgXe0yhCbD6VgrHnuPDdVXJ
    aI1RzLuhHLfldO5YcaJk7cx3MFi2jvCXJcnGGmbrwIteADobdn5UiyoMoBCG7ZXY
    zWCASEaP3Z7z7rWYqzOHOApqEEi+2lzijma34hsCdAVc90k17k3HSNdkL3Ga3zkF
    KRRRsPIih6DA5IUCAwEAAaNyMHAwDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
    CCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwOwYDVR0RBDQwMoIGbGFudGVhhwSsERAv
    hxAgAQRwsPgQFtJQmf/+wpJjhxAgAQRwsPgQFtJQmf/+wpJkMA0GCSqGSIb3DQEB
    CwUAA4ICAQAmi7WoHyWzx2G+WzRK/g8smb0ogLREE199R30a1cnQ8hfXDZw3G3ID
    B+rzbo8cmm97daDj+kFulIfmB+S6znKfyS6wC13AAmqGJvzc9AGI6U1bVoIboP5x
    GADbsWlK9ZCKocilR99XtHEkpvXX/8RGIcdJ3TL2yQWMZJCOWbuwhPkOZB1+lEDJ
    V6nXMcJ14Z9IvYCxnbHYY0S5gYztXhDRaWKnDovvguVB9M3T/qBwXGTDshWCE25D
    8Dgx6F8wL5O4MWisg5AHSWlTmSJp/92Lq9HBtAvteb1dsMzXrzYJEW8Ty+ggVNmR
    k/hNOv/9QCSe6tPoEtCGcxs3uRfPYEMfF71zorGMJ60i6bCOvbkH8Kp3QGYT/zJS
    ouQ3+jX/nK2aPMPXyVJNX4CqQ9PsbZkPgmtm7I/vAvjc2pyzu5Lr7SQVhhsUyMak
    pqkZ1uiLF75oDfYjypyEbyWWlu8xcrXctx+5v8d5aTqR0dtCYLJXDJdsaV3ECdTo
    MDorAyThuKRKPIjGDP5Vv6wWg4CV4Le9rigBwznE460VxMzM9kVVMezOf8Qjwle4
    5km4DQ64cAwIZQzQlAz18opGgLK2E10ullFFwloNX6AYji510egu94DOp9ihi77v
    eJ9jgl00MV58EENCn+VkYXqCEwAaCjp8lRrWqz3L6rQKdTa8sL0Zvg==
    -----END CERTIFICATE-----
  certificate_fingerprint: 2b8068196d29ad77399f1ba9bc7e3bddbe1fa05f47878ca6eec1a4d6b56c671b
  driver: lxc
  driver_version: 2.0.8
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.4.0-87-generic
  server: lxd
  server_pid: 3355
  server_version: 2.0.10
  storage: dir
  storage_version: ""

root@lantea:~# lxc launch ubuntu:16.04 test
Creating test
Starting test                               

root@lantea:~# lxc config device set test root limits.read 1MB
root@lantea:~# lxc config device set test root limits.write 1MB

root@lantea:~# lxc exec test bash
root@test:~# dd if=/dev/zero of=/root/testfile bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 10.0071 s, 1.0 MB/s
stgraber commented 7 years ago

And checking that the right values are set:

root@lantea:~# cat /sys/fs/cgroup/blkio/lxc/test/blkio.throttle.read_bps_device
252:0 1048576

root@lantea:~# cat /sys/fs/cgroup/blkio/lxc/test/blkio.throttle.write_bps_device
252:0 1048576

root@lantea:~# ls -lh /dev/mapper/test-lxd 
lrwxrwxrwx 1 root root 7 Jul 25 03:59 /dev/mapper/test-lxd -> ../dm-0
root@lantea:~# ls -lh /dev/dm-0 
brw-rw---- 1 root disk 252, 0 Jul 25 03:59 /dev/dm-0
stgraber commented 7 years ago

This was applied live to a running container, so to confirm that liblxc also applies the limits properly on a clean start:

root@lantea:~# lxc exec test bash
root@test:~# dd if=/dev/zero of=/root/testfile bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 10.0117 s, 1.0 MB/s

root@test:~# exit
root@lantea:~# grep blkio /var/log/lxd/test/lxc.conf
lxc.cgroup.blkio.throttle.read_bps_device = 252:0 1048576
lxc.cgroup.blkio.throttle.write_bps_device = 252:0 1048576
stgraber commented 7 years ago

Going to assume that we somehow fixed this issue with 2.0.10. I don't recall fixing anything specifically for this, but I do remember us fixing an issue related to disk quota inheritance which may somehow be related.

Closing the issue as a result. If someone can reproduce this on either LXD 2.15 or LXD 2.0.10, please give us updated reproduction information and I'll track down what's going on.

sachin-0chain commented 5 years ago

The limits seem to be obeyed for devices that are added as part of the config. If the devices were added via a profile, the read and write limits are not obeyed. I can add more details tomorrow, but is there some difference in the way the devices are treated?

stgraber commented 5 years ago

@sachin-0chain there shouldn't be any difference, but maybe you have a local device which is overriding the one from the profile. Does your limit appear in lxc config show --expanded NAME?

sachin-0chain commented 5 years ago

Container s000-lucy settings:

raw.lxc: |-
  lxc.apparmor.profile = unconfined
  lxc.cgroup.devices.allow = a
  lxc.mount.auto=proc:rw sys:rw
  lxc.cap.drop=
security.nesting: "true"
security.privileged: "false"

Attached to the following arrays in a RAID 0 configuration

RAID Configuration

df -T /dev/md0 /dev/md1
Filesystem     Type   1K-blocks     Used   Available Use% Mounted on
/dev/md0       xfs   3904611552 13653636  3890957916   1% /ebs
/dev/md1       xfs  35154172928 37375356 35116797572   1% /efs

Partition Information

cat /proc/partitions 
major minor  #blocks  name
   8       16  976762584 sdb
   8       48  976762584 sdd
   8       80  976762584 sdf
   8      112  976762584 sdh
   8       32 11718885376 sdc
   8       64 11718885376 sde
   8       96 11718885376 sdg
   9        0 3906521088 md0   <<<
   9        1 35156259840 md1  <<<

Raw bandwidth from the host...

For /dev/md0 - 9:0
sudo dd if=/dev/zero of=/ebs/testfile bs=1M count=1000 oflag=direct
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.70381 s, 615 MB/s

For /dev/md1 - 9:1
sudo dd if=/dev/zero of=/efs/testfile bs=1M count=1000 oflag=direct
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.00333 s, 523 MB/s

Expanded Configuration Information

ubuntu@pedro:/ebs$ lxc config show s000-lucy --expanded
architecture: x86_64
config:
  limits.cpu: 4,5,6,7
  limits.disk.priority: "8"
  limits.memory: 8GB
  raw.lxc: |-
    lxc.apparmor.profile = unconfined
    lxc.cgroup.devices.allow = a
    lxc.mount.auto=proc:rw sys:rw
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "false"
  volatile.base_image: a6f65f3ee8af61a7a06f19fffed12467b22e755281762a59dc40c9febd949a8a
  volatile.eth0.hwaddr: 00:16:3e:ed:01:9a
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  disk00:
    limits.read: 100MB
    limits.write: 100MB
    path: /disk00
    source: /ebs/lucy.s000.disk00
    type: disk
  disk10:
    limits.read: 100MB
    limits.write: 100MB
    path: /disk10
    source: /ebs/lucy.s000.disk10
    type: disk
  disk11:
    limits.read: 100MB
    limits.write: 100MB
    path: /disk11
    source: /ebs/lucy.s000.disk11
    type: disk
  disk20:
    limits.read: 100MB
    limits.write: 100MB
    path: /disk20
    source: /efs/lucy.s000.disk20
    type: disk
  docker:
    limits.read: 100MB
    limits.write: 100MB
    path: /var/lib/docker
    source: /ebs/lucy.s000.docker
    type: disk
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  root:
    limits.read: 100MB
    limits.write: 100MB
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- bridge
- s000-lucy
stateful: false
description: ""

BLKIO output - maps correctly to the requested 100MB

ubuntu@pedro:/ebs$ cat /sys/fs/cgroup/blkio/lxc/s000-lucy/blkio.throttle.write_bps_device
9:1 104857600
9:0 104857600
ubuntu@pedro:/ebs$ cat /sys/fs/cgroup/blkio/lxc/s000-lucy/blkio.throttle.read_bps_device
9:1 104857600
9:0 104857600

Configuration Information s000-lucy

cat /var/log/lxd/s000-lucy/lxc.conf
lxc.log.file = /var/log/lxd/s000-lucy/lxc.log
lxc.log.level = info
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/lxd/s000-lucy/console.log
lxc.mount.auto = proc:rw sys:rw
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional
lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.mount.entry = proc dev/.lxc/proc proc create=dir,optional
lxc.mount.entry = sys dev/.lxc/sys sysfs create=dir,optional
lxc.arch = linux64
lxc.hook.pre-start = /usr/lib/lxd/lxd callhook /var/lib/lxd 25 start
lxc.hook.post-stop = /usr/lib/lxd/lxd callhook /var/lib/lxd 25 stop
lxc.tty.max = 0
lxc.uts.name = s000-lucy
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.apparmor.profile = lxd-s000-lucy_</var/lib/lxd>//&:lxd-s000-lucy_<var-lib-lxd>:
lxc.seccomp.profile = /var/lib/lxd/security/seccomp/s000-lucy
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
lxc.cgroup.memory.limit_in_bytes = 8589934592
lxc.cgroup.memory.soft_limit_in_bytes = 7730941133
lxc.cgroup.blkio.weight = 800
lxc.cgroup.blkio.throttle.read_bps_device = 9:0 104857600
lxc.cgroup.blkio.throttle.write_bps_device = 9:0 104857600
lxc.cgroup.blkio.throttle.read_bps_device = 9:1 104857600
lxc.cgroup.blkio.throttle.write_bps_device = 9:1 104857600
lxc.rootfs.path = dir:/var/lib/lxd/containers/s000-lucy/rootfs
lxc.mount.entry = /var/lib/lxd/devices/s000-lucy/disk.disk00.disk00 disk00 none bind,create=dir
lxc.mount.entry = /var/lib/lxd/devices/s000-lucy/disk.disk10.disk10 disk10 none bind,create=dir
lxc.mount.entry = /var/lib/lxd/devices/s000-lucy/disk.disk11.disk11 disk11 none bind,create=dir
lxc.mount.entry = /var/lib/lxd/devices/s000-lucy/disk.disk20.disk20 disk20 none bind,create=dir
lxc.mount.entry = /var/lib/lxd/devices/s000-lucy/disk.docker.var-lib-docker var/lib/docker none bind,create=dir
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
lxc.net.0.hwaddr = 00:16:3e:ed:01:9a
lxc.net.0.name = eth0
lxc.mount.entry = /var/lib/lxd/shmounts/s000-lucy dev/.lxd-mounts none bind,create=dir 0 0
lxc.apparmor.profile = unconfined
lxc.cgroup.devices.allow = a
lxc.mount.auto=proc:rw sys:rw
lxc.cap.drop=

Bandwidth Test in the container s000-lucy


ubuntu@s000-lucy:/disk10$ dd if=/dev/zero of=/disk00/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.63689 s, 641 MB/s

ubuntu@s000-lucy:/disk10$ dd if=/dev/zero of=/disk10/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.64203 s, 639 MB/s

ubuntu@s000-lucy:/disk10$ dd if=/dev/zero of=/disk11/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.64249 s, 638 MB/s

ubuntu@s000-lucy:/disk10$ dd if=/dev/zero of=/disk20/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.90127 s, 552 MB/s

ubuntu@s000-lucy:/disk10$ dd if=/dev/zero of=/var/lib/docker/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.66602 s, 629 MB/s
stgraber commented 5 years ago

OK, so it looks like the blkio cgroup was properly configured in this case with the limits you've set?

Still pretty odd that the limit then doesn't appear to be effective. What filesystem and storage backend are you using there? Some unfortunately do their own I/O outside of the codepaths handled by blkio (ZFS is one of those).
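
Both can be checked from the host with commands already used in this thread; a sketch:

lxc storage list                                   # shows the pool driver (dir, lvm, zfs, btrfs, ...)
df -T /var/lib/lxd/containers/s000-lucy/rootfs     # shows the filesystem actually backing the rootfs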

sachin-0chain commented 5 years ago

@stgraber - RAID using mdadm; the filesystem is XFS. There was a fix you did some time back for intermittent blkio limits not being recognized for the root disk. I am now trying to upgrade LXD to 3.9 to try the latest. Any suggestions?

sachin-0chain commented 5 years ago

@stgraber - Adding devices using the config method obeys the blkio limits...

Moved to version 3.9

Created test from ubuntu:16.04

Added two drives, d1 (with 20MB limits) and d2 (no limits, as a control), using the following commands...

root@pedro:~# lxc config device add test d1 disk path=/d1 source=/ebs/d1
root@pedro:~# lxc config device set test d1 limits.read 20MB
root@pedro:~# lxc config device set test d1 limits.write 20MB

root@pedro:~# lxc config device add test d2 disk path=/d2 source=/efs/d2

Ran the tests

root@test:/d1# dd if=/dev/zero of=/d2/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.11254 s, 496 MB/s   <<<<<<
root@test:/d1# dd if=/dev/zero of=/d1/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 52.4319 s, 20.0 MB/s <<<<
sachin-0chain commented 5 years ago

The quota is honored when logging in via the exec method...

root@pedro:~# lxc exec s000-lucy -- bash
root@s000-lucy:~# dd if=/dev/zero of=/disk10/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 10.4216 s, 101 MB/s
root@s000-lucy:~#

The quota doesn't work when logging into the container via ssh and running under sudo...

ubuntu@s000-lucy:~$ sudo su -
root@s000-lucy:~# dd if=/dev/zero of=/disk10/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.67755 s, 625 MB/s
stgraber commented 5 years ago

Ah, that part does make some sense because of how cgroups work, especially the way systemd sets them up.

Basically, LXD updates the container's root cgroup with the new limits. Some controllers will do the right thing and restrict child cgroups to no more access than their parent, but blkio isn't one of those.

So in your example above, lxc exec gets you a shell in the root cgroup, which has the right limit, whereas the ssh + sudo method gets you a shell through logind, which ends up in a child cgroup.

Restarting the container will make things consistent again.

The fact that this particular controller doesn't appear to be properly hierarchical is a bit frustrating and hopefully something that cgroup2 does better...
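
A quick way to see the difference from inside the container is to compare the blkio cgroup of each shell; a sketch, with the paths being typical rather than exact:

grep blkio /proc/self/cgroup    # via lxc exec: typically ends in ":/" (the cgroup carrying the limit)
grep blkio /proc/self/cgroup    # via ssh + sudo: typically ends in a logind child such as ":/user.slice/..."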

sachin-0chain commented 5 years ago

@stgraber - Restarting the container doesn't enforce the blkio limits via sshd; exec still works. I have tried rebooting and restarting.

The host needs to control the compute, network, disk I/O, and priority of several Ubuntu 16.04 based containers. There are several SSDs attached in a RAID 0 configuration to be run under Docker (ScyllaDB). Hence I need fine-grained control over the bandwidth and disk I/O to benchmark the hardware.

Do you know if I could use the 'cgroup-parent' feature of Docker and pass it the LXC cgroup? Would the limits be enforced?
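
(For reference, the Docker option is spelled --cgroup-parent on docker run, or "cgroup-parent" in daemon.json; a sketch of what I have in mind, with the cgroup path a guess that would need checking against the real LXC hierarchy:)

docker run --cgroup-parent=/lxc/s000-lucy --rm scylladb/scylla    # path is hypothetical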

Are there other cgroup controllers which exhibit behaviour similar to blkio that I need to be aware of?

Are there any cgroup settings here which I can play around with?

config:
  limits.cpu: "{{agent_cpus}}"
  limits.memory: "{{agent_memory}}"
  limits.disk.priority: "{{lxc_disk_priority |int}}"
  raw.lxc: |-
    lxc.apparmor.profile = unconfined
    lxc.cgroup.devices.allow = a
    lxc.mount.auto=proc:rw sys:rw
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "true"
  user.network_mode: link-local
description: "{{network_name}}.{{agent_name}}"

I am willing to make the host and guest the same release if that solves the problem, but I would prefer to stay on 16.04 due to its stability...

HOST:

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:    18.04
Codename:   bionic

Guest LXC:

root@s000-paula:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:    16.04
Codename:   xenial