CachyOS / CachyOS-Settings

Settings used for CachyOS
GNU General Public License v3.0

Udev ioschedulers rules probably bad for modern USB SSDs. #50

Closed Nyanraltotlapun closed 9 months ago

Nyanraltotlapun commented 10 months ago

The problem.

The file etc/udev/rules.d/60-ioschedulers.rules contains the rule:

ACTION=="add|change", KERNEL=="sd[a-z]*|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"

It sets the bfq scheduler for all non-rotational sd* (USB/SATA) and mmcblk* (SD card?) flash storage, which is probably suboptimal for modern USB enclosures, external USB SSDs, and some modern USB flash drives (which are basically small SSDs).
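Whether a rule actually took effect can be checked by reading the device's sysfs scheduler file; the kernel shows the active scheduler in brackets. A small sketch (the sample string in the helper is hypothetical):

```shell
# List the scheduler line for every block device; the active one is
# bracketed, e.g. "mq-deadline kyber [bfq] none".
for f in /sys/block/*/queue/scheduler; do
  [ -e "$f" ] && printf '%s: %s\n' \
    "$(basename "$(dirname "$(dirname "$f")")")" "$(cat "$f")"
done

# Extract just the active scheduler name from such a line.
active_sched() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
echo 'mq-deadline kyber [bfq] none' | active_sched
```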

What is happening.

For example, I have an SSD inside an external USB-SATA enclosure. With this udev rule active, when I call cryptsetup to open it, the SSD becomes busy (activity light flashing) for more than a minute, and the command hangs for that whole time. Without this rule, cryptsetup executes instantly and the device is ready to use.

Possible fix.

Fast USB flash storage uses a USB-to-SCSI protocol (UAS). So the rule could probably be modified to ignore non-rotational SCSI devices, or an additional rule could set non-rotational SCSI devices back to the default scheduler.
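One possible shape for such a carve-out, purely as an untested sketch (the choice of mq-deadline as the fallback is an assumption, not something settled here; SUBSYSTEMS=="usb" matches against the device's parent chain):

```
# Hypothetical sketch, not a tested rule: non-rotational disks reached
# over USB (UAS enclosures, external SSDs) get mq-deadline instead of bfq.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS=="usb", ATTR{queue/scheduler}="mq-deadline"
```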

Notes.

Do we even need this rule for SATA SSDs? I got the impression that it can benefit only slow storage, like USB flash drives (dongles), SD memory cards, and HDDs.

ventureoo commented 10 months ago

Hi. Thanks for reporting. Can you please provide the output of this command, replacing /dev/sda with the block device corresponding to your disk (can be seen via lsblk):

udevadm info --attribute-walk --path=$(udevadm info --query=path --name=/dev/sda ) | paste-cachyos

This will help in fixing the rules for USB disks.

Do we even need this rule for SATA SSDs? I got the impression that it can benefit only slow storage, like USB flash drives (dongles), SD memory cards, and HDDs.

I'm definitely not sure if it makes sense to use BFQ for SD cards. As for SATA SSDs, in most cases bfq should be fine. However, mq-deadline may give stronger guarantees on read/write latency.

Nyanraltotlapun commented 10 months ago

Can you please provide the output of this command, replacing /dev/sda with the block device corresponding to your disk (can be seen via lsblk):

udevadm info --attribute-walk --path=$(udevadm info --query=path --name=/dev/sda ) | paste-cachyos

https://paste.cachyos.org/p/9b01ea6

ventureoo commented 10 months ago

Commit https://github.com/CachyOS/CachyOS-Settings/commit/e9f1fbfaf03798b1814847cc68693ba2b48b214b should fix it. Check it out; if that doesn't help, feel free to reopen.

Nyanraltotlapun commented 10 months ago

Ok, I just noticed that the report shows ATTR{queue/rotational}=="1", so I messed up a little.

BFQ scheduler is activated by the first rule ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

I rechecked; everything else remains true.

I will do one more test with an actual HDD in the enclosure, to see if this affects HDDs in a similar manner.

Nyanraltotlapun commented 10 months ago

I tested with an HDD inside, and it seems there are no slowdowns with BFQ. It behaves the same as with mq-deadline. In case it helps, here is the report for the HDD inside the USB enclosure: https://paste.cachyos.org/p/b8e21a3

ptr1337 commented 10 months ago

We could also think about going this way:

# BFQ is recommended for slow storage such as rotational block devices and SD cards.
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="mmcblk?", ATTR{queue/scheduler}="bfq"
# None is recommended for NVMe drives
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", KERNEL=="nvme?n?", ATTR{queue/scheduler}="none"
# Kyber is recommended for SATA SSDs
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", KERNEL=="sd?", ATTR{queue/scheduler}="kyber"

WDYT?
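Whichever variant is chosen, the rules can be re-applied without a reboot (assuming the file lives at /etc/udev/rules.d/60-ioschedulers.rules). A sketch, to be run as root:

```shell
# Reload udev rules and replay "change" events for block devices so the
# new scheduler assignments take effect immediately (needs root).
reload_iosched_rules() {
  udevadm control --reload &&
  udevadm trigger --action=change --subsystem-match=block
}
# Afterwards, verify on a device of interest, e.g.:
#   cat /sys/block/sda/queue/scheduler
```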

Nyanraltotlapun commented 10 months ago

It is still probably a good idea to separate SATA SSDs from USB devices. Does this look reasonable? I tried to rewrite it in the original config's style. Basically: bfq for HDDs, SD cards, and USB storage; kyber for SATA SSDs; none for NVMe.

# BFQ is recommended for slow storage such as rotational block devices and SD cards.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS=="usb", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="mmcblk[0-9]*", ATTR{queue/scheduler}="bfq"

# None is recommended for NVMe drives
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"

# Kyber is recommended for SATA SSDs
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS!="usb", ATTR{queue/scheduler}="kyber"

ventureoo commented 10 months ago

To be honest, I'm really not sure if we should be using BFQ for SD cards. I ran a small benchmark on my single-board computer with a Samsung SD card (ED2S5, 128 GB) using fio on Ubuntu 20.04, testing kyber, mq-deadline, and BFQ, and noticed that BFQ causes a drop in throughput compared to the others. The command for the test was as follows:

fio --filename=/mnt/test.fio --size=8GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1
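For reproducibility, the same job can be scripted across all three schedulers. A sketch, assuming the device under test is mmcblk0 and /mnt is the mounted filesystem being benchmarked (run as root; note it reuses /mnt/test.fio):

```shell
# Run the fio job above once per scheduler and save each report separately.
bench_scheduler() {   # usage: bench_scheduler <device> <scheduler>
  dev=$1 sched=$2
  echo "$sched" > "/sys/block/$dev/queue/scheduler" || return 1
  fio --filename=/mnt/test.fio --size=8GB --direct=1 --rw=randrw --bs=4k \
      --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
      --time_based --group_reporting --name=iops-test-job \
      --eta-newline=1 --output="fio-$sched.log"
}
# for s in bfq mq-deadline kyber; do bench_scheduler mmcblk0 "$s"; done
```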

Results for BFQ:

iops-test-job: (groupid=0, jobs=4): err= 0: pid=21296: Wed Jan 24 13:56:04 2024
  read: IOPS=222, BW=888KiB/s (909kB/s)(105MiB/120838msec)
    slat (usec): min=19, max=155579, avg=91.54, stdev=2370.92
    clat (msec): min=571, max=6095, avg=2255.86, stdev=720.80
     lat (msec): min=571, max=6095, avg=2255.96, stdev=720.80
    clat percentiles (msec):
     |  1.00th=[  978],  5.00th=[ 1284], 10.00th=[ 1435], 20.00th=[ 1653],
     | 30.00th=[ 1838], 40.00th=[ 1989], 50.00th=[ 2165], 60.00th=[ 2333],
     | 70.00th=[ 2534], 80.00th=[ 2802], 90.00th=[ 3239], 95.00th=[ 3608],
     | 99.00th=[ 4329], 99.50th=[ 4732], 99.90th=[ 5671], 99.95th=[ 5940],
     | 99.99th=[ 6074]
   bw (  KiB/s): min=   32, max= 2846, per=100.00%, avg=930.63, stdev=136.07, samples=905
   iops        : min=    8, max=  711, avg=232.46, stdev=34.02, samples=905
  write: IOPS=225, BW=901KiB/s (923kB/s)(106MiB/120838msec); 0 zone resets
    slat (usec): min=19, max=1533.3k, avg=17595.62, stdev=68751.02
    clat (msec): min=132, max=5966, avg=2261.95, stdev=721.47
     lat (msec): min=571, max=6128, avg=2279.54, stdev=725.93
    clat percentiles (msec):
     |  1.00th=[  978],  5.00th=[ 1284], 10.00th=[ 1452], 20.00th=[ 1653],
     | 30.00th=[ 1838], 40.00th=[ 1989], 50.00th=[ 2165], 60.00th=[ 2333],
     | 70.00th=[ 2534], 80.00th=[ 2802], 90.00th=[ 3239], 95.00th=[ 3608],
     | 99.00th=[ 4329], 99.50th=[ 4799], 99.90th=[ 5738], 99.95th=[ 5873],
     | 99.99th=[ 5940]
   bw (  KiB/s): min=   32, max= 2974, per=100.00%, avg=949.63, stdev=136.37, samples=900
   iops        : min=    8, max=  743, avg=237.21, stdev=34.10, samples=900
  lat (msec)   : 250=0.01%, 500=0.01%, 750=0.16%, 1000=1.06%, 2000=39.82%
  lat (msec)   : >=2000=58.95%
  cpu          : usr=0.13%, sys=0.45%, ctx=54696, majf=0, minf=169
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=26831,27217,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=888KiB/s (909kB/s), 888KiB/s-888KiB/s (909kB/s-909kB/s), io=105MiB (110MB), run=120838-120838msec
  WRITE: bw=901KiB/s (923kB/s), 901KiB/s-901KiB/s (923kB/s-923kB/s), io=106MiB (111MB), run=120838-120838msec

Disk stats (read/write):
  mmcblk0: ios=26913/27259, merge=7/25, ticks=4311670/4390411, in_queue=8702082, util=99.96%

Results for mq-deadline:

iops-test-job: (groupid=0, jobs=4): err= 0: pid=21171: Wed Jan 24 13:42:13 2024
  read: IOPS=245, BW=982KiB/s (1006kB/s)(115MiB/120306msec)
    slat (usec): min=16, max=926109, avg=7928.41, stdev=25924.38
    clat (msec): min=264, max=3025, avg=1929.42, stdev=263.05
     lat (msec): min=319, max=3042, avg=1937.35, stdev=264.20
    clat percentiles (msec):
     |  1.00th=[ 1301],  5.00th=[ 1720], 10.00th=[ 1754], 20.00th=[ 1787],
     | 30.00th=[ 1821], 40.00th=[ 1838], 50.00th=[ 1871], 60.00th=[ 1888],
     | 70.00th=[ 1921], 80.00th=[ 2005], 90.00th=[ 2366], 95.00th=[ 2534],
     | 99.00th=[ 2735], 99.50th=[ 2802], 99.90th=[ 2869], 99.95th=[ 2903],
     | 99.99th=[ 2970]
   bw (  KiB/s): min=   40, max= 1823, per=99.84%, avg=980.44, stdev=64.18, samples=948
   iops        : min=   10, max=  455, avg=244.94, stdev=16.05, samples=948
  write: IOPS=248, BW=993KiB/s (1017kB/s)(117MiB/120306msec); 0 zone resets
    slat (usec): min=18, max=636299, avg=8209.25, stdev=25919.45
    clat (msec): min=285, max=5466, avg=2174.75, stdev=353.24
     lat (msec): min=319, max=5466, avg=2182.96, stdev=354.07
    clat percentiles (msec):
     |  1.00th=[ 1418],  5.00th=[ 1804], 10.00th=[ 1854], 20.00th=[ 1921],
     | 30.00th=[ 1989], 40.00th=[ 2056], 50.00th=[ 2123], 60.00th=[ 2198],
     | 70.00th=[ 2265], 80.00th=[ 2366], 90.00th=[ 2668], 95.00th=[ 2869],
     | 99.00th=[ 3171], 99.50th=[ 3339], 99.90th=[ 4212], 99.95th=[ 4665],
     | 99.99th=[ 5470]
   bw (  KiB/s): min=   56, max= 1848, per=100.00%, avg=993.14, stdev=63.26, samples=945
   iops        : min=   14, max=  462, avg=248.12, stdev=15.81, samples=945
  lat (msec)   : 500=0.15%, 750=0.24%, 1000=0.26%, 2000=55.06%, >=2000=44.29%
  cpu          : usr=0.15%, sys=0.35%, ctx=8625, majf=11, minf=69
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=29538,29873,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=982KiB/s (1006kB/s), 982KiB/s-982KiB/s (1006kB/s-1006kB/s), io=115MiB (121MB), run=120306-120306msec
  WRITE: bw=993KiB/s (1017kB/s), 993KiB/s-993KiB/s (1017kB/s-1017kB/s), io=117MiB (122MB), run=120306-120306msec

Disk stats (read/write):
  mmcblk0: ios=29803/30142, merge=26/56, ticks=3594629/11079894, in_queue=14674523, util=99.97%

Results for Kyber:

iops-test-job: (groupid=0, jobs=4): err= 0: pid=21850: Wed Jan 24 17:47:36 2024
  read: IOPS=246, BW=986KiB/s (1010kB/s)(116MiB/120348msec)
    slat (usec): min=14, max=529087, avg=7978.71, stdev=24366.32
    clat (msec): min=154, max=2653, avg=1912.96, stdev=221.23
     lat (msec): min=239, max=2720, avg=1920.94, stdev=222.00
    clat percentiles (msec):
     |  1.00th=[ 1351],  5.00th=[ 1737], 10.00th=[ 1770], 20.00th=[ 1804],
     | 30.00th=[ 1838], 40.00th=[ 1854], 50.00th=[ 1871], 60.00th=[ 1888],
     | 70.00th=[ 1921], 80.00th=[ 1955], 90.00th=[ 2198], 95.00th=[ 2467],
     | 99.00th=[ 2534], 99.50th=[ 2567], 99.90th=[ 2601], 99.95th=[ 2635],
     | 99.99th=[ 2668]
   bw (  KiB/s): min=  111, max= 1712, per=100.00%, avg=985.29, stdev=58.95, samples=947
   iops        : min=   27, max=  428, avg=246.12, stdev=14.74, samples=947
  write: IOPS=249, BW=997KiB/s (1021kB/s)(117MiB/120348msec); 0 zone resets
    slat (usec): min=15, max=570626, avg=8094.89, stdev=24622.01
    clat (msec): min=239, max=2969, avg=2174.01, stdev=244.24
     lat (msec): min=239, max=2969, avg=2182.11, stdev=244.89
    clat percentiles (msec):
     |  1.00th=[ 1502],  5.00th=[ 1972], 10.00th=[ 2022], 20.00th=[ 2056],
     | 30.00th=[ 2089], 40.00th=[ 2106], 50.00th=[ 2140], 60.00th=[ 2165],
     | 70.00th=[ 2165], 80.00th=[ 2232], 90.00th=[ 2534], 95.00th=[ 2735],
     | 99.00th=[ 2802], 99.50th=[ 2836], 99.90th=[ 2869], 99.95th=[ 2903],
     | 99.99th=[ 2937]
   bw (  KiB/s): min=   46, max= 1624, per=99.76%, avg=994.60, stdev=59.88, samples=948
   iops        : min=   10, max=  406, avg=248.45, stdev=14.97, samples=948
  lat (msec)   : 250=0.02%, 500=0.15%, 750=0.21%, 1000=0.21%, 2000=45.58%
  lat (msec)   : >=2000=53.82%
  cpu          : usr=0.13%, sys=0.38%, ctx=8379, majf=0, minf=189
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=29663,30000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=986KiB/s (1010kB/s), 986KiB/s-986KiB/s (1010kB/s-1010kB/s), io=116MiB (121MB), run=120348-120348msec
  WRITE: bw=997KiB/s (1021kB/s), 997KiB/s-997KiB/s (1021kB/s-1021kB/s), io=117MiB (123MB), run=120348-120348msec

Disk stats (read/write):
  mmcblk0: ios=29806/30038, merge=0/25, ticks=3393864/11413451, in_queue=14807315, util=99.95%

I did a couple more runs and the trend held (all runs can be found here: https://paste.cachyos.org/p/0c5befb). BFQ on average lost ~5-10% in bandwidth, while kyber generally kept up with mq-deadline.

Maybe it's just my setup issues, but I wouldn't really want to enable BFQ for SD cards by default.

Nyanraltotlapun commented 10 months ago

Maybe it's just my setup issues, but I wouldn't really want to enable BFQ for SD cards by default.

Ok. I made some tests and see no difference regarding caching behavior.

So I agree that mq-deadline is probably better, because it groups write requests, which can be beneficial for SD cards and similar USB flash storage...

So. How about this?

# BFQ is recommended for slow storage such as rotational block devices.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

# None is recommended for NVMe drives
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"

# Kyber is recommended for SATA SSDs
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS!="usb", ATTR{queue/scheduler}="kyber"

ventureoo commented 9 months ago

https://github.com/CachyOS/CachyOS-Settings/pull/52 should fix this issue, but feel free to report any regressions.