Closed: Nyanraltotlapun closed this issue 9 months ago.
Hi. Thanks for reporting. Can you please provide the output of this command, replacing /dev/sda with the block device corresponding to your disk (can be seen via lsblk):
udevadm info --attribute-walk --path=$(udevadm info --query=path --name=/dev/sda ) | paste-cachyos
This will help in fixing the rules for USB disks.
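For reference, a quick way to find the right device name and the currently active scheduler (assuming the disk shows up as /dev/sda; adjust as needed):

# List whole disks with size and model to pick the right one
lsblk -d -o NAME,SIZE,MODEL
# The scheduler currently in use is the one shown in brackets
cat /sys/block/sda/queue/scheduler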
Do we even need this rule for SATA SSDs? I got the impression that this can benefit only slow storage, like USB flash drives (dongles), SD memory cards, and HDDs.
I'm definitely not sure if it makes sense to use BFQ for SD cards. As for SATA SSDs, in most cases BFQ should be fine. However, mq-deadline may give more predictable read/write latency.
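If anyone wants to compare the two on their own drive before we touch the rules, the scheduler can be switched at runtime; a minimal sketch, assuming the drive is /dev/sda (the change is temporary and reverts on reboot or the next udev trigger):

# Show the schedulers the kernel offers for this queue
cat /sys/block/sda/queue/scheduler
# Temporarily switch to mq-deadline (or bfq/kyber/none) for testing
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler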
Commit https://github.com/CachyOS/CachyOS-Settings/commit/e9f1fbfaf03798b1814847cc68693ba2b48b214b should fix it. Check it out; if that doesn't help, feel free to reopen.
Ok, I just noticed that the report shows ATTR{queue/rotational}=="1", so I messed up a little.
The BFQ scheduler is activated by the first rule:
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
I rechecked, and everything else remains true.
I will run one more test with an actual HDD in the enclosure to see if this affects HDDs in a similar manner.
I tested with the HDD inside, and it seems there are no slowdowns with BFQ, so it looks the same as with mq-deadline. If it helps, here is the report for the HDD inside the USB enclosure: https://paste.cachyos.org/p/b8e21a3
We could also think about going this way:
# BFQ is recommended for slow storage such as rotational block devices and SD cards.
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="mmcblk?", ATTR{queue/scheduler}="bfq"
# None is recommended for NVMe drives
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", KERNEL=="nvme?n?", ATTR{queue/scheduler}="none"
# Kyber is recommended for SATA SSDs
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", KERNEL=="sd?", ATTR{queue/scheduler}="kyber"
WDYT?
It is still probably a good idea to separate SATA SSDs from USB devices. So, does this look reasonable? I tried to rewrite it in the original config style. Basically: for HDDs, SD cards, and USB storage we set bfq; for SATA SSDs, kyber; for NVMe, none.
# BFQ is recommended for slow storage such as rotational block devices and SD cards.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS=="usb", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="mmcblk[0-9]*", ATTR{queue/scheduler}="bfq"
# None is recommended for NVMe drives
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# Kyber is recommended for SATA SSDs
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS!="usb", ATTR{queue/scheduler}="kyber"
To be honest, I'm really not sure if we should be using BFQ for SD cards. I ran a small benchmark on my single-board computer with a Samsung SD card (ED2S5, 128 GB) using fio on Ubuntu 20.04, testing Kyber, mq-deadline, and BFQ, and I noticed that BFQ causes a drop in throughput compared to the others. The command for the test was as follows:
fio --filename=/mnt/test.fio --size=8GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1
Results for BFQ:
iops-test-job: (groupid=0, jobs=4): err= 0: pid=21296: Wed Jan 24 13:56:04 2024
read: IOPS=222, BW=888KiB/s (909kB/s)(105MiB/120838msec)
slat (usec): min=19, max=155579, avg=91.54, stdev=2370.92
clat (msec): min=571, max=6095, avg=2255.86, stdev=720.80
lat (msec): min=571, max=6095, avg=2255.96, stdev=720.80
clat percentiles (msec):
| 1.00th=[ 978], 5.00th=[ 1284], 10.00th=[ 1435], 20.00th=[ 1653],
| 30.00th=[ 1838], 40.00th=[ 1989], 50.00th=[ 2165], 60.00th=[ 2333],
| 70.00th=[ 2534], 80.00th=[ 2802], 90.00th=[ 3239], 95.00th=[ 3608],
| 99.00th=[ 4329], 99.50th=[ 4732], 99.90th=[ 5671], 99.95th=[ 5940],
| 99.99th=[ 6074]
bw ( KiB/s): min= 32, max= 2846, per=100.00%, avg=930.63, stdev=136.07, samples=905
iops : min= 8, max= 711, avg=232.46, stdev=34.02, samples=905
write: IOPS=225, BW=901KiB/s (923kB/s)(106MiB/120838msec); 0 zone resets
slat (usec): min=19, max=1533.3k, avg=17595.62, stdev=68751.02
clat (msec): min=132, max=5966, avg=2261.95, stdev=721.47
lat (msec): min=571, max=6128, avg=2279.54, stdev=725.93
clat percentiles (msec):
| 1.00th=[ 978], 5.00th=[ 1284], 10.00th=[ 1452], 20.00th=[ 1653],
| 30.00th=[ 1838], 40.00th=[ 1989], 50.00th=[ 2165], 60.00th=[ 2333],
| 70.00th=[ 2534], 80.00th=[ 2802], 90.00th=[ 3239], 95.00th=[ 3608],
| 99.00th=[ 4329], 99.50th=[ 4799], 99.90th=[ 5738], 99.95th=[ 5873],
| 99.99th=[ 5940]
bw ( KiB/s): min= 32, max= 2974, per=100.00%, avg=949.63, stdev=136.37, samples=900
iops : min= 8, max= 743, avg=237.21, stdev=34.10, samples=900
lat (msec) : 250=0.01%, 500=0.01%, 750=0.16%, 1000=1.06%, 2000=39.82%
lat (msec) : >=2000=58.95%
cpu : usr=0.13%, sys=0.45%, ctx=54696, majf=0, minf=169
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=26831,27217,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=888KiB/s (909kB/s), 888KiB/s-888KiB/s (909kB/s-909kB/s), io=105MiB (110MB), run=120838-120838msec
WRITE: bw=901KiB/s (923kB/s), 901KiB/s-901KiB/s (923kB/s-923kB/s), io=106MiB (111MB), run=120838-120838msec
Disk stats (read/write):
mmcblk0: ios=26913/27259, merge=7/25, ticks=4311670/4390411, in_queue=8702082, util=99.96%
Results for mq-deadline:
iops-test-job: (groupid=0, jobs=4): err= 0: pid=21171: Wed Jan 24 13:42:13 2024
read: IOPS=245, BW=982KiB/s (1006kB/s)(115MiB/120306msec)
slat (usec): min=16, max=926109, avg=7928.41, stdev=25924.38
clat (msec): min=264, max=3025, avg=1929.42, stdev=263.05
lat (msec): min=319, max=3042, avg=1937.35, stdev=264.20
clat percentiles (msec):
| 1.00th=[ 1301], 5.00th=[ 1720], 10.00th=[ 1754], 20.00th=[ 1787],
| 30.00th=[ 1821], 40.00th=[ 1838], 50.00th=[ 1871], 60.00th=[ 1888],
| 70.00th=[ 1921], 80.00th=[ 2005], 90.00th=[ 2366], 95.00th=[ 2534],
| 99.00th=[ 2735], 99.50th=[ 2802], 99.90th=[ 2869], 99.95th=[ 2903],
| 99.99th=[ 2970]
bw ( KiB/s): min= 40, max= 1823, per=99.84%, avg=980.44, stdev=64.18, samples=948
iops : min= 10, max= 455, avg=244.94, stdev=16.05, samples=948
write: IOPS=248, BW=993KiB/s (1017kB/s)(117MiB/120306msec); 0 zone resets
slat (usec): min=18, max=636299, avg=8209.25, stdev=25919.45
clat (msec): min=285, max=5466, avg=2174.75, stdev=353.24
lat (msec): min=319, max=5466, avg=2182.96, stdev=354.07
clat percentiles (msec):
| 1.00th=[ 1418], 5.00th=[ 1804], 10.00th=[ 1854], 20.00th=[ 1921],
| 30.00th=[ 1989], 40.00th=[ 2056], 50.00th=[ 2123], 60.00th=[ 2198],
| 70.00th=[ 2265], 80.00th=[ 2366], 90.00th=[ 2668], 95.00th=[ 2869],
| 99.00th=[ 3171], 99.50th=[ 3339], 99.90th=[ 4212], 99.95th=[ 4665],
| 99.99th=[ 5470]
bw ( KiB/s): min= 56, max= 1848, per=100.00%, avg=993.14, stdev=63.26, samples=945
iops : min= 14, max= 462, avg=248.12, stdev=15.81, samples=945
lat (msec) : 500=0.15%, 750=0.24%, 1000=0.26%, 2000=55.06%, >=2000=44.29%
cpu : usr=0.15%, sys=0.35%, ctx=8625, majf=11, minf=69
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=29538,29873,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=982KiB/s (1006kB/s), 982KiB/s-982KiB/s (1006kB/s-1006kB/s), io=115MiB (121MB), run=120306-120306msec
WRITE: bw=993KiB/s (1017kB/s), 993KiB/s-993KiB/s (1017kB/s-1017kB/s), io=117MiB (122MB), run=120306-120306msec
Disk stats (read/write):
mmcblk0: ios=29803/30142, merge=26/56, ticks=3594629/11079894, in_queue=14674523, util=99.97%
Results for Kyber:
iops-test-job: (groupid=0, jobs=4): err= 0: pid=21850: Wed Jan 24 17:47:36 2024
read: IOPS=246, BW=986KiB/s (1010kB/s)(116MiB/120348msec)
slat (usec): min=14, max=529087, avg=7978.71, stdev=24366.32
clat (msec): min=154, max=2653, avg=1912.96, stdev=221.23
lat (msec): min=239, max=2720, avg=1920.94, stdev=222.00
clat percentiles (msec):
| 1.00th=[ 1351], 5.00th=[ 1737], 10.00th=[ 1770], 20.00th=[ 1804],
| 30.00th=[ 1838], 40.00th=[ 1854], 50.00th=[ 1871], 60.00th=[ 1888],
| 70.00th=[ 1921], 80.00th=[ 1955], 90.00th=[ 2198], 95.00th=[ 2467],
| 99.00th=[ 2534], 99.50th=[ 2567], 99.90th=[ 2601], 99.95th=[ 2635],
| 99.99th=[ 2668]
bw ( KiB/s): min= 111, max= 1712, per=100.00%, avg=985.29, stdev=58.95, samples=947
iops : min= 27, max= 428, avg=246.12, stdev=14.74, samples=947
write: IOPS=249, BW=997KiB/s (1021kB/s)(117MiB/120348msec); 0 zone resets
slat (usec): min=15, max=570626, avg=8094.89, stdev=24622.01
clat (msec): min=239, max=2969, avg=2174.01, stdev=244.24
lat (msec): min=239, max=2969, avg=2182.11, stdev=244.89
clat percentiles (msec):
| 1.00th=[ 1502], 5.00th=[ 1972], 10.00th=[ 2022], 20.00th=[ 2056],
| 30.00th=[ 2089], 40.00th=[ 2106], 50.00th=[ 2140], 60.00th=[ 2165],
| 70.00th=[ 2165], 80.00th=[ 2232], 90.00th=[ 2534], 95.00th=[ 2735],
| 99.00th=[ 2802], 99.50th=[ 2836], 99.90th=[ 2869], 99.95th=[ 2903],
| 99.99th=[ 2937]
bw ( KiB/s): min= 46, max= 1624, per=99.76%, avg=994.60, stdev=59.88, samples=948
iops : min= 10, max= 406, avg=248.45, stdev=14.97, samples=948
lat (msec) : 250=0.02%, 500=0.15%, 750=0.21%, 1000=0.21%, 2000=45.58%
lat (msec) : >=2000=53.82%
cpu : usr=0.13%, sys=0.38%, ctx=8379, majf=0, minf=189
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=29663,30000,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=986KiB/s (1010kB/s), 986KiB/s-986KiB/s (1010kB/s-1010kB/s), io=116MiB (121MB), run=120348-120348msec
WRITE: bw=997KiB/s (1021kB/s), 997KiB/s-997KiB/s (1021kB/s-1021kB/s), io=117MiB (123MB), run=120348-120348msec
Disk stats (read/write):
mmcblk0: ios=29806/30038, merge=0/25, ticks=3393864/11413451, in_queue=14807315, util=99.95%
I did a couple more runs and the trend continued (all runs can be found here: https://paste.cachyos.org/p/0c5befb). BFQ was losing by about 5-10% in bandwidth on average, while Kyber generally kept up with mq-deadline.
Maybe it's just an issue with my setup, but I wouldn't really want to enable BFQ for SD cards by default.
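In case anyone wants to repeat the comparison on their own card, the runs above are easy to script; a minimal sketch, assuming the card is mmcblk0 and /mnt has room for the 8 GB test file:

# Run the same fio job under each scheduler and keep the logs
for sched in bfq mq-deadline kyber; do
    echo "$sched" | sudo tee /sys/block/mmcblk0/queue/scheduler
    fio --filename=/mnt/test.fio --size=8GB --direct=1 --rw=randrw --bs=4k \
        --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based \
        --group_reporting --name=iops-test-job --eta-newline=1 > "fio-$sched.log"
    rm -f /mnt/test.fio   # start each run from a fresh file
done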
Ok, I made some tests and see no difference regarding caching behavior.
So I agree that mq-deadline is probably better, because it groups write requests, which can be beneficial for SD cards and similar USB flash storage...
So, how about this?
# BFQ is recommended for slow storage such as rotational block devices.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
# None is recommended for NVMe drives
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"
# Kyber is recommended for SATA SSDs
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", SUBSYSTEMS!="usb", ATTR{queue/scheduler}="kyber"
https://github.com/CachyOS/CachyOS-Settings/pull/52 should fix this issue, but feel free to report any regressions.
The problem.
The file etc/udev/rules.d/60-ioschedulers.rules contains the rule:
ACTION=="add|change", KERNEL=="sd[a-z]*|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"
That sets the bfq scheduler for all USB, SATA, and (SD card?) flash memory storage, which is probably suboptimal for modern USB enclosures, external USB SSDs, and some modern USB flash drives (which are basically small SSDs).
What is happening.
For example, I have an SSD inside a USB-SATA external enclosure. With this udev rule active, when I call cryptsetup to open it, the SSD becomes busy (flashing activity light) for more than a minute, and for that time the command hangs. Without this rule, cryptsetup executes instantly and the device is ready to use.
Possible fix.
Fast USB flash storage uses a USB-to-SCSI protocol, so the rule could probably be modified to ignore non-rotational SCSI devices, or an additional rule could set non-rotational SCSI devices back to the default scheduler.
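Assuming this refers to UAS (USB Attached SCSI), one way to see whether an enclosure actually uses it rather than plain usb-storage is to inspect it from userspace; a rough check, assuming the disk behind it shows up as /dev/sda:

# The Driver= field shows uas for UAS enclosures, usb-storage for the older protocol
lsusb -t
# The udev properties of the disk show the USB-related IDs a rule could key on
udevadm info --query=property --name=/dev/sda | grep -i usb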
Notes.
Do we even need this rule for SATA SSDs? I got the impression that this can benefit only slow storage, like USB flash drives (dongles), SD memory cards, and HDDs.