axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

fio run io_uring_cmd got permission denied #1612

Closed: gaowayne closed this issue 8 months ago

gaowayne commented 1 year ago


Description of the bug: I ran io_uring_cmd over my NVMe device and got a permission denied error (log below).

Environment:
[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# uname -a
Linux AE-AliInspur-Fedora-G4-R28-28-29 6.3.8-100.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 15 01:51:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

fio version: fio-3.35-102-g62f35 (see the run log below)

Reproduction steps

job file

# io_uring_cmd I/O engine for nvme-ns generic character device

[global]
filename=/dev/ng2n1
ioengine=io_uring_cmd
cmd_type=nvme
size=700G
iodepth=128
bs=4K
thread=1
stonewall=1

#[rand-write]
#rw=randwrite
#sqthread_poll=1

#[rand-rea]
#rw=randread

[write-opts]
rw=write
sqthread_poll=1
#sqthread_poll_cpu=0
#nonvectored=1
#registerfiles=1

below is my nvme list

Device     Generic  NSID  Usage                    Format         Controllers
nvme0n1    ng0n1    1     800.17 GB / 800.17 GB    4 KiB + 64 B   nvme0
nvme1n1    ng1n1    1     30.73 TB / 30.73 TB      4 KiB +  0 B   nvme1
nvme2n1    ng2n1    1     15.36 TB / 15.36 TB      4 KiB +  0 B   nvme2
nvme3n1    ng3n1    1     1.60 TB / 1.60 TB        4 KiB +  0 B   nvme3
nvme4n1    ng4n1    1     800.17 GB / 800.17 GB    4 KiB + 64 B   nvme4

below is error log

[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# ./fio uring-cmd-wayne.fio
write-opts: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=128
fio-3.35-102-g62f35
Starting 1 thread
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=0, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=4096, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=8192, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=12288, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=16384, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=20480, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=24576, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=28672, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=32768, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=36864, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=40960, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=45056, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=49152, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=53248, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=57344, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=61440, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=65536, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=69632, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=73728, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=77824, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=81920, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=86016, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=90112, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=94208, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=98304, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=102400, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=106496, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=110592, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=114688, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=118784, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=122880, buflen=4096
fio: io_u error on file /dev/ng2n1: Permission denied: write offset=126976, buflen=4096
ankit-sam commented 1 year ago

Hi @gaowayne, strange issue; thanks for the configuration file and log. I tried your config on my setup (6.0.0-rc2 kernel and the latest fio, with an NVMe device formatted 4 KiB + 0 B) but didn't see the issue.

If possible, can you please try a few things:

  1. Try another device; I see that you have a 1.6 TB one (ng3n1). Just to make sure the issue happens consistently.
  2. Does this happen without sqthread_poll=1?
  3. Does this happen with another kernel, say 6.0 or 6.1?

I will also try to figure this out on my end with a 6.3+ kernel.

joshkan commented 1 year ago

@gaowayne : Does read work? I suspect LSM (SELinux?) policy needs to be changed to allow access.
Also, please take a look at permission bits (DAC) for /dev/ngXnY.
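
For reference, a minimal sketch of the checks being suggested here; the device name /dev/ng2n1 and the audit tooling (getenforce, ausearch from the audit package) are assumptions about a typical Fedora setup, not commands from this thread:

# Check DAC permission bits on the character device
ls -l /dev/ng2n1

# Check whether SELinux is currently enforcing
getenforce

# Look for recent AVC (SELinux) denials mentioning the device
ausearch -m avc -ts recent | grep ng2n1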

gaowayne commented 1 year ago

Hi @gaowayne, strange issue; thanks for the configuration file and log. I tried your config on my setup (6.0.0-rc2 kernel and the latest fio, with an NVMe device formatted 4 KiB + 0 B) but didn't see the issue.

If possible, can you please try a few things:

  1. Try another device; I see that you have a 1.6 TB one (ng3n1). Just to make sure the issue happens consistently.
  2. Does this happen without sqthread_poll=1?
  3. Does this happen with another kernel, say 6.0 or 6.1?

I will also try to figure this out on my end with a 6.3+ kernel.

I tried #1 and #2 together; it still reports the permission error. I will also try the other kernel versions I have.

gaowayne commented 1 year ago

@gaowayne : Does read work? I suspect LSM (SELinux?) policy needs to be changed to allow access. Also, please take a look at permission bits (DAC) for /dev/ngXnY.

I checked; read is not working either.

gaowayne commented 1 year ago

@gaowayne : Does read work? I suspect LSM (SELinux?) policy needs to be changed to allow access. Also, please take a look at permission bits (DAC) for /dev/ngXnY.

I checked; read is not working either.

The DAC is below; it looks like this is the root cause.

[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# ls -ll /dev/nvme2n1
brw-rw----. 1 root disk 259, 4 Jul 31 07:47 /dev/nvme2n1
[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# ls -ll /dev/ng2n1
crw-------. 1 root root 238, 3 Jul 31 07:47 /dev/ng2n1
[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# 
gaowayne commented 1 year ago

Let me disable SELinux (status below) and have a try. Thank you so much, man.

[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
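
As a side note, a quicker way to test the SELinux theory without editing the config or rebooting is to switch to permissive mode temporarily; this is just a sketch, not what was actually done in the thread:

# Temporarily stop enforcing SELinux policy (reverts on reboot)
setenforce 0
getenforce        # should now report "Permissive"

# Re-enable enforcement afterwards
setenforce 1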
gaowayne commented 1 year ago

Dear expert, after I disabled SELinux it works very well now, but the performance is not as promising as I expected.

My fio configuration:

# io_uring_cmd I/O engine for nvme-ns generic character device

[global]
filename=/dev/ng2n1
ioengine=io_uring_cmd
cmd_type=nvme
size=1000G
iodepth=128
bs=4K
thread=1
stonewall=1

#[rand-write]
#rw=randwrite
#sqthread_poll=1

#[rand-rea]
#rw=randread

[write-opts]
rw=write
sqthread_poll=1
sqthread_poll_cpu=0
nonvectored=1
registerfiles=1

Below is the result. With io_uring_cmd I can only get about 2000 MB/s with 4K sequential writes. Is there anything we can do to speed this up? SPDK can do 6000 MB/s for 4K sequential writes with 0.5 CPU core.

[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# ./fio uring-cmd-wayne.fio
write-opts: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=128
fio-3.35-102-g62f35
Starting 1 thread
^Cbs: 1 (f=1): [W(1)][58.1%][w=1916MiB/s][w=491k IOPS][eta 03m:50s]
fio: terminating on signal 2

write-opts: (groupid=0, jobs=1): err= 0: pid=7701: Tue Aug 15 19:41:21 2023
  write: IOPS=477k, BW=1865MiB/s (1956MB/s)(582GiB/319358msec); 0 zone resets
    clat (usec): min=5, max=4253, avg=267.68, stdev=272.26
     lat (usec): min=7, max=4253, avg=267.85, stdev=272.26
    clat percentiles (usec):
     |  1.00th=[   13],  5.00th=[   19], 10.00th=[   25], 20.00th=[   37],
     | 30.00th=[   52], 40.00th=[   80], 50.00th=[  194], 60.00th=[  302],
     | 70.00th=[  363], 80.00th=[  457], 90.00th=[  644], 95.00th=[  824],
     | 99.00th=[ 1139], 99.50th=[ 1270], 99.90th=[ 1500], 99.95th=[ 1582],
     | 99.99th=[ 2008]
   bw (  MiB/s): min= 1676, max= 2069, per=100.00%, avg=1865.14, stdev=77.60, samples=638
   iops        : min=429260, max=529844, avg=477476.84, stdev=19866.82, samples=638
  lat (usec)   : 10=0.02%, 20=6.36%, 50=22.65%, 100=14.92%, 250=9.61%
  lat (usec)   : 500=29.46%, 750=10.31%, 1000=4.48%
  lat (msec)   : 2=2.16%, 4=0.01%, 10=0.01%
  cpu          : usr=99.83%, sys=0.00%, ctx=1035, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,152478035,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=1865MiB/s (1956MB/s), 1865MiB/s-1865MiB/s (1956MB/s-1956MB/s), io=582GiB (625GB), run=319358-319358msec
ankit-sam commented 1 year ago

@gaowayne can you try the fio option hipri instead of sqthread_poll and sqthread_poll_cpu? For this you will have to enable polling queues in the nvme driver, something like this:

cat /sys/block/nvme0n1/queue/io_poll
0

rmmod nvme-core
rmmod nvme
insmod drivers/nvme/host/nvme-core.ko
insmod drivers/nvme/host/nvme.ko poll_queues=1

cat /sys/block/nvme0n1/queue/io_poll
1

By the way, is the 6000 MB/s with the spdk ioengine or their perf utility? fio also has t/io_uring for peak performance; you will have to modify it a bit for a write workload, as it only does reads.
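
If reloading the driver from a kernel build tree is inconvenient, a rough equivalent on a stock kernel is sketched below; it assumes nvme is built as a module and that nothing is mounted on the NVMe devices:

# Reload the in-tree nvme driver with polling queues enabled
modprobe -r nvme
modprobe nvme poll_queues=2

cat /sys/block/nvme2n1/queue/io_poll   # should now show 1

In the job file, hipri=1 would then take the place of sqthread_poll=1 and sqthread_poll_cpu=0.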

gaowayne commented 1 year ago

@gaowayne can you try the fio option hipri instead of sqthread_poll and sqthread_poll_cpu? For this you will have to enable polling queues in the nvme driver, something like this:

cat /sys/block/nvme0n1/queue/io_poll
0

rmmod nvme-core
rmmod nvme
insmod drivers/nvme/host/nvme-core.ko
insmod drivers/nvme/host/nvme.ko poll_queues=1

cat /sys/block/nvme0n1/queue/io_poll
1

By the way, is the 6000 MB/s with the spdk ioengine or their perf utility? fio also has t/io_uring for peak performance; you will have to modify it a bit for a write workload, as it only does reads.

Thank you so much, man, let me try. The SPDK test is using fio + the SPDK plugin. :)

gaowayne commented 1 year ago

I will test with this new configuration.

[global]
filename=/dev/ng2n1
ioengine=io_uring_cmd
cmd_type=nvme
size=1000G
iodepth=128
bs=4K
thread=1
stonewall=1

#[rand-write]
#rw=randwrite
#sqthread_poll=1

#[rand-rea]
#rw=randread

[write-opts]
rw=read
#use polling in nvme drive
hipri=1
#sqthread_poll=1
#sqthread_poll_cpu=0
nonvectored=1
registerfiles=1
gaowayne commented 1 year ago

There is one problem with the nvme driver replacement: if an NVMe drive has a mount point, I cannot use the SPDK driver to replace the nvme driver. When I try to remove nvme-core, it gives me the error below. I will find another clean machine to test, but it looks like management is not very flexible.

[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# rmmod nvme-core
rmmod: ERROR: Module nvme_core is in use by: nvme
[root@AE-AliInspur-Fedora-G4-R28-28-29 fio]# rmmod nvme
rmmod: ERROR: Module nvme is in use
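
A sketch of how to see what is holding the driver before trying to unload it; the lsblk/lsmod commands below are generic suggestions, not something run in this thread:

# Show NVMe block devices and any mount points still using them
lsblk -o NAME,MOUNTPOINT /dev/nvme*n1

# Show which modules still depend on nvme_core
lsmod | grep nvme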
gaowayne commented 1 year ago

Great news! With your suggested configuration, using polling in the nvme driver, read bandwidth reaches 4 GB/s, almost two times better than before. Let me try write.

[root@zns-s1 tmp]# fio uring_cmd.fio
write-opts: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=128
fio-3.35-115-g6795
Starting 1 thread
Jobs: 1 (f=1): [R(1)][3.9%][r=3972MiB/s][r=1017k IOPS][eta 04m:32s]
Message from syslogd@zns-s1 at Aug 16 01:41:16 ...
 kernel:Disabling IRQ #878
Jobs: 1 (f=1): [R(1)][100.0%][r=3933MiB/s][r=1007k IOPS][eta 00m:00s]
write-opts: (groupid=0, jobs=1): err= 0: pid=302162: Wed Aug 16 01:45:23 2023
  read: IOPS=1012k, BW=3954MiB/s (4146MB/s)(1000GiB/258987msec)
    slat (nsec): min=655, max=57675, avg=768.13, stdev=885.13
    clat (usec): min=7, max=783, avg=125.55, stdev= 6.55
     lat (usec): min=9, max=784, avg=126.32, stdev= 6.61
    clat percentiles (usec):
     |  1.00th=[  123],  5.00th=[  123], 10.00th=[  123], 20.00th=[  123],
     | 30.00th=[  123], 40.00th=[  124], 50.00th=[  124], 60.00th=[  124],
     | 70.00th=[  125], 80.00th=[  133], 90.00th=[  133], 95.00th=[  133],
     | 99.00th=[  135], 99.50th=[  188], 99.90th=[  192], 99.95th=[  196],
     | 99.99th=[  206]
   bw (  MiB/s): min= 2599, max= 3975, per=100.00%, avg=3955.49, stdev=133.36, samples=517
   iops        : min=665384, max=1017662, avg=1012605.30, stdev=34139.50, samples=517
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=100.00%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=30.20%, sys=69.67%, ctx=354, majf=0, minf=1227
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=262144000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=3954MiB/s (4146MB/s), 3954MiB/s-3954MiB/s (4146MB/s-4146MB/s), io=1000GiB (1074GB), run=258987-258987msec
gaowayne commented 1 year ago

For write I use an SLC SSD that can offer 4000 MB/s sequential 4K writes, and regular fio can get this. But with io_uring_cmd it reaches only 3000 MB/s, maybe because I saw it only uses one CPU core.

[root@zns-s1 tmp]# fio uring_cmd.fio
write-opts: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=128
fio-3.35-115-g6795
Starting 1 thread
Jobs: 1 (f=1): [W(1)][0.7%][w=2859MiB/s][w=732k IOPS][eta 06m:46s]
Message from syslogd@zns-s1 at Aug 16 01:52:54 ...
 kernel:Disabling IRQ #1457
Jobs: 1 (f=1): [W(1)][74.6%][w=3070MiB/s][w=786k IOPS][eta 01m:25s]
write-opts: (groupid=0, jobs=1): err= 0: pid=302563: Wed Aug 16 01:57:01 2023
  write: IOPS=784k, BW=3061MiB/s (3210MB/s)(745GiB/249273msec); 0 zone resets
    slat (nsec): min=661, max=92347, avg=840.59, stdev=480.19
    clat (usec): min=3, max=1948, avg=162.34, stdev=63.45
     lat (usec): min=5, max=1949, avg=163.18, stdev=63.45
    clat percentiles (usec):
     |  1.00th=[   32],  5.00th=[  101], 10.00th=[  125], 20.00th=[  129],
     | 30.00th=[  133], 40.00th=[  137], 50.00th=[  147], 60.00th=[  165],
     | 70.00th=[  182], 80.00th=[  192], 90.00th=[  210], 95.00th=[  237],
     | 99.00th=[  445], 99.50th=[  529], 99.90th=[  701], 99.95th=[  766],
     | 99.99th=[  906]
   bw (  MiB/s): min= 2424, max= 3163, per=100.00%, avg=3062.48, stdev=67.10, samples=498
   iops        : min=620774, max=809838, avg=783993.86, stdev=17176.69, samples=498
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.15%, 50=1.95%, 100=2.76%
  lat (usec)   : 250=91.01%, 500=3.47%, 750=0.59%, 1000=0.06%
  lat (msec)   : 2=0.01%
  cpu          : usr=25.26%, sys=74.59%, ctx=531, majf=0, minf=664
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,195353046,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=3061MiB/s (3210MB/s), 3061MiB/s-3061MiB/s (3210MB/s-3210MB/s), io=745GiB (800GB), run=249273-249273msec

For read we still have 2 GB/s of room to improve, and for write we still have 1 GB/s. Could you please shed some light? 👍

joshkan commented 1 year ago

You can try enabling "fixedbufs" in the fio config. Also try these:

echo 0 > /sys/block/nvmeXnY/queue/iostats
echo 2 > /sys/block/nvmeXnY/queue/nomerges

Although I doubt this will be able to fill the kind of gap you see. What CPU are you using?
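
Putting these suggestions together for this setup, a short sketch (the device name nvme2n1 is assumed from the earlier nvme list, not stated in this comment):

# Disable per-I/O statistics accounting and request merging for the test device
echo 0 > /sys/block/nvme2n1/queue/iostats
echo 2 > /sys/block/nvme2n1/queue/nomerges

In the job file, this would be paired with fixedbufs=1 in the [write-opts] section.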

gaowayne commented 1 year ago

cat /sys/block/nvme0n1/queue/io_poll

Thank you so much, man. I tried these options and fixedbufs; it did not change the result, however.

[root@zns-s1 wayne]# fio uring_cmd.fio
write-opts: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=128
fio-3.35-115-g6795
Starting 1 thread
Jobs: 1 (f=1): [W(1)][0.7%][w=2740MiB/s][w=701k IOPS][eta 06m:48s]
Message from syslogd@zns-s1 at Aug 31 01:59:00 ...
 kernel:Disabling IRQ #1284
Jobs: 1 (f=1): [W(1)][74.6%][w=3062MiB/s][w=784k IOPS][eta 01m:25s]
write-opts: (groupid=0, jobs=1): err= 0: pid=638316: Thu Aug 31 02:03:07 2023
  write: IOPS=783k, BW=3059MiB/s (3208MB/s)(745GiB/249438msec); 0 zone resets
    slat (nsec): min=616, max=215252, avg=807.45, stdev=499.06
    clat (usec): min=3, max=1277, avg=162.49, stdev=64.32
     lat (usec): min=5, max=1278, avg=163.30, stdev=64.31
    clat percentiles (usec):
     |  1.00th=[   32],  5.00th=[  106], 10.00th=[  121], 20.00th=[  124],
     | 30.00th=[  127], 40.00th=[  133], 50.00th=[  149], 60.00th=[  174],
     | 70.00th=[  188], 80.00th=[  194], 90.00th=[  208], 95.00th=[  235],
     | 99.00th=[  445], 99.50th=[  529], 99.90th=[  701], 99.95th=[  758],
     | 99.99th=[  889]
   bw (  MiB/s): min= 2486, max= 3159, per=100.00%, avg=3060.58, stdev=66.07, samples=498
   iops        : min=636450, max=808746, avg=783509.66, stdev=16914.06, samples=498
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.12%, 50=1.97%, 100=2.37%
  lat (usec)   : 250=91.43%, 500=3.46%, 750=0.59%, 1000=0.05%
  lat (msec)   : 2=0.01%
  cpu          : usr=24.68%, sys=75.16%, ctx=434, majf=0, minf=380
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,195353046,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=3059MiB/s (3208MB/s), 3059MiB/s-3059MiB/s (3208MB/s-3208MB/s), io=745GiB (800GB), run=249438-249438msec
[root@zns-s1 wayne]# 

I also did not specify any CPU.

joshkan commented 1 year ago

I also did not specify any CPU.

I meant CPU information of your setup.

Also, what does this show:

t/io_uring -b4096 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 -r4 /dev/ngXnY

You can run this command from the fio source directory.

ll123456 commented 8 months ago

I ran io_uring_cmd over my NVMe device and got Operation not permitted.

[root@RHEL9 fio]# uname -a
Linux RHEL9 5.14.0-362.18.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 3 15:54:45 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

[root@RHEL9 fio]# sudo fio --version
fio-3.36-38-g06c4

job file

examples/uring-cmd-pi-ext.fio

[global]
filename=/dev/ng0n1
ioengine=io_uring_cmd
cmd_type=nvme
size=1G
iodepth=32
bs=4096
md_per_io_size=8
pi_act=0
pi_chk=GUARD,APPTAG,REFTAG
apptag=0x0888
apptag_mask=0xFFFF
thread=1
stonewall=1

[write]
rw=write

[read]
rw=read

below is my nvme list

Device     Generic  NSID  Usage              Format        Controllers
nvme0n1    ng0n1    1     3.20TB / 3.20TB    4 KiB + 8 B   nvme0

below is error log

[root@RHEL9 fio]# ./fio ./examples/uring-cmd-pi-ext.fio
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
read: (g=1): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
fio-3.36-38-g06c4
Starting 2 threads
fio: pid=2839, err=1/file:engines/io_uring.c:1092, func=io_queue_init, error=Operation not permitted
fio: pid=2840, err=1/file:engines/io_uring.c:1092, func=io_queue_init, error=Operation not permitted

Run status group 0 (all jobs):

Run status group 1 (all jobs):

[root@RHEL9 ~]# sestatus
SELinux status:                 disabled

I disabled SELinux but fio still reports an error. @ankit-sam, please help take a look, thank you.

ankit-sam commented 8 months ago

Hi @ll123456, after disabling SELinux can you check the permission bits (DAC) for /dev/ng0n1? Also, can you please check whether this issue happens without metadata and PI.

axboe commented 8 months ago

Also check if RHEL ships with io_uring set to a specific group:

axboe@m2max ~> grep . /proc/sys/kernel/io_uring*
/proc/sys/kernel/io_uring_disabled:0
/proc/sys/kernel/io_uring_group:-1
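
For context, per the kernel's sysctl documentation, io_uring_disabled=0 means io_uring is available to everyone, 1 restricts ring creation to processes with CAP_SYS_ADMIN or in the group named by io_uring_group, and 2 disables it entirely. On kernels that have both sysctls, a sketch of relaxing the restriction to a group (the group name "disk" is just an example):

# Allow only members of the "disk" group to create io_uring instances
sysctl -w kernel.io_uring_disabled=1
sysctl -w kernel.io_uring_group="$(getent group disk | cut -d: -f3)"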
rreive commented 8 months ago

I would suggest FIO check the state of io_uring before running a test and come back with at least an error message. Better yet suggesting a course of action that should be taken per the msg above and maybe even display man page link (if there is one).
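
In the meantime, a shell-level pre-flight wrapper (hypothetical, not part of fio) can at least surface the restriction before a run:

# Warn if the kernel has io_uring restricted before launching fio
f=/proc/sys/kernel/io_uring_disabled
if [ -r "$f" ] && [ "$(cat "$f")" != "0" ]; then
    echo "warning: kernel.io_uring_disabled=$(cat "$f"); io_uring-based engines may fail" >&2
fi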

axboe commented 8 months ago

@rreive why don't you send a patch for that?

ll123456 commented 8 months ago

Hi @ll123456, after disabling SELinux can you check the permission bits (DAC) for /dev/ng0n1? Also, can you please check whether this issue happens without metadata and PI.

@ankit-sam

1. [root@RHEL9 ~]# ls -ll /dev/ng0n1
crw------- 1 root root 239, 0 Feb 18 16:31 /dev/ng0n1

fio still reports the error after chmod 777 on /dev/ng0n1:

[root@RHEL9 fio]# ls -ll /dev/ng0n1
crwxrwxrwx 1 root root 239, 0 Feb 18 16:31 /dev/ng0n1

  2. This issue still happens without metadata and PI:

Device     Generic  NSID  Usage              Format   Controllers
nvme0n1    ng0n1    1     3.20TB / 3.20TB    4 KiB    nvme0

[global]
filename=/dev/ng0n1
ioengine=io_uring_cmd
cmd_type=nvme
size=1G
iodepth=32
bs=4K
thread=1
stonewall=1

[rand-write]
rw=randwrite
sqthread_poll=1

[rand-read]
rw=randread

[write-opts]
rw=write
sqthread_poll=1
sqthread_poll_cpu=0
nonvectored=1
registerfiles=1

[root@RHEL9 fio]# ./fio ./examples/uring-cmd-ng.fio
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
rand-read: (g=1): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
write-opts: (g=2): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
fio-3.36-38-g06c4
Starting 3 threads
fio: pid=3037, err=1/file:engines/io_uring.c:1092, func=io_queue_init, error=Operation not permitted
fio: pid=3038, err=1/file:engines/io_uring.c:1092, func=io_queue_init, error=Operation not permitted
fio: pid=3039, err=1/file:engines/io_uring.c:1092, func=io_queue_init, error=Operation not permitted

Run status group 0 (all jobs):

Run status group 1 (all jobs):

Run status group 2 (all jobs):

ll123456 commented 8 months ago

Also check if RHEL ships with io_uring set to a specific group:

axboe@m2max ~> grep . /proc/sys/kernel/io_uring*
/proc/sys/kernel/io_uring_disabled:0
/proc/sys/kernel/io_uring_group:-1

@axboe

[root@RHEL9 fio]# grep . /proc/sys/kernel/io_uring*
2
[root@RHEL9 fio]# ls /proc/sys/kernel/io_uring*
/proc/sys/kernel/io_uring_disabled

ll123456 commented 8 months ago

Found the root cause: io_uring_disabled is set to 2.

[root@RHEL9 fio]# sysctl -a | grep io_uring
kernel.io_uring_disabled = 2

[Screenshot 2024-02-18 174613]

fio runs normally after setting io_uring_disabled to 0:

[root@RHEL9 fio]# sysctl -w kernel.io_uring_disabled=0
kernel.io_uring_disabled = 0

[root@RHEL9 fio]# ./fio ./examples/uring-cmd-pi-ext.fio
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
read: (g=1): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring_cmd, iodepth=32
fio-3.36-38-g06c4
Starting 2 threads
Jobs: 1 (f=1)
write: (groupid=0, jobs=1): err= 0: pid=3412: Sun Feb 18 17:34:44 2024
  write: IOPS=747k, BW=2917MiB/s (3059MB/s)(1024MiB/351msec); 0 zone resets
    slat (nsec): min=558, max=18944, avg=947.31, stdev=372.57
    clat (usec): min=8, max=434, avg=41.64, stdev=12.54
     lat (usec): min=8, max=435, avg=42.59, stdev=12.53
    clat percentiles (usec):
     |  1.00th=[   21],  5.00th=[   31], 10.00th=[   36], 20.00th=[   38],
     | 30.00th=[   38], 40.00th=[   39], 50.00th=[   39], 60.00th=[   40],
     | 70.00th=[   42], 80.00th=[   46], 90.00th=[   52], 95.00th=[   66],
     | 99.00th=[   78], 99.50th=[   94], 99.90th=[  184], 99.95th=[  219],
     | 99.99th=[  379]
  lat (usec)   : 10=0.01%, 20=0.92%, 50=86.79%, 100=11.83%, 250=0.43%
  lat (usec)   : 500=0.02%
  cpu          : usr=24.29%, sys=56.00%, ctx=4414, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,262144,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
read: (groupid=1, jobs=1): err= 0: pid=3413: Sun Feb 18 17:34:44 2024
  read: IOPS=361k, BW=1410MiB/s (1479MB/s)(1024MiB/726msec)
    slat (nsec): min=549, max=14421, avg=740.41, stdev=302.17
    clat (usec): min=5, max=891, avg=87.69, stdev=39.33
     lat (usec): min=6, max=892, avg=88.43, stdev=39.32
    clat percentiles (usec):
     |  1.00th=[   14],  5.00th=[   28], 10.00th=[   38], 20.00th=[   56],
     | 30.00th=[   68], 40.00th=[   78], 50.00th=[   86], 60.00th=[   96],
     | 70.00th=[  108], 80.00th=[  121], 90.00th=[  137], 95.00th=[  149],
     | 99.00th=[  174], 99.50th=[  186], 99.90th=[  251], 99.95th=[  363],
     | 99.99th=[  840]
   bw (  MiB/s): min= 1428, max= 1428, per=100.00%, avg=1428.98, stdev= 0.00, samples=1
   iops        : min=365820, max=365820, avg=365820.00, stdev= 0.00, samples=1
  lat (usec)   : 10=0.46%, 20=1.97%, 50=14.44%, 100=46.99%, 250=36.04%
  lat (usec)   : 500=0.07%, 750=0.01%, 1000=0.02%
  cpu          : usr=11.72%, sys=29.38%, ctx=146150, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=2917MiB/s (3059MB/s), 2917MiB/s-2917MiB/s (3059MB/s-3059MB/s), io=1024MiB (1074MB), run=351-351msec

Run status group 1 (all jobs):
   READ: bw=1410MiB/s (1479MB/s), 1410MiB/s-1410MiB/s (1479MB/s-1479MB/s), io=1024MiB (1074MB), run=726-726msec
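
For completeness, one way to make that setting survive a reboot (not shown in the thread, assuming a standard sysctl.d setup) is a drop-in file:

# Persist the io_uring setting across reboots
echo "kernel.io_uring_disabled = 0" > /etc/sysctl.d/99-io-uring.conf
sysctl --system    # reload all sysctl configuration files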

Thank you, @ankit-sam and @axboe