lokeshjaliminche opened this issue 1 year ago
@lokeshjaliminche, I was able to reproduce your replay delays. By default, fio reproduces the time gaps between I/Os when replaying, and from checking your log I see some very large delays between some I/Os, like this (the 1st column is the usec offset from the start):
20 /dev/ram0 write 488197214720 4096
21 /dev/ram0 read 482637861376 24576
22 /dev/ram0 read 494470889984 12288
23 /dev/ram0 read 487460954624 65536
1000001 /dev/ram0 read 489065263616 53248
1000002 /dev/ram0 read 489065320960 24576
1000003 /dev/ram0 write 70974357504 4096
1000004 /dev/ram0 write 483084784128 8192
1000005 /dev/ram0 read 489065345536 32768
1000006 /dev/ram0 read 489065452032 16384
1000007 /dev/ram0 read 489065382400 65536
1 /dev/ram0 read 497433108992 4096
1000001 /dev/ram0 write 480025637376 4096
1 /dev/ram0 read 481019101696 4096
1000001 /dev/ram0 write 480244576768 4096
1000002 /dev/ram0 write 482193912320 4096
1000003 /dev/ram0 write 481894437376 4096
1000004 /dev/ram0 write 480241426944 8192
1000005 /dev/ram0 write 480241496576 4096
1000006 /dev/ram0 write 482194211328 4096
1000007 /dev/ram0 write 493699576320 4096
1000008 /dev/ram0 write 48205594624 4096
1000009 /dev/ram0 write 491778695680 4096
1000010 /dev/ram0 write 480180539904 4096
1 /dev/ram0 read 21969521664 37888
This looks like multiple intermixed workloads and/or multiple invocations recorded in the same log. Notice how the timestamp from start resets back to 1 and also vacillates between low values and 1,000,000+ us. Each of those low -> 1,000,000+ us jumps will incur a 1+ second delay on playback (for example, the jump from 23 us to 1,000,001 us above is replayed as a roughly one-second stall).
Can you post the original workload config that generated this log?
Also, if you don't need to simulate the delays between I/Os, you can disable this behavior via replay_no_stall=1:
https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-replay-no-stall
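For example, a replay job with the stall simulation disabled could look something like the sketch below (the job name and log path are placeholders; adjust them to your setup):
[global]
direct=1
ioengine=libaio
[replay_no_delays]
read_iolog=./replay_2.log
replay_no_stall=1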
Hi @horshack-dpreview
Thanks for looking into it. Yes, you are right, the log contains IOs coming from multiple applications. I don't have the workload config, as these are traces from SNIA. I preprocessed the logs so that they can be replayed with fio.
My assumption was that the timestamp is the submission time. Are there any tunables to ensure that fio submits the IOs according to the timestamps without waiting for the completion of the previous IO?
To avoid the long delays I sorted the logs according to timestamp and replayed them. I am expecting the replay to finish within about 11 seconds (the max timestamp is 10055641 us, roughly 10.1 s). However, the replay takes more than 16 seconds (run=16642-16642msec):
read: IOPS=15.9k, BW=489MiB/s (513MB/s)(8146MiB/16642msec)
slat (nsec): min=1616, max=2889.2k, avg=10354.68, stdev=15975.16
clat (usec): min=1122, max=1000.7k, avg=30966.64, stdev=108771.08
lat (usec): min=1126, max=1000.7k, avg=30977.00, stdev=108771.01
clat percentiles (msec):
| 1.00th=[ 13], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 17], 40.00th=[ 18], 50.00th=[ 18], 60.00th=[ 19],
| 70.00th=[ 20], 80.00th=[ 22], 90.00th=[ 24], 95.00th=[ 26],
| 99.00th=[ 953], 99.50th=[ 961], 99.90th=[ 1003], 99.95th=[ 1003],
| 99.99th=[ 1003]
bw ( KiB/s): min=105205, max=931863, per=100.00%, avg=632351.54, stdev=273812.87, samples=26
iops : min= 3226, max=33164, avg=20073.85, stdev=8815.57, samples=26
write: IOPS=17.8k, BW=691MiB/s (725MB/s)(11.2GiB/16642msec); 0 zone resets
slat (nsec): min=1819, max=2902.2k, avg=21997.69, stdev=53590.49
clat (usec): min=1149, max=1000.7k, avg=29762.96, stdev=104224.31
lat (usec): min=1155, max=1000.7k, avg=29784.97, stdev=104224.95
clat percentiles (msec):
| 1.00th=[ 13], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 17], 40.00th=[ 17], 50.00th=[ 18], 60.00th=[ 19],
| 70.00th=[ 20], 80.00th=[ 21], 90.00th=[ 23], 95.00th=[ 26],
| 99.00th=[ 944], 99.50th=[ 961], 99.90th=[ 995], 99.95th=[ 1003],
| 99.99th=[ 1003]
bw ( KiB/s): min=146554, max=1285426, per=100.00%, avg=893140.19, stdev=362812.24, samples=26
iops : min= 3580, max=39396, avg=22521.23, stdev=10488.69, samples=26
lat (msec) : 2=0.01%, 4=0.02%, 10=0.06%, 20=71.96%, 50=26.64%
lat (msec) : 1000=1.26%, 2000=0.02%
cpu : usr=9.73%, sys=50.82%, ctx=35, majf=0, minf=28
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=264477,296615,0,0 short=0,59,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1024
Run status group 0 (all jobs):
READ: bw=489MiB/s (513MB/s), 489MiB/s-489MiB/s (513MB/s-513MB/s), io=8146MiB (8542MB), run=16642-16642msec
WRITE: bw=691MiB/s (725MB/s), 691MiB/s-691MiB/s (725MB/s-725MB/s), io=11.2GiB (12.1GB), run=16642-16642msec
Replayed logs sorted according to timestamp: replay_2.tar.gz
@lokeshjaliminche, Your playback workload is using QD=1 and one I/O thread (numjobs=1). The description of the original workload implies multiple threads (across applications) and an aggregate QD > 1. That might explain the 60% drop for the playback.
@horshack-dpreview Initially I ran it with QD=1; however, in the recent experiment I increased the QD to 1024. You can see it in the results I provided above: IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%.
It still didn't help. Any suggestions on what other parameters I could try? I want to make sure that the IO traces are replayed according to their timestamps. Is there a parameter that ensures the IOs are submitted without waiting for completions?
@lokeshjaliminche, Got it, thanks. Looking at the I/O log output from your experiment I see four one-second gaps. Since I don't know your experiment's workload I can't tell whether these are actual I/O latencies or side effects of the workload config. Either way, they may be adding four seconds to your playback:
3051182 /dev/ram0 read 492917563904 12288
3051183 /dev/ram0 write 483071676928 4096
3051184 /dev/ram0 read 31971475456 4096
4000001 /dev/ram0 read 493476332032 4096
4000001 /dev/ram0 write 482805404160 4096
4000001 /dev/ram0 read 485117051392 131072
4067407 /dev/ram0 read 499548406272 65536
4067408 /dev/ram0 read 487835439616 4096
4067409 /dev/ram0 read 487835443712 4096
5000001 /dev/ram0 read 38741913600 65536
5000001 /dev/ram0 read 27124576256 57344
5000001 /dev/ram0 read 482082193920 65536
5061530 /dev/ram0 write 55286734848 4096
5061531 /dev/ram0 write 39448035328 4096
5061532 /dev/ram0 write 59729498112 12288
6000001 /dev/ram0 read 486865990144 32768
6000001 /dev/ram0 read 485061517824 4096
6000001 /dev/ram0 read 158657871872 16384
8054438 /dev/ram0 read 480078978560 12800
8054439 /dev/ram0 write 491825119744 57344
8054440 /dev/ram0 write 36098703360 8192
9000001 /dev/ram0 read 492664594944 32768
9000001 /dev/ram0 read 482726396416 32768
9000001 /dev/ram0 write 480160523776 3072
@horshack-dpreview OK, just to confirm my understanding of how fio uses timestamps when replaying logs: does fio treat the timestamp as the IO submission time? For instance, in the following log, does fio submit a write request at 3051184 us from the start time of the job and a read request at 4000001 us?
3051184 /dev/ram0 write 31971475456 4096
4000001 /dev/ram0 read 493476332032 4096
@lokeshjaliminche, That's correct, although fio calculates delta times from the previous entry rather than using a running absolute time. The I/O is logged at submission (in the original workload) and the delay on replay is inserted in the submission path.
Here's the logic that processes the I/O log to calculate the delta between entries. The deltas are stored in a list that is processed later when the replay is performed: https://github.com/axboe/fio/blob/051b5785bc47ab216fa3db9dceb6184073dcc88a/iolog.c#L433-L474
And here's the logic that inserts the delay in the submission path: https://github.com/axboe/fio/blob/051b5785bc47ab216fa3db9dceb6184073dcc88a/iolog.c#L81-L114
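In outline (a simplified sketch, not fio's actual code; replay_entry and pace_replay are made-up names for illustration), the pacing idea is: each parsed entry stores the gap to the previous entry, and before issuing the next I/O the replay sleeps only for whatever part of that gap has not already elapsed:
#include <unistd.h>
/* One parsed iolog entry: only the timing part matters for this sketch. */
struct replay_entry {
    unsigned long delay_us;   /* gap to the previous log entry, in usec */
};
/* Sleep for the remainder of the logged gap that real execution
 * has not already consumed since the last submission. */
void pace_replay(const struct replay_entry *e,
                 unsigned long elapsed_since_last_issue_us)
{
    if (e->delay_us > elapsed_since_last_issue_us)
        usleep(e->delay_us - elapsed_since_last_issue_us);
}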
@horshack-dpreview Thank you for the pointers. The logic looks fine to me. However, it's still not clear to me why the actual replay time is extended.
@lokeshjaliminche, I was able to reproduce the replay execution time disparity using my own workload and without the multiple one-second gaps I saw in your logs. I have an idea what's happening but it'll need a little more debugging to fully root cause.
@horshack-dpreview that's great! I hope it will get fixed soon.
@horshack-dpreview I added debug traces, and they show that the delays between submissions become large. Probably fio waits for an IO to complete before submitting the next IO from the log.
static void iolog_delay(struct thread_data *td, unsigned long delay)
{
uint64_t usec = utime_since_now(&td->last_issue);
Debug traces:
iolog: get 482397446656/4096//dev/nvme0n1p1
iolog: get 481268355584/73728//dev/nvme0n1p1
iolog: get 481185620480/4096//dev/nvme0n1p1
delay : 40, time_offset : 0, usec(time_since_last_issue) : 3
iolog: get 37514895360/8192//dev/nvme0n1p1
delay : 40, time_offset : 0, usec(time_since_last_issue) : 5417
delay : 40, time_offset : 0, usec(time_since_last_issue) : 3498
iolog: get 46569434624/4096//dev/nvme0n1p1
delay : 40, time_offset : 0, usec(time_since_last_issue) : 5
delay : 40, time_offset : 0, usec(time_since_last_issue) : 57
@lokeshjaliminche, That's the routine that simulates the submission delay based on the timestamp deltas from the log. I root-caused it earlier today and have a fix running that works. I'm just mulling over the best way to implement the final fix.
What's happening is this: the logic calculates the time delta from the last I/O submission during the replay, and if that's shorter than the time delta from the log it inserts a usec_sleep() to keep the pacing the same as the I/Os in the log. The problem is that it calculates the replay time starting from just after the kernel submission in td_io_queue() to the next invocation of iolog_delay(). This excludes the code execution time from the return of td_io_queue() through and including the kernel submission in td_io_queue(), and it's the kernel submission that takes the most time. As a result, the usec calculation at the top of iolog_delay() is lower than it should be, which means it inserts a larger delay than needed to track the log timestamps. The fix is to calculate the full path time, including the submission. The fix itself is easy, but I'm being careful to make sure there are no side effects from the change.
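To make the over-sleep concrete, here is a small standalone sketch (not fio's code; the 40 us logged gap and the 3 us measured time are borrowed from the debug trace above, while the 30 us of unmeasured submission-path time is an assumed figure for illustration):
#include <stdio.h>
int main(void)
{
    unsigned long logged_gap_us = 40;   /* inter-I/O gap from the iolog */
    unsigned long measured_us   = 3;    /* elapsed time seen at the top of the delay routine */
    unsigned long unmeasured_us = 30;   /* assumed submission-path time left out of the measurement */

    /* Buggy pacing: the sleep is sized from the short measurement... */
    unsigned long sleep_us = logged_gap_us - measured_us;                 /* 37 */
    /* ...so the real gap between submissions overshoots the logged gap. */
    unsigned long buggy_gap_us = measured_us + unmeasured_us + sleep_us;  /* 70 */

    /* Fixed pacing: account for the full path, including the submission. */
    unsigned long full_us = measured_us + unmeasured_us;                  /* 33 */
    unsigned long fixed_sleep_us = logged_gap_us > full_us ?
                                   logged_gap_us - full_us : 0;           /*  7 */
    unsigned long fixed_gap_us = full_us + fixed_sleep_us;                /* 40 */

    printf("logged=%lu us, buggy replay gap=%lu us, fixed replay gap=%lu us\n",
           logged_gap_us, buggy_gap_us, fixed_gap_us);
    return 0;
}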
@horshack-dpreview I see that makes sense. Thanks for looking into it.
@lokeshjaliminche, I've committed the changes to my fio fork. Can you please compile it and verify that the changes fix the issue on your config? Here's the link to the branch:
https://github.com/horshack-dpreview/fio/tree/Fix_Iolog_Replay_Timing
@horshack-dpreview Yes, it's working with your changes.
@horshack-dpreview I am testing it with some other configs; I will update once I am done.
@horshack-dpreview I have tested your changes with multiple configs and it's working as expected.
@lokeshjaliminche, Great, thanks for performing those tests.
For reference, here is the configuration to reproduce this issue using the null engine:
test.fio:
[global]
size=1GB
rw=randrw
iodepth=1
bs=2K
ioengine=null
thread
[generate]
write_iolog=/tmp/iolog.txt
[replay]
read_iolog=/tmp/iolog.txt
Command to generate and replay I/Os:
sudo rm /tmp/iolog.txt; sudo fio /tmp/test.fio --section=generate --output=/tmp/write_output.txt; sudo fio /tmp/test.fio --section=replay --output=/tmp/read_output.txt
Output of the command showing the bug. Notice that generate takes 3715 msec but replay takes 7035 msec:
generate: (g=0): rw=randrw, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=null, iodepth=1
fio-3.33-96-gded6c
Starting 1 thread
generate: (groupid=0, jobs=1): err= 0: pid=151451: Wed Mar 8 19:39:59 2023
read: IOPS=70.5k, BW=138MiB/s (144MB/s)(512MiB/3715msec)
clat (nsec): min=488, max=15365, avg=1134.52, stdev=272.42
lat (nsec): min=3422, max=21860, avg=4661.45, stdev=480.59
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1880], 99.95th=[ 1960],
| 99.99th=[ 9024]
bw ( KiB/s): min=135772, max=142816, per=100.00%, avg=141084.00, stdev=2416.47, samples=7
iops : min=67886, max=71408, avg=70542.00, stdev=1208.24, samples=7
write: IOPS=70.6k, BW=138MiB/s (145MB/s)(512MiB/3715msec); 0 zone resets
clat (nsec): min=488, max=12012, avg=1130.41, stdev=264.54
lat (nsec): min=3422, max=28634, avg=4672.23, stdev=492.13
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1880], 99.95th=[ 1960],
| 99.99th=[ 8768]
bw ( KiB/s): min=136100, max=142948, per=99.98%, avg=141212.00, stdev=2421.87, samples=7
iops : min=68050, max=71474, avg=70606.00, stdev=1210.93, samples=7
lat (nsec) : 500=0.01%, 1000=62.90%
lat (usec) : 2=37.05%, 4=0.01%, 10=0.03%, 20=0.01%
cpu : usr=9.77%, sys=90.15%, ctx=6, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=261946,262342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=138MiB/s (144MB/s), 138MiB/s-138MiB/s (144MB/s-144MB/s), io=512MiB (536MB), run=3715-3715msec
WRITE: bw=138MiB/s (145MB/s), 138MiB/s-138MiB/s (145MB/s-145MB/s), io=512MiB (537MB), run=3715-3715msec
replay: (g=0): rw=randrw, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=null, iodepth=1
fio-3.33-96-gded6c
Starting 1 thread
replay: (groupid=0, jobs=1): err= 0: pid=151472: Wed Mar 8 19:40:07 2023
read: IOPS=37.2k, BW=72.7MiB/s (76.3MB/s)(512MiB/7035msec)
clat (nsec): min=488, max=11034, avg=1102.28, stdev=241.21
lat (nsec): min=1466, max=13199, avg=2206.86, stdev=273.50
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1880], 99.95th=[ 2448],
| 99.99th=[ 6304]
bw ( KiB/s): min=71204, max=75180, per=100.00%, avg=74502.00, stdev=1020.40, samples=14
iops : min=35602, max=37590, avg=37251.00, stdev=510.20, samples=14
write: IOPS=37.3k, BW=72.8MiB/s (76.4MB/s)(512MiB/7035msec); 0 zone resets
clat (nsec): min=488, max=12920, avg=1099.15, stdev=239.58
lat (nsec): min=1467, max=14387, avg=2216.61, stdev=269.88
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1880], 99.95th=[ 2800],
| 99.99th=[ 6304]
bw ( KiB/s): min=71312, max=75452, per=100.00%, avg=74615.43, stdev=1077.71, samples=14
iops : min=35656, max=37726, avg=37307.71, stdev=538.85, samples=14
lat (nsec) : 500=0.01%, 1000=69.41%
lat (usec) : 2=30.51%, 4=0.04%, 10=0.03%, 20=0.01%
cpu : usr=4.63%, sys=95.32%, ctx=12, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=261946,262342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=72.7MiB/s (76.3MB/s), 72.7MiB/s-72.7MiB/s (76.3MB/s-76.3MB/s), io=512MiB (536MB), run=7035-7035msec
WRITE: bw=72.8MiB/s (76.4MB/s), 72.8MiB/s-72.8MiB/s (76.4MB/s-76.4MB/s), io=512MiB (537MB), run=7035-7035msec
Here is the output with the fix:
generate: (g=0): rw=randrw, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=null, iodepth=1
fio-3.33
Starting 1 thread
generate: (groupid=0, jobs=1): err= 0: pid=151651: Wed Mar 8 19:44:58 2023
read: IOPS=65.6k, BW=128MiB/s (134MB/s)(512MiB/3993msec)
clat (nsec): min=489, max=13269, avg=1131.69, stdev=282.86
lat (nsec): min=3701, max=30730, avg=4737.71, stdev=497.71
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1960], 99.95th=[ 3760],
| 99.99th=[10048]
bw ( KiB/s): min=129896, max=134080, per=100.00%, avg=131220.57, stdev=1598.72, samples=7
iops : min=64948, max=67040, avg=65610.29, stdev=799.36, samples=7
write: IOPS=65.7k, BW=128MiB/s (135MB/s)(512MiB/3993msec); 0 zone resets
clat (nsec): min=419, max=18647, avg=1132.05, stdev=277.79
lat (nsec): min=3701, max=26260, avg=4740.21, stdev=499.63
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 980],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1464], 99.90th=[ 1960], 99.95th=[ 3760],
| 99.99th=[ 9536]
bw ( KiB/s): min=129452, max=133648, per=99.88%, avg=131238.86, stdev=1686.13, samples=7
iops : min=64726, max=66824, avg=65619.43, stdev=842.55, samples=7
lat (nsec) : 500=0.01%, 1000=63.43%
lat (usec) : 2=36.50%, 4=0.03%, 10=0.03%, 20=0.01%
cpu : usr=18.81%, sys=81.14%, ctx=10, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=261946,262342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=512MiB (536MB), run=3993-3993msec
WRITE: bw=128MiB/s (135MB/s), 128MiB/s-128MiB/s (135MB/s-135MB/s), io=512MiB (537MB), run=3993-3993msec
replay: (g=0): rw=randrw, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=null, iodepth=1
fio-3.33
Starting 1 thread
replay: (groupid=0, jobs=1): err= 0: pid=151672: Wed Mar 8 19:45:03 2023
read: IOPS=60.5k, BW=118MiB/s (124MB/s)(512MiB/4333msec)
clat (nsec): min=419, max=13339, avg=1159.71, stdev=282.30
lat (nsec): min=1815, max=15644, avg=2327.10, stdev=318.75
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 980],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 1400],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1880], 99.90th=[ 1960], 99.95th=[ 3760],
| 99.99th=[ 9536]
bw ( KiB/s): min=119292, max=123504, per=100.00%, avg=121171.00, stdev=1373.52, samples=8
iops : min=59646, max=61752, avg=60585.50, stdev=686.78, samples=8
write: IOPS=60.5k, BW=118MiB/s (124MB/s)(512MiB/4333msec); 0 zone resets
clat (nsec): min=488, max=11105, avg=1158.67, stdev=274.92
lat (nsec): min=1815, max=16133, avg=2331.17, stdev=318.47
clat percentiles (nsec):
| 1.00th=[ 908], 5.00th=[ 908], 10.00th=[ 908], 20.00th=[ 980],
| 30.00th=[ 980], 40.00th=[ 980], 50.00th=[ 980], 60.00th=[ 1400],
| 70.00th=[ 1400], 80.00th=[ 1400], 90.00th=[ 1464], 95.00th=[ 1464],
| 99.00th=[ 1464], 99.50th=[ 1880], 99.90th=[ 1960], 99.95th=[ 3760],
| 99.99th=[ 9152]
bw ( KiB/s): min=119632, max=123924, per=100.00%, avg=121209.50, stdev=1719.94, samples=8
iops : min=59818, max=61962, avg=60604.75, stdev=859.92, samples=8
lat (nsec) : 500=0.03%, 1000=57.78%
lat (usec) : 2=42.11%, 4=0.05%, 10=0.03%, 20=0.01%
cpu : usr=8.98%, sys=91.09%, ctx=7, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=261946,262342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=512MiB (536MB), run=4333-4333msec
WRITE: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=512MiB (537MB), run=4333-4333msec
Description of the bug: I am trying to replay a trace file with fio. However, fio doesn't seem to keep up with the timestamps and runs very slowly. Any ideas on how to make the replay follow the trace timestamps?
I am attaching the job file and trace file below.
Environment: Ubuntu Linux 5.4.0-139-generic
fio version (fio --version): fio-3.33-109-gb5904
Reproduction steps:
[global]
direct=1
[job_1]
numjobs=1
ioengine=libaio
replay_skip=write
read_iolog=./replay_2.log
Tracefile: replay_2.log.tar.gz