axboe / fio

Flexible I/O Tester
GNU General Public License v2.0
5.01k stars 1.23k forks source link

specifying log_compression results in log_avg_msec ignored for bw and iops logs #1746

Open Rogach opened 3 months ago

Rogach commented 3 months ago

Please acknowledge the following before creating a ticket

Description of the bug: If log_compression is specified together with log_avg_msec, then bw and iops logs still contain a separate entry for each I/O done (instead of averages as expected).

Environment: Arch Linux, 6.7.5-arch1-1

fio version: fio-3.37

Reproduction steps

[global]
size=1TB

kb_base=1000
ioengine=libaio
invalidate=1
direct=1

write_bw_log
write_iops_log
write_lat_log
log_compression=128MiB
log_avg_msec=10

[test]
rw=read
io_limit=2GiB
iodepth=64

Run with fio --filename /dev/sdX test.fio.

Observe lots of non-averaged entries in the bw and iops logs:

9, 2803, 0, 0, 0
9, 3107, 0, 0, 0
9, 3103, 0, 0, 0
9, 3102, 0, 0, 0
9, 3102, 0, 0, 0
9, 3101, 0, 0, 0
9, 3101, 0, 0, 0
9, 3101, 0, 0, 0
9, 3100, 0, 0, 0
9, 3100, 0, 0, 0
vincentkfu commented 2 months ago

Do you know a fio version where this was working? I tried 3.36 and see the same issue.

root@localhost:~/fio-dev/1746/fio-canonical# ./fio-3.36 --name=test --ioengine=null --filesize=1T --write_bw_log=test --log_avg_msec=1000 --time_based --runtime=3s --log_compression=128M
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.36
Starting 1 process
Jobs: 1 (f=1): [R(1)][-.-%][r=14.8GiB/s][r=3868k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=16213: Mon Apr  8 14:13:48 2024
  read: IOPS=3857k, BW=14.7GiB/s (15.8GB/s)(44.1GiB/3000msec)
    clat (nsec): min=30, max=16490, avg=33.72, stdev=39.94
     lat (nsec): min=60, max=16640, avg=64.57, stdev=55.76
    clat percentiles (nsec):
     |  1.00th=[   32],  5.00th=[   32], 10.00th=[   32], 20.00th=[   33],
     | 30.00th=[   33], 40.00th=[   33], 50.00th=[   33], 60.00th=[   34],
     | 70.00th=[   34], 80.00th=[   34], 90.00th=[   35], 95.00th=[   35],
     | 99.00th=[   36], 99.50th=[   37], 99.90th=[  125], 99.95th=[  131],
     | 99.99th=[  137]
   bw (  MiB/s): min=  242, max=133333, per=100.00%, avg=119630.13, stdev=4764.08, samples=11570360
   iops        : min=3798766, max=3876458, avg=3859501.60, stdev=34036.42, samples=5
  lat (nsec)   : 50=99.89%, 100=0.01%, 250=0.10%, 500=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=99.83%, sys=0.13%, ctx=3, majf=0, minf=112
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=11570360,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=14.7GiB/s (15.8GB/s), 14.7GiB/s-14.7GiB/s (15.8GB/s-15.8GB/s), io=44.1GiB (47.4GB), run=3000-3000msec
root@localhost:~/fio-dev/1746/fio-canonical# head test_bw.1.log
999, 122248957, 0, 0, 0
999, 35008547, 0, 0, 0
999, 117028571, 0, 0, 0
999, 117028571, 0, 0, 0
999, 124121212, 0, 0, 0
999, 128000000, 0, 0, 0
999, 124121212, 0, 0, 0
999, 117028571, 0, 0, 0
999, 117028571, 0, 0, 0
999, 124121212, 0, 0, 0
Rogach commented 2 months ago

No, I only tested it on the latest version. Do you want me to bisect it?

vincentkfu commented 2 months ago

I think the issue is in iolog.h:

static inline bool per_unit_log(struct io_log *log)
{
        return log && (!log->avg_msec || log->log_gz || log->log_gz_store);
}

Try reverting 836365452f6525a7d670205b823f7982de48f844 and let me know if it works. The only problem is that I'm not sure what issues that patch fixed, so I don't know what harm would be caused by reverting it.

Rogach commented 2 months ago

I reverted that commit, and got a segfault :) Seems that patch did have some purpose.

vincentkfu commented 2 months ago

gdb says that the segfault occurs at:

#0  all_sw_idle (wq=0x7f12fbcb7b98) at workqueue.c:76
76          if (!(sw->flags & SW_F_IDLE))
[Current thread is 1 (Thread 0x7f12fbc896c0 (LWP 4449))]
(gdb) list
71      int i;
72  
73      for (i = 0; i < wq->max_workers; i++) {
74          struct submit_worker *sw = &wq->workers[i];
75  
76          if (!(sw->flags & SW_F_IDLE))
77              return false;
78      }
79  
80      return true;