axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

Multiple entries with identical timestamps in latency log files. #947

Closed ndokos closed 4 years ago

ndokos commented 4 years ago

This is similar to issue #739, which was closed for lack of response, so I started there, but it is different enough that @sitsofe suggested I open a new one. As a result of that conversation, I redid the run with a different clocksource (clock_gettime); the results from that run are presented in this issue. See issue #739 for similar results with clocksource=gettimeofday.

I do a simple run with:

 /usr/local/bin/fio  --output-format=json ./fio.job

The version (/usr/local/bin/fio --version) is fio-3.18 and the job file looks like this:

[global]
bs=4k
runtime=60
ioengine=libaio
iodepth=32
direct=0
sync=0
time_based=1
clocksource=clock_gettime
ramp_time=5
write_bw_log=fio
write_iops_log=fio
write_lat_log=fio
log_avg_msec=1000
write_hist_log=fio
log_hist_msec=10000

[job-/home/pbench/tmp/foo]
filename=/home/pbench/tmp/foo
rw=rw
size=4096M
numjobs=4

IIUC, the log_avg_msec setting makes fio output the average latency over a 1-sec window, with a timestamp just after the end of the window. Since I'm averaging over one-second windows, I expect the timestamps to increase by roughly 1 sec between entries.
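That expectation can be checked mechanically. A minimal sketch, assuming fio's `time_ms, value, ddir, ...` column layout for latency logs (only the first and third columns are used here):

```python
# Sketch: verify that per-direction timestamps in a fio latency log
# advance by roughly log_avg_msec (1000 ms here). Assumes the
# "time_ms, latency, ddir, ..." column layout of fio latency logs.
from collections import defaultdict

def check_deltas(lines, window_ms=1000, slack_ms=100):
    last = defaultdict(lambda: None)  # last timestamp seen per ddir
    anomalies = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        ts, ddir = int(fields[0]), int(fields[2])
        if last[ddir] is not None:
            delta = ts - last[ddir]
            if abs(delta - window_ms) > slack_ms:
                anomalies.append((ddir, last[ddir], ts, delta))
        last[ddir] = ts
    return anomalies

log = ["1000, 5851, 0, 0, 0", "2000, 4175, 0, 0, 0", "2000, 60619, 0, 0, 0"]
print(check_deltas(log))  # [(0, 2000, 2000, 0)] -- a duplicate shows up as a 0 ms delta
```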

But that's not what I see: not only do I get entries that are much closer together than 1 sec (see timestamps 38650 and 38659 below), but I also see multiple entries of the same type with identical timestamps. Here is, e.g., a portion of fio_lat.2.log:

...
38650, 727679, 0, 0, 0
38650, 560363, 1, 0, 0
38659, 1272794234, 0, 0, 0
38659, 1281573520, 1, 0, 0
39007, 9537421, 0, 0, 0
39007, 9523507, 1, 0, 0
39007, 66159046, 0, 0, 0
39007, 66161783, 1, 0, 0
40004, 606922, 0, 0, 0
40004, 704317, 1, 0, 0
40004, 11203647, 0, 0, 0
40004, 11204157, 1, 0, 0
41015, 600578, 0, 0, 0
41015, 573933, 1, 0, 0
41015, 33171441, 0, 0, 0
41015, 33171648, 1, 0, 0
42010, 658540, 0, 0, 0
42010, 634082, 1, 0, 0
42010, 33200410, 1, 0, 0
43042, 558504, 0, 0, 0
43042, 662542, 1, 0, 0
43042, 44161972, 1, 0, 0
44038, 612407, 0, 0, 0
44038, 596941, 1, 0, 0
44060, 110477157, 0, 0, 0
44060, 110511712, 1, 0, 0

...

In some cases (e.g. for timestamp 44038) I get two entries, one for read and one for write: that's what I expect. But in other cases (e.g. for timestamp 42010) I get three entries: two of them are for type 1, with the average latency differing by two orders of magnitude between them. In others (e.g. for timestamp 41015), I get four entries, two for type 0 and two for type 1, also with very different average latency values.
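Counting rows per (timestamp, type) pair makes the duplicates easy to spot. A sketch, again assuming the `time_ms, value, ddir, ...` layout; with log_avg_msec set, each pair should appear at most once:

```python
# Sketch: count how many rows share each (timestamp, ddir) pair in a
# fio latency log; with log_avg_msec set, each pair should appear once.
from collections import Counter

def duplicate_keys(lines):
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        counts[(int(fields[0]), int(fields[2]))] += 1
    return {k: n for k, n in counts.items() if n > 1}

sample = [
    "41015, 600578, 0, 0, 0",
    "41015, 573933, 1, 0, 0",
    "41015, 33171441, 0, 0, 0",
    "41015, 33171648, 1, 0, 0",
    "42010, 658540, 0, 0, 0",
]
print(duplicate_keys(sample))  # {(41015, 0): 2, (41015, 1): 2}
```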

I should mention that this happens with all latency logs (lat, clat, slat). This particular run was done on a RHEL8 VM, but I have seen similar results on RHEL7 VMs and a Fedora31 laptop (with an SSD).

sitsofe commented 4 years ago

@ndokos Other things that would help:

ndokos commented 4 years ago

I reduced the job file to the following:

[global]
bs=4k
runtime=60
ioengine=libaio
iodepth=32
direct=0
sync=0
time_based=1
clocksource=clock_gettime
ramp_time=5
write_bw_log=fio
write_iops_log=fio
write_lat_log=fio
log_avg_msec=1000

[job-/home/pbench/tmp/foo]
filename=/home/pbench/tmp/foo
rw=rw
size=4096M

and I tried with 3.19. I still see the same problem. In fact, in one case I saw nine instances of the same timestamp:

...
19141, 159917, 0, 0, 0
19141, 1989, 1, 0, 0
19141, 69412, 0, 0, 0
19141, 14836, 1, 0, 0
19141, 3073, 1, 0, 0
19141, 2857, 1, 0, 0
19141, 3404, 0, 0, 0
19141, 1845, 0, 0, 0
19141, 2950, 1, 0, 0
...

I'll try git master but it'll take me a couple of days to get there. If you think I should simplify the job file further, let me know.

In the meantime, I'd be interested to see if you can reproduce this (on any system).

Thanks!

ndokos commented 4 years ago

I do NOT see it when I set rw=write, except in one instance at the last point:

60000, 198361, 1, 0, 0
60000, 91099, 1, 0, 0

This was in fio_lat.1.log.

sitsofe commented 4 years ago

Can

direct=0
sync=0

be removed, as I believe those are the defaults? What about

ramp_time=5
write_bw_log=fio
write_iops_log=fio

? Even

ioengine=libaio

(although you'll get lower speeds)?

ndokos commented 4 years ago

Following your suggestions, I cut it down to this:

bs=4k
runtime=60
iodepth=32
time_based=1
clocksource=clock_gettime
write_lat_log=fio
log_avg_msec=1000

[job-/home/pbench/tmp/foo]
filename=/home/pbench/tmp/foo
rw=rw
size=4096M
numjobs=4

I still see things like this:

...
53031, 6349, 0, 0, 0
53031, 8307, 1, 0, 0
53031, 2293, 0, 0, 0
53031, 1452, 1, 0, 0
54032, 8757, 0, 0, 0
54032, 5675, 1, 0, 0
54032, 3844, 1, 0, 0
55000, 6546, 0, 0, 0
55000, 7011, 1, 0, 0
55000, 50660, 0, 0, 0
55000, 2289, 1, 0, 0
56008, 19375, 0, 0, 0
56008, 11927, 1, 0, 0
56075, 67185554, 0, 0, 0
57019, 12128, 0, 0, 0
57019, 12350, 1, 0, 0
57019, 2818, 0, 0, 0
58088, 19140, 0, 0, 0
58088, 24556, 1, 0, 0
58088, 3202, 0, 0, 0
59011, 3997, 0, 0, 0
59011, 11049, 1, 0, 0
59011, 2937, 1, 0, 0
60010, 5572, 0, 0, 0
60010, 6541, 1, 0, 0
ndokos commented 4 years ago

Reducing the fio.job further to this:

[global]
bs=4k
runtime=60
iodepth=32
time_based=1
clocksource=clock_gettime
write_lat_log=fio
log_avg_msec=1000

[job-/home/pbench/tmp/foo]
filename=/home/pbench/tmp/foo
rw=rw

I still see the problem:

...
18102, 13756, 0, 0, 0
18102, 1589, 1, 0, 0
18102, 10106, 1, 0, 0
21586, 20105, 0, 0, 0
21586, 1624, 1, 0, 0
21586, 32076, 0, 0, 0
21586, 6926, 1, 0, 0
21586, 2088, 1, 0, 0
21586, 3116, 1, 0, 0
21586, 1632, 0, 0, 0
22000, 3871, 0, 0, 0
22000, 1618, 1, 0, 0
22000, 1338, 0, 0, 0
22000, 1491, 1, 0, 0
23002, 3095, 0, 0, 0
23002, 1573, 1, 0, 0
23002, 5464, 1, 0, 0
24000, 14601, 0, 0, 0
24000, 1603, 1, 0, 0
24000, 1284, 0, 0, 0
24000, 1584, 1, 0, 0
25000, 12906, 0, 0, 0
25000, 1613, 1, 0, 0
25000, 1391, 0, 0, 0
...

although this time I don't see the wildly different latencies reported in most (all?) of the other cases above (well, there are still factors of 20, but not factors of 200).

ndokos commented 4 years ago

The above is still with fio-3.19, I will try the latest bits from master probably tomorrow.

vincentkfu commented 4 years ago

I did a git bisect and believe this bug was introduced in fio-3.18 by commit b2a432bfbb6d10e93f2c8f8092d6db672d45af0d

Author: Phillip Chen <phillip.a.chen@seagate.com>
Date:   Wed Jan 22 16:28:38 2020 -0700

Per-command priority: Priority logging and libaio/io_uring cmdprio_percentage

Add cmdprio_percentage option to libaio and io_uring engines to set
ioprio on a per-command basis. Add tracking of high priority
commands to be displayed separately in human readable and JSON
outputs.

With 7a4e480d8875a31bee6f2a39c87c25e276cec616 (the parent of b2a432bfbb6d10e93f2c8f8092d6db672d45af0d), I can run the command below and obtain the following log:

$ ./fio-canonical/fio --name=test --time_based --runtime=10s --write_lat_log=fio-06-7a4e48 --log_avg_msec=1000 --size=1G --rw=rw
test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.17-92-g7a4e4
Starting 1 process
...
$ cat fio-06-7a4e48_clat.1.log 
1001, 5625, 0, 0
1001, 2499, 1, 0
2000, 2530, 1, 0
2000, 4498, 0, 0
3000, 2097, 1, 0
3000, 4247, 0, 0
4000, 5989, 0, 0
4000, 1897, 1, 0
5000, 5677, 0, 0
5000, 1891, 1, 0
6000, 6037, 0, 0
6000, 1896, 1, 0
7186, 6343, 0, 0
7186, 1907, 1, 0
8000, 6309, 0, 0
8000, 1921, 1, 0
9000, 5951, 0, 0
9000, 2114, 1, 0
10000, 6604, 0, 0
10000, 1984, 1, 0

But with b2a432bfbb6d10e93f2c8f8092d6db672d45af0d, running the same command yields:

$ ./fio-canonical/fio --name=test --time_based --runtime=10s --write_lat_log=fio-07-b2a432 --log_avg_msec=1000 --size=1G --rw=rw
test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.17-93-gb2a4
Starting 1 process
...
$ cat fio-07-b2a432_clat.1.log 
1000, 5851, 0, 0, 0
1000, 2551, 1, 0, 0
1000, 5028, 1, 0, 0
2000, 4175, 0, 0, 0
2000, 3214, 1, 0, 0
2000, 60619, 0, 0, 0
3003, 4219, 0, 0, 0
3003, 2644, 1, 0, 0
3003, 63036, 1, 0, 0
4000, 15857, 0, 0, 0
4000, 1923, 1, 0, 0
4000, 5135, 1, 0, 0
5000, 6188, 0, 0, 0
5000, 2123, 1, 0, 0
5000, 780, 0, 0, 0
5000, 1446, 1, 0, 0
6000, 5779, 0, 0, 0
6000, 2014, 1, 0, 0
6000, 6620, 1, 0, 0
7000, 5838, 0, 0, 0
7000, 2017, 1, 0, 0
7000, 780, 0, 0, 0
7000, 1455, 1, 0, 0
8000, 5964, 0, 0, 0
8000, 1904, 1, 0, 0
8000, 905, 0, 0, 0
8000, 1469, 1, 0, 0
9000, 6211, 0, 0, 0
9000, 1879, 1, 0, 0
9000, 1023, 0, 0, 0
10000, 7006, 0, 0, 0
10000, 2171, 1, 0, 0
vincentkfu commented 4 years ago

The only significant change to the logging code in b2a432bfbb6d10e93f2c8f8092d6db672d45af0d is now at https://github.com/axboe/fio/blob/983319d02626347d178b34320ac5835ae7ecad02/stat.c#L2752

@@ -2700,7 +2847,7 @@ static unsigned long add_log_sample(struct thread_data *td,
                        return diff;
        }

-       __add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0);
+       _add_stat_to_log(iolog, elapsed, td->o.log_max != 0, priority_bit);

        iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
        return iolog->avg_msec;

Reverting _add_stat_to_log back to __add_stat_to_log (one underscore back to two underscores) fixes the problem for me, but I would welcome more testing:

$ git diff
diff --git a/stat.c b/stat.c
index d8c01d14..efa811d2 100644
--- a/stat.c
+++ b/stat.c
@@ -2749,7 +2749,7 @@ static unsigned long add_log_sample(struct thread_data *td,
                        return diff;
        }

-       _add_stat_to_log(iolog, elapsed, td->o.log_max != 0, priority_bit);
+       __add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0, priority_bit);

        iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
        return iolog->avg_msec;

$ ./fio-canonical/fio-tipplus --name=test --time_based --runtime=60s --write_lat_log=fio-tipplus --log_avg_msec=1000 --size=1G --rw=rw
test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19-14-g9833-dirty
Starting 1 process
...
$ wc fio-tipplus_clat.1.log 
 120  600 2506 fio-tipplus_clat.1.log

The resulting log file has 120 lines, which is what we expect for a 60s run: one line per second for reads and one for writes.
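Why the revert helps can be illustrated with a simplified model (this is not fio's actual code; the helper names and the window mechanics are reduced to the essentials). The two-underscore helper flushes the running average for a single data direction, while the one-underscore helper flushes every direction. Calling the all-direction variant whenever any one direction's averaging window rolls over emits extra rows for the other direction:

```python
# Simplified model (NOT fio's actual code) of the regression: when one
# direction's averaging window rolls over, the buggy path flushes the
# running average for *every* direction, so the other direction's stats
# get emitted a second time at a nearby timestamp.

def flush_one(log, avgs, ddir, now):       # models __add_stat_to_log: one ddir
    log.append((now, avgs[ddir], ddir))

def flush_all(log, avgs, now):             # models _add_stat_to_log: all ddirs
    for ddir, avg in avgs.items():
        log.append((now, avg, ddir))

def run(flush_per_ddir):
    log, avgs = [], {0: 100, 1: 200}       # running averages per direction
    # Each direction's window rolls over once, at slightly different times.
    for now, ddir in [(1000, 0), (1001, 1)]:
        if flush_per_ddir:
            flush_one(log, avgs, ddir, now)
        else:
            flush_all(log, avgs, now)
    return log

print(run(True))   # 2 rows: one per direction, as expected
print(run(False))  # 4 rows: each direction logged twice
```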

ndokos commented 4 years ago

Thanks! I will test with this change and report back here.

ndokos commented 4 years ago

@vincentkfu: this is a great improvement: it eliminates all duplicate timestamps except at the very last one. E.g., in a short 10s run, I see this:

1000, 388968, 1, 0, 0
1000, 369207, 0, 0, 0
2000, 435953, 0, 0, 0
2000, 452103, 1, 0, 0
3362, 526478, 1, 0, 0
3362, 579011, 0, 0, 0
4000, 8460336, 1, 0, 0
4000, 7920513, 0, 0, 0
5000, 717717, 1, 0, 0
5000, 716121, 0, 0, 0
6000, 187822, 1, 0, 0
6000, 187627, 0, 0, 0
7002, 380489, 1, 0, 0
7002, 395400, 0, 0, 0
8489, 335470, 0, 0, 0
8489, 335221, 1, 0, 0
9002, 27359973, 0, 0, 0
9002, 24107737, 1, 0, 0
10004, 2954061, 1, 0, 0
10004, 3010561, 0, 0, 0
10004, 22078571, 0, 0, 0
10004, 21765668, 1, 0, 0

and this is happening in all clat and lat files, but not the slat files.

@sitsofe : should I open an issue for that, reopen this one, or not bother?

vincentkfu commented 4 years ago

I don't see extra entries for the final timestamp in my tests. All the logs have 20 lines in my 10-second runs.

$ ./fio-canonical/fio --name=test --ioengine=null --size=1G --rw=randrw --runtime=10s --time_based --write_lat_log=test --log_avg_msec=1000
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.19-17-g6463
Starting 1 process
...

$ wc -l test_clat.1.log 
20 test_clat.1.log
$ wc -l test_lat.1.log 
20 test_lat.1.log
$ cat test_clat.1.log 
1000, 17, 0, 0, 0
1000, 16, 1, 0, 0
2000, 17, 0, 0, 0
2000, 16, 1, 0, 0
3000, 18, 0, 0, 0
3000, 18, 1, 0, 0
4000, 17, 1, 0, 0
4000, 17, 0, 0, 0
5000, 17, 0, 0, 0
5000, 17, 1, 0, 0
6000, 17, 1, 0, 0
6000, 16, 0, 0, 0
7000, 17, 0, 0, 0
7000, 17, 1, 0, 0
8000, 16, 0, 0, 0
8000, 17, 1, 0, 0
9000, 16, 0, 0, 0
9000, 16, 1, 0, 0
10000, 17, 1, 0, 0
10000, 16, 0, 0, 0