axboe / fio

Flexible I/O Tester
GNU General Public License v2.0
5.21k stars 1.26k forks source link

Missing entries in log files #1782

Closed pppcn closed 1 month ago

pppcn commented 3 months ago

Please acknowledge the following before creating a ticket

Description of the bug: The statistical data records in the log file are incomplete, with many entries being omitted. Additionally, most of the log files have no entries at all. 图片

Environment: centos 7, kernel version: 6.9.6-1.el7.elrepo.x86_64, a ZNS drive

fio version: fio 3.36

Reproduction steps

run this script with mq-deadline io scheduler.

[global] filename=${DUT} ioengine=libaio direct=1 ba=4k norandommap randrepeat=0 continue_on_error=none group_reporting percentile_list=99:99.9:99.99:99.999:99.9999 log_avg_msec=1000 write_bw_log=logs/test write_iops_log=logs/test write_lat_log=logs/test zonemode=zbd zonesize=xxx max_open_zones=32 job_max_open_zones=1

[ZNSWrite] numjobs=32 bs=4k rw=write iodepth=256

runtime=10m

ramp_time=30 loops=1 offset=0z offset_increment=8z size=200z

vincentkfu commented 3 months ago

What is missing? What should be there? We did have some relatively recent fixes for logging. Give 3.37 a try.

pppcn commented 3 months ago

As log_avg_msec is set to 1000 in the test, we would expect to see 1 statistical data entry every 1000 milliseconds (1 second). However, the content in the file test_iops.2.log does not reflect this expected behavior.

1000, 107791, 1, 0, 0 1999, 108224, 1, 0, 0 3000, 107648, 1, 0, 0 4000, 107609, 1, 0, 0 4999, 106441, 1, 0, 0 6000, 107173, 1, 0, 0 7000, 107074, 1, 0, 0 8000, 107951, 1, 0, 0 9000, 107494, 1, 0, 0 10000, 108467, 1, 0, 0 11000, 108612, 1, 0, 0 12000, 108767, 1, 0, 0 13000, 108076, 1, 0, 0 14000, 108128, 1, 0, 0 15000, 108384, 1, 0, 0 16000, 107968, 1, 0, 0 17000, 108026, 1, 0, 0 18000, 107858, 1, 0, 0 19000, 109313, 1, 0, 0 20000, 108776, 1, 0, 0 21000, 108878, 1, 0, 0 22000, 107829, 1, 0, 0 23000, 107748, 1, 0, 0 24000, 107136, 1, 0, 0 25000, 107899, 1, 0, 0 26000, 107458, 1, 0, 0 27000, 107758, 1, 0, 0 28000, 107680, 1, 0, 0 29000, 107322, 1, 0, 0 30000, 107046, 1, 0, 0 30999, 107374, 1, 0, 0 32000, 108399, 1, 0, 0 33000, 108505, 1, 0, 0 34000, 106977, 1, 0, 0 35000, 107596, 1, 0, 0 36000, 107996, 1, 0, 0 37000, 108093, 1, 0, 0 38000, 107884, 1, 0, 0 39000, 109088, 1, 0, 0 40000, 107998, 1, 0, 0 41000, 108787, 1, 0, 0 42000, 108968, 1, 0, 0 43000, 107579, 1, 0, 0 44000, 107222, 1, 0, 0 45000, 107457, 1, 0, 0 46000, 107467, 1, 0, 0 47000, 109153, 1, 0, 0 48000, 107648, 1, 0, 0 49000, 107556, 1, 0, 0 50000, 108225, 1, 0, 0 51000, 107766, 1, 0, 0 52000, 84450, 1, 0, 0 279000, 47253, 1, 0, 0 279999, 178973, 1, 0, 0 281000, 182784, 1, 0, 0 282000, 178176, 1, 0, 0 283000, 179530, 1, 0, 0 284000, 180968, 1, 0, 0 285000, 179496, 1, 0, 0 286000, 173296, 1, 0, 0 286999, 183130, 1, 0, 0 288000, 179867, 1, 0, 0 289000, 180667, 1, 0, 0 290000, 184327, 1, 0, 0 291000, 181136, 1, 0, 0 292000, 177807, 1, 0, 0 293000, 180733, 1, 0, 0 294000, 180876, 1, 0, 0 295000, 179738, 1, 0, 0 296000, 179705, 1, 0, 0 297000, 180431, 1, 0, 0 298000, 175512, 1, 0, 0 299000, 178391, 1, 0, 0 300000, 183051, 1, 0, 0 301000, 182709, 1, 0, 0 301999, 179646, 1, 0, 0

pppcn commented 3 months ago

Additionally, I am using 32 jobs, resulting in 32 log files named test_iops.1.log through test_iops.32.log. Upon running the fio_generate_plots command, it is observed that many of the log files are empty, lacking any statistical data entries.

pppcn commented 3 months ago

Sorry. I just realized that I was using FIO 3.37 for the test, I had updated it before the test.

vincentkfu commented 3 months ago

Fio will omit log entries when the bw or IOPS is zero. Can you do some investigating to see if entries are missing because they are zero? Here are some ways to do this:

It would be easier to troubleshoot with a simpler job. Does this still happen with 16 jobs? 4 jobs? Different settings for offset and offset_increment? Try dropping offset=0z because that is the default and you already have offset_increment set.

Run with --debug=zns,io on a much smaller and simpler job to make sure that fio is doing what you think it is doing,

pppcn commented 3 months ago

Yes. The missing timestamps are the same in the bw and IOPS logs.

After removing group_reporting and offset=0z, and setting numjobs=2 and ramp_time=0, there are only 4 records in the bandwidth log file for job1 and 212 records for job2. image image

The capacity for each zone is 1GiB and the size for each job is set to 200z. Since ramp_time is set to 0, each job is expected to write 200GiB. However, the results for job1 show that it only wrote 3072MiB of data. image

pppcn commented 3 months ago

Here are the scripts.

[global]
filename=${DUT}     
ioengine=libaio          
direct=1             
ba=4k              
norandommap
randrepeat=0
continue_on_error=none
#group_reporting
percentile_list=99:99.9:99.99:99.999:99.9999
#time_based=1
#runtime=30m
log_avg_msec=1000
write_bw_log=${logdir}
write_iops_log=${logdir}
write_lat_log=${logdir}
zonemode=zbd
zonesize=1073741824
max_open_zones=32
job_max_open_zones=1

[ZNSWrite]
numjobs=2
bs=4k
rw=write
iodepth=256
#runtime=10m
ramp_time=0
loops=1
#offset=0z
offset_increment=8z
size=200z
vincentkfu commented 3 months ago

Your evidence suggests that fio's logging feature is working as intended but fio stops writing after the first three zones for the first job. What happens if you omit job_max_open_zones?

pppcn commented 2 months ago

Sorry, I'm currently occupied with another task. Will get back to you ASAP.

vincentkfu commented 1 month ago

Closing due to lack of follow up.