Closed pppcn closed 1 month ago
What is missing? What should be there? We did have some relatively recent fixes for logging. Give 3.37 a try.
As log_avg_msec is set to 1000 in the test, we would expect to see 1 statistical data entry every 1000 milliseconds (1 second). However, the content in the file test_iops.2.log does not reflect this expected behavior.
1000, 107791, 1, 0, 0 1999, 108224, 1, 0, 0 3000, 107648, 1, 0, 0 4000, 107609, 1, 0, 0 4999, 106441, 1, 0, 0 6000, 107173, 1, 0, 0 7000, 107074, 1, 0, 0 8000, 107951, 1, 0, 0 9000, 107494, 1, 0, 0 10000, 108467, 1, 0, 0 11000, 108612, 1, 0, 0 12000, 108767, 1, 0, 0 13000, 108076, 1, 0, 0 14000, 108128, 1, 0, 0 15000, 108384, 1, 0, 0 16000, 107968, 1, 0, 0 17000, 108026, 1, 0, 0 18000, 107858, 1, 0, 0 19000, 109313, 1, 0, 0 20000, 108776, 1, 0, 0 21000, 108878, 1, 0, 0 22000, 107829, 1, 0, 0 23000, 107748, 1, 0, 0 24000, 107136, 1, 0, 0 25000, 107899, 1, 0, 0 26000, 107458, 1, 0, 0 27000, 107758, 1, 0, 0 28000, 107680, 1, 0, 0 29000, 107322, 1, 0, 0 30000, 107046, 1, 0, 0 30999, 107374, 1, 0, 0 32000, 108399, 1, 0, 0 33000, 108505, 1, 0, 0 34000, 106977, 1, 0, 0 35000, 107596, 1, 0, 0 36000, 107996, 1, 0, 0 37000, 108093, 1, 0, 0 38000, 107884, 1, 0, 0 39000, 109088, 1, 0, 0 40000, 107998, 1, 0, 0 41000, 108787, 1, 0, 0 42000, 108968, 1, 0, 0 43000, 107579, 1, 0, 0 44000, 107222, 1, 0, 0 45000, 107457, 1, 0, 0 46000, 107467, 1, 0, 0 47000, 109153, 1, 0, 0 48000, 107648, 1, 0, 0 49000, 107556, 1, 0, 0 50000, 108225, 1, 0, 0 51000, 107766, 1, 0, 0 52000, 84450, 1, 0, 0 279000, 47253, 1, 0, 0 279999, 178973, 1, 0, 0 281000, 182784, 1, 0, 0 282000, 178176, 1, 0, 0 283000, 179530, 1, 0, 0 284000, 180968, 1, 0, 0 285000, 179496, 1, 0, 0 286000, 173296, 1, 0, 0 286999, 183130, 1, 0, 0 288000, 179867, 1, 0, 0 289000, 180667, 1, 0, 0 290000, 184327, 1, 0, 0 291000, 181136, 1, 0, 0 292000, 177807, 1, 0, 0 293000, 180733, 1, 0, 0 294000, 180876, 1, 0, 0 295000, 179738, 1, 0, 0 296000, 179705, 1, 0, 0 297000, 180431, 1, 0, 0 298000, 175512, 1, 0, 0 299000, 178391, 1, 0, 0 300000, 183051, 1, 0, 0 301000, 182709, 1, 0, 0 301999, 179646, 1, 0, 0
Additionally, I am using 32 jobs, resulting in 32 log files named test_iops.1.log through test_iops.32.log. Upon running the fio_generate_plots command, it is observed that many of the log files are empty, lacking any statistical data entries.
Sorry. I just realized that I was using FIO 3.37 for the test, I had updated it before the test.
Fio will omit log entries when the bw or IOPS is zero. Can you do some investigating to see if entries are missing because they are zero? Here are some ways to do this:
group_reporting
and calculate the bytes written based on one BW log and see if it is close to what the final summary output reports.It would be easier to troubleshoot with a simpler job. Does this still happen with 16 jobs? 4 jobs? Different settings for offset
and offset_increment
? Try dropping offset=0z
because that is the default and you already have offset_increment
set.
Run with --debug=zns,io
on a much smaller and simpler job to make sure that fio is doing what you think it is doing,
Yes. The missing timestamps are the same in the bw and IOPS logs.
After removing group_reporting and offset=0z, and setting numjobs=2 and ramp_time=0, there are only 4 records in the bandwidth log file for job1 and 212 records for job2.
The capacity for each zone is 1GiB and the size for each job is set to 200z. Since ramp_time is set to 0, each job is expected to write 200GiB. However, the results for job1 show that it only wrote 3072MiB of data.
Here are the scripts.
[global]
filename=${DUT}
ioengine=libaio
direct=1
ba=4k
norandommap
randrepeat=0
continue_on_error=none
#group_reporting
percentile_list=99:99.9:99.99:99.999:99.9999
#time_based=1
#runtime=30m
log_avg_msec=1000
write_bw_log=${logdir}
write_iops_log=${logdir}
write_lat_log=${logdir}
zonemode=zbd
zonesize=1073741824
max_open_zones=32
job_max_open_zones=1
[ZNSWrite]
numjobs=2
bs=4k
rw=write
iodepth=256
#runtime=10m
ramp_time=0
loops=1
#offset=0z
offset_increment=8z
size=200z
Your evidence suggests that fio's logging feature is working as intended but fio stops writing after the first three zones for the first job. What happens if you omit job_max_open_zones
?
Sorry, I'm currently occupied with another task. Will get back to you ASAP.
Closing due to lack of follow up.
Please acknowledge the following before creating a ticket
Description of the bug: The statistical data records in the log file are incomplete, with many entries being omitted. Additionally, most of the log files have no entries at all.
Environment: centos 7, kernel version: 6.9.6-1.el7.elrepo.x86_64, a ZNS drive
fio version: fio 3.36
Reproduction steps
run this script with mq-deadline io scheduler.
[global] filename=${DUT} ioengine=libaio direct=1 ba=4k norandommap randrepeat=0 continue_on_error=none group_reporting percentile_list=99:99.9:99.99:99.999:99.9999 log_avg_msec=1000 write_bw_log=logs/test write_iops_log=logs/test write_lat_log=logs/test zonemode=zbd zonesize=xxx max_open_zones=32 job_max_open_zones=1
[ZNSWrite] numjobs=32 bs=4k rw=write iodepth=256
runtime=10m
ramp_time=30 loops=1 offset=0z offset_increment=8z size=200z