axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

Fio writing twice the amount of data to the disk set by size #1726

Open gustavo16a opened 8 months ago

gustavo16a commented 8 months ago


Description of the bug: When I issue a command to fio to write random data, the amount of data written to the disk, as reported by its SMART info, is twice the size given in the command. With sequential writes the issue doesn't happen. I was wondering if this is related to the process of generating the random data?

Environment: Windows 10

fio version: 3.36

Reproduction steps:

1. Check the SMART info of the disk before: smart1

2. Issue the command to write 1GiB of random data: fio --name=test --directory=D\:\ --ioengine=windowsaio --rw=randwrite --bs=4M --numjobs=1 --iodepth=1 --direct=1 --end_fsync=1 --size=1G --thread (I'm uploading the command result in case it helps: command_result.txt)

3. Check the SMART info of the disk after the command: smart2

In my case, each unit of the Total Host Writes counter represents 32MiB written, so 601 - 538 = 63, and 63 * 32 = 2016MiB, which is approximately 2GiB.

I tried to overcome this using the options number_ios, io_size, file_size and so on. If I set number_ios=1, then I get approximately 1GiB, but if I set numjobs to a higher value, for example 10, then I get 6.48GiB written, which is less than the expected 10GiB for 10 jobs of 1GiB each.

Update 1: Just for info, the files created by fio are 1GiB in size, and if I copy and paste them on Windows, the SMART info from the disk reports that 1GiB was written; so it probably isn't something related to the files themselves.

vincentkfu commented 7 months ago

If the file does not already exist fio lays it out before starting the workload. Do you still see the same increase in SMART values when the file already exists? Also, the amount of data written by the device (assuming it's an SSD) will be affected by garbage collection and to a smaller extent file system metadata. It's unreasonable to expect Total Host Writes to match the amount of writes issued by fio.

gustavo16a commented 7 months ago

> If the file does not already exist fio lays it out before starting the workload. Do you still see the same increase in SMART values when the file already exists? Also, the amount of data written by the device (assuming it's an SSD) will be affected by garbage collection and to a smaller extent file system metadata. It's unreasonable to expect Total Host Writes to match the amount of writes issued by fio.

I'll answer about garbage collection first: writes made during garbage collection should not show up in Host Writes, since those are firmware-level writes, so I believe it's not that; even more so because, as I said in the update, if I copy and paste a 1GiB file generated by fio, SMART only reports a 1GiB increase in host writes.

About the files not existing before the workload: correct, they didn't exist. So, does laying out the file make fio write the data twice?

vincentkfu commented 6 months ago

>> If the file does not already exist fio lays it out before starting the workload. Do you still see the same increase in SMART values when the file already exists? Also, the amount of data written by the device (assuming it's an SSD) will be affected by garbage collection and to a smaller extent file system metadata. It's unreasonable to expect Total Host Writes to match the amount of writes issued by fio.
>
> I'll answer about garbage collection first: writes made during garbage collection should not show up in Host Writes, since those are firmware-level writes, so I believe it's not that; even more so because, as I said in the update, if I copy and paste a 1GiB file generated by fio, SMART only reports a 1GiB increase in host writes.
>
> About the files not existing before the workload: correct, they didn't exist. So, does laying out the file make fio write the data twice?

Ok, you are right about garbage collection.

Try your job on a pre-existing file and report back.
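
For example, a minimal way to check that with the exact command from the original report (assuming a cmd prompt, where the rem lines are only comments; the second invocation is the one to compare against the SMART counters):

rem First run: fio creates and lays out the test file, so the SMART delta includes the layout pass
fio --name=test --directory=D\:\ --ioengine=windowsaio --rw=randwrite --bs=4M --numjobs=1 --iodepth=1 --direct=1 --end_fsync=1 --size=1G --thread

rem Second run: the file now already exists at full size, so no layout should happen
rem and the host-writes increase should be close to the 1GiB workload itself
fio --name=test --directory=D\:\ --ioengine=windowsaio --rw=randwrite --bs=4M --numjobs=1 --iodepth=1 --direct=1 --end_fsync=1 --size=1G --thread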

rustyscottweber commented 4 months ago

Can confirm that this is a problem when making calculations for, and running traffic on, block devices.

fio ./fio_job_aJyaQg.ini 
sdd: (g=0): rw=rw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=16
...
fio-3.35
Starting 32 processes
^Cbs: 32 (f=32): [W(32)][48.8%][w=4708MiB/s][w=18.8k IOPS][eta 03m:13s]
fio: terminating on signal 2

sdd: (groupid=0, jobs=32): err= 0: pid=144329: Tue Jun 18 19:46:32 2024
  write: IOPS=17.7k, BW=4415MiB/s (4629MB/s)(794GiB/184126msec); 0 zone resets
   bw (  MiB/s): min=  503, max= 7060, per=100.00%, avg=4420.06, stdev=33.13, samples=11744
   iops        : min= 2010, max=28239, avg=17676.62, stdev=132.53, samples=11744
  cpu          : usr=0.66%, sys=1.83%, ctx=5222904, majf=0, minf=7779
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3251456,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=4415MiB/s (4629MB/s), 4415MiB/s-4415MiB/s (4629MB/s-4629MB/s), io=794GiB (852GB), run=184126-184126msec

Disk stats (read/write):
  sdd: ios=125/15094899, merge=0/0, ticks=21/310690819, in_queue=310690841, util=100.00%
[root@r3i1 fio_SrCMVn]# fio ./fio_job_aJyaQg.ini 
sdd: (g=0): rw=rw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=16
...
fio-3.35
Starting 32 processes
Jobs: 4 (f=4): [_(2),W(1),_(1),W(1),_(1),W(1),_(4),W(1),_(20)][98.8%][w=4650MiB/s][w=18.6k IOPS][eta 00m:04s]
sdd: (groupid=0, jobs=32): err= 0: pid=144464: Tue Jun 18 19:54:22 2024
  write: IOPS=18.3k, BW=4579MiB/s (4801MB/s)(1490GiB/333265msec); 0 zone resets
   bw (  MiB/s): min=  641, max=21030, per=100.00%, avg=4874.20, stdev=58.19, samples=20034
   iops        : min= 2566, max=84117, avg=19491.20, stdev=232.75, samples=20034
  cpu          : usr=0.66%, sys=1.87%, ctx=12362056, majf=0, minf=10508
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,6103552,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=4579MiB/s (4801MB/s), 4579MiB/s-4579MiB/s (4801MB/s-4801MB/s), io=1490GiB (1600GB), run=333265-333265msec

Disk stats (read/write):
  sdd: ios=1661/27599212, merge=0/0, ticks=2174/522082715, in_queue=522084890, util=100.00%
cat ./fio_job_aJyaQg.ini 
[global]
direct=1
group_reporting=1
log_unix_epoch=1
[sdd]
filename=/dev/sdd
rwmixread=0
bs=262144
rw=readwrite
iodepth=16
end_fsync=1
numjobs=32
disable_bw=1
disable_lat=1
disable_clat=1
disable_slat=1
log_avg_msec=1000
ioengine=libaio
size=100%
lsblk /dev/sdd 
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sdd    8:48   0 46.6G  0 disk 

It looks like the total number of bytes being written is multiplied by the number of jobs where in previous versions the number of jobs would share the total number of bytes to write.

@gustavo16a, can you test whether or not this is consistent with your windows block device or test file?

vincentkfu commented 4 months ago

> It looks like the total number of bytes being written is multiplied by the number of jobs where in previous versions the number of jobs would share the total number of bytes to write.

numjobs has never split the bytes specified by size. Perhaps you are thinking of nrfiles.
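
To put numbers on it from the run above: size=100% of the 46.6GiB sdd times 32 jobs is roughly 32 x 46.6GiB ≈ 1491GiB, which lines up with the io=1490GiB that the second, nearly complete run reported. A small sketch of the difference between the two options, with made-up job names and example sizes (each job writes to its own auto-named files in the current directory):

# numjobs clones the whole job: 4 clones x size=1G is about 4GiB written in total
[numjobs-example]
rw=randwrite
bs=4M
size=1G
numjobs=4

# nrfiles splits size across the files of one job: 4 files x 256MiB is 1GiB written in total
[nrfiles-example]
rw=randwrite
bs=4M
size=1G
nrfiles=4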

gustavo16a commented 1 month ago

Sorry guys for taking so long to respond.

> Ok, you are right about garbage collection.
>
> Try your job on a pre-existing file and report back.

Regarding this, it seems that with pre-existing files it does not happen. However, that isn't useful for us because we have to delete the files created during our test. We accepted the behavior and moved on.

> Can confirm that this is a problem when making calculations for, and running traffic on, block devices. Disk stats (read/write): sdd: ios=1661/27599212, merge=0/0, ticks=2174/522082715, in_queue=522084890, util=100.00%
>
> cat ./fio_job_aJyaQg.ini
> [global]
> direct=1
> group_reporting=1
> log_unix_epoch=1
> [sdd]
> filename=/dev/sdd
> rwmixread=0
> bs=262144
> rw=readwrite
> iodepth=16
> end_fsync=1
> numjobs=32
> disable_bw=1
> disable_lat=1
> disable_clat=1
> disable_slat=1
> log_avg_msec=1000
> ioengine=libaio
> size=100%
> lsblk /dev/sdd
> NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> sdd    8:48   0 46.6G  0 disk
>
> It looks like the total number of bytes being written is multiplied by the number of jobs where in previous versions the number of jobs would share the total number of bytes to write.
>
> @gustavo16a, can you test whether or not this is consistent with your windows block device or test file?

Regarding the above, I think the numjobs option just creates the same workload the number of times specified in the parameter, as @vincentkfu mentioned. What I can say is that, so far, with block devices, at least on Linux, I have had no problem with data being duplicated.