axboe / fio

Flexible I/O Tester
GNU General Public License v2.0
5.26k stars 1.26k forks source link

Error Messages - while increase runtime in FIO #1241

Open MoranSimchi opened 3 years ago

MoranSimchi commented 3 years ago

version: fio-3.27

I got some error messages, and running seems to be stuck

from the server:

[root@nc9145177 ~]# /root/fio-3.27//bin/fio --server
fio: server listening on 0.0.0.0,8765
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG
fio: failed handling cmd IOLOG

from the client:

[root@nc9145175 ~]# /root/fio-3.27//bin/fio --client 10.229.145.175 8k-r --client 10.229.145.177 8k-r-1
hostname=nc9145177.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
hostname=nc9145175.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
<nc9145177.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145175.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145177.drm.lab.emc.com> ...
<nc9145175.drm.lab.emc.com> ...
<nc9145177.drm.lab.emc.com> Starting 10 processes
<nc9145175.drm.lab.emc.com> Starting 10 processes
client <10.229.145.177>: timeout on SEND_ETAs]
client <10.229.145.175>: timeout on SEND_ETA
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (1474180)
fio: client: unable to find matching tag (14741c0)
fio: client: unable to find matching tag (14741c0)
fio: client: unable to find matching tag (14741c0)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (1474180)
client <10.229.145.177>: timeout on SEND_ETA
fio: client: unable to find matching tag (1474180)
fio: client: unable to find matching tag (1474180)
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (1473420)
fio: client: unable to find matching tag (1475190)
client <10.229.145.177>: timeout on SEND_ETA
fio: client: unable to find matching tag (1475190)
fio: client: unable to find matching tag (1473420)
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (1475270)
client <10.229.145.177>: timeout on SEND_ETA
fio: client: unable to find matching tag (1475200)
fio: client: unable to find matching tag (1473420)
client <10.229.145.175>: timeout on SEND_ETA
client <10.229.145.177>: timeout on SEND_ETA
fio: client: unable to find matching tag (147d300)
fio: client: unable to find matching tag (1475200)
fio: client: unable to find matching tag (1475190)
client <10.229.145.177>: timeout on SEND_ETAs]
fio: client 10.229.145.177, timeout on cmd SEND_ETA
fio: client 10.229.145.177 timed out
fio: unknown client fd 3
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (147d300)
[root@nc9145175 ~]# cat 8k-r
[global]
time_based
ioengine=libaio
numjobs=10
#thread=5

bs=8K
iodepth=256
direct=1 #boolean True
runtime=180
name=raw-randread
readwrite=randread
dedupe_percentage=2
buffer_compress_percentage=2

#logging
write_iops_log=foo
write_bw_log=foo
write_lat_log=foo
#write_hist_log

[8k-r]
filename=/dev/mapper/368ccf09800ac54253fc8b30d0713b817
filename=/dev/mapper/368ccf098004ae2486c34843354a22558
filename=/dev/mapper/368ccf098007427bedacdb07fa128570a
filename=/dev/mapper/368ccf09800b3e23a23dfe76db6eac04e
filename=/dev/mapper/368ccf09800adedd5f83fd87bf9702f4f
filename=/dev/mapper/368ccf09800a56f5b03664bbc56d84e93
filename=/dev/mapper/368ccf09800692a899f589cadf8485673
filename=/dev/mapper/368ccf0980044997f94a8d456f16631d7
filename=/dev/mapper/368ccf098006aaea5f56ef67fd19e354d
filename=/dev/mapper/368ccf098008079359ccf8d5d0c8b0527
filename=/dev/mapper/368ccf09800f01db73c85cc1d5e907a5c
filename=/dev/mapper/368ccf09800674181c8a7da01a259f51d
filename=/dev/mapper/368ccf09800e8ce83c0da857706f1e195
filename=/dev/mapper/368ccf09800475cc675abd32e372f90eb
sitsofe commented 3 years ago

Hi @MoranSimchi,

Are the devices you're playing back on the same as those that the trace was recorded on?

galuzan8 commented 3 years ago

@sitsofe I'm not sure I understand the question. the 368* devices are the ones uses when we get these errors.

MoranSimchi commented 3 years ago

when trying to run the configuration file 8k-r for over 180 seconds, fio getting those outputs

sitsofe commented 3 years ago

Does cutting down the job file help narrow the issue down. What are the minimum options needed that continue to reproduce the error? For example is dedupe_percentage necessary? Can readwrite just be read? If you don't write the log files does the error change etc? Try to remove every option that still allows the problem to occur (and don't stop at the first option that stops the problem from happening - put it back and then try and remove another one etc)...

MoranSimchi commented 3 years ago

I got the errors also without dedupe_percentage and buffer_compress_percentage parameters:

`[root@nc9145177 ~]# fio-3.27/bin/fio --client 10.229.145.175 file1 --client 10.229.145.177 file2
hostname=nc9145177.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
hostname=nc9145175.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
<nc9145177.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145175.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145175.drm.lab.emc.com> Starting 1 process
<nc9145177.drm.lab.emc.com> Starting 1 process
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a6110)
fio: client: unable to find matching tag (10a6150)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a5420)
fio: client: unable to find matching tag (10a6110)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a5420)
fio: client: unable to find matching tag (10a62e0)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a62e0)
<nc9145175.drm.lab.emc.com>
<nc9145177.drm.lab.emc.com>
8k-r: (groupid=0, jobs=1): err= 0: pid=1337063: Wed Jul 14 10:02:12 2021
  read: IOPS=113k, BW=887MiB/s (930MB/s)(156GiB/180001msec)
    slat (usec): min=3, max=3196, avg= 6.72, stdev= 8.14
    clat (usec): min=48, max=38089, avg=2247.13, stdev=1603.41
     lat (usec): min=83, max=38094, avg=2253.94, stdev=1603.50
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[ 1582], 10.00th=[ 1713], 20.00th=[ 1893],
     | 30.00th=[ 2008], 40.00th=[ 2073], 50.00th=[ 2147], 60.00th=[ 2180],
     | 70.00th=[ 2245], 80.00th=[ 2311], 90.00th=[ 2442], 95.00th=[ 2606],
     | 99.00th=[ 9634], 99.50th=[14746], 99.90th=[27132], 99.95th=[29230],
     | 99.99th=[31589]
   bw (  KiB/s): min=  215, max=167648, per=0.46%, avg=4167.81, stdev=2155.49, samples=20428905
   iops        : min=    1, max=    1, avg= 1.00, stdev= 0.00, samples=20428905
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.78%
  lat (usec)   : 1000=0.58%
  lat (msec)   : 2=26.28%, 4=69.92%, 10=0.46%, 20=0.81%, 50=0.18%
  cpu          : usr=17.18%, sys=67.26%, ctx=80810, majf=0, minf=529532
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=20428905,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=887MiB/s (930MB/s), 887MiB/s-887MiB/s (930MB/s-930MB/s), io=156GiB (167GB), run=180001-180001msec
8k-r: (groupid=0, jobs=1): err= 0: pid=1294675: Wed Jul 14 10:02:12 2021
  read: IOPS=115k, BW=899MiB/s (943MB/s)(158GiB/180001msec)
    slat (usec): min=3, max=3141, avg= 6.59, stdev= 7.36
    clat (usec): min=20, max=38629, avg=2215.40, stdev=1597.17
     lat (usec): min=225, max=38634, avg=2222.08, stdev=1597.26
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[ 1549], 10.00th=[ 1680], 20.00th=[ 1860],
     | 30.00th=[ 1975], 40.00th=[ 2040], 50.00th=[ 2114], 60.00th=[ 2147],
     | 70.00th=[ 2212], 80.00th=[ 2311], 90.00th=[ 2442], 95.00th=[ 2606],
     | 99.00th=[ 8356], 99.50th=[14746], 99.90th=[26870], 99.95th=[29230],
     | 99.99th=[31851]
   bw (  KiB/s): min=  212, max=390485, per=0.46%, avg=4227.59, stdev=2148.51, samples=20722210
   iops        : min=    1, max=    1, avg= 1.00, stdev= 0.00, samples=20722210
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.79%
  lat (usec)   : 1000=0.58%
  lat (msec)   : 2=31.16%, 4=65.10%, 10=0.41%, 20=0.80%, 50=0.18%
  cpu          : usr=17.23%, sys=67.63%, ctx=82714, majf=0, minf=588074
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=20722210,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=899MiB/s (943MB/s), 899MiB/s-899MiB/s (943MB/s-943MB/s), io=158GiB (170GB), run=180001-180001msec
All clients: (groupid=0, jobs=2): err= 0: pid=0: Wed Jul 14 10:02:12 2021
  read: IOPS=229k, BW=1786Mi (1873M)(314GiB/180001msec)
    slat (usec): min=3, max=3196, avg= 6.59, stdev= 5.22
    clat (usec): min=20, max=38629, avg=2215.40, stdev=1133.39
     lat (usec): min=83, max=38634, avg=2222.08, stdev=1133.45
   bw (  KiB/s): min=  212, max=390485, per=0.23%, avg=4227.59, stdev=2148.51, samples=20722210
   iops        : min=    1, max=    1, avg= 1.00, stdev= 0.00, samples=20722210
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.78%
  lat (usec)   : 1000=0.58%
  lat (msec)   : 2=28.73%, 4=67.49%, 10=0.44%, 20=0.81%, 50=0.18%
  cpu          : usr=17.21%, sys=67.44%, ctx=163524, majf=0, minf=1117606
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=41151115,0,0,0 short=0,0,0,0 dropped=0,0,0,0
`

parameters I ran with:

`[global]
time_based
ioengine=libaio

bs=8K
iodepth=256
direct=1 #boolean True
runtime=180
name=raw-randread
readwrite=randread

#logging
write_iops_log=foo
write_bw_log=foo
write_lat_log=foo
#write_hist_log

[8k-r]
filename=/dev/mapper/368ccf09800bd089e4dfbc1d8596337f5
filename=/dev/mapper/368ccf098003de6422898066949156f3d
filename=/dev/mapper/368ccf09800686489a7c0a26172095c99
filename=/dev/mapper/368ccf0980068e9389e3f8aa48b75acf3
filename=/dev/mapper/368ccf09800a7e351235fcef1424096b4
filename=/dev/mapper/368ccf098003e6b13bca0c705952c64d9
filename=/dev/mapper/368ccf0980022e801a5fbc084c1f1f252
filename=/dev/mapper/368ccf098008bb33173f9bbddd181daaf
filename=/dev/mapper/368ccf0980050695ca8cfc220aa099f98
filename=/dev/mapper/368ccf098004756ead7a499a85da7b0e8
filename=/dev/mapper/368ccf0980059c9434a117cf9e244b225
filename=/dev/mapper/368ccf0980088bf10ab458c6c33f11357
filename=/dev/mapper/368ccf09800fde6432682a2edef192357
filename=/dev/mapper/368ccf0980017ff1ec98198d6edf1fe6a
`
MoranSimchi commented 3 years ago

I ran the same parameters above without write_iops_log=foo, write_bw_log=foo, write_lat_log=foo and didn't got any error

[global]
time_based
ioengine=libaio

bs=8K
iodepth=256
direct=1 #boolean True
runtime=180
name=raw-randread
readwrite=randread

[8k-r]
filename=/dev/mapper/368ccf09800bd089e4dfbc1d8596337f5
filename=/dev/mapper/368ccf098003de6422898066949156f3d
filename=/dev/mapper/368ccf09800686489a7c0a26172095c99
filename=/dev/mapper/368ccf0980068e9389e3f8aa48b75acf3
filename=/dev/mapper/368ccf09800a7e351235fcef1424096b4
filename=/dev/mapper/368ccf098003e6b13bca0c705952c64d9
filename=/dev/mapper/368ccf0980022e801a5fbc084c1f1f252
filename=/dev/mapper/368ccf098008bb33173f9bbddd181daaf
filename=/dev/mapper/368ccf0980050695ca8cfc220aa099f98
filename=/dev/mapper/368ccf098004756ead7a499a85da7b0e8
filename=/dev/mapper/368ccf0980059c9434a117cf9e244b225
filename=/dev/mapper/368ccf0980088bf10ab458c6c33f11357
filename=/dev/mapper/368ccf09800fde6432682a2edef192357
filename=/dev/mapper/368ccf0980017ff1ec98198d6edf1fe6a
sitsofe commented 3 years ago

@MoranSimchi If you keep just one of logging options does the error come back? Also is it there when you have just one filename line - I thought having the same option more than once just took the last one?