Open MoranSimchi opened 3 years ago
Hi @MoranSimchi,
Are the devices you're playing back on the same as those that the trace was recorded on?
@sitsofe I'm not sure I understand the question. the 368* devices are the ones uses when we get these errors.
when trying to run the configuration file 8k-r for over 180 seconds, fio getting those outputs
Does cutting down the job file help narrow the issue down. What are the minimum options needed that continue to reproduce the error? For example is dedupe_percentage
necessary? Can readwrite just be read? If you don't write the log files does the error change etc? Try to remove every option that still allows the problem to occur (and don't stop at the first option that stops the problem from happening - put it back and then try and remove another one etc)...
I got the errors also without dedupe_percentage and buffer_compress_percentage parameters:
`[root@nc9145177 ~]# fio-3.27/bin/fio --client 10.229.145.175 file1 --client 10.229.145.177 file2
hostname=nc9145177.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
hostname=nc9145175.drm.lab.emc.com, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.27, flags=1
<nc9145177.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145175.drm.lab.emc.com> 8k-r: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
<nc9145175.drm.lab.emc.com> Starting 1 process
<nc9145177.drm.lab.emc.com> Starting 1 process
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a6110)
fio: client: unable to find matching tag (10a6150)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a5420)
fio: client: unable to find matching tag (10a6110)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a5420)
fio: client: unable to find matching tag (10a62e0)
client <10.229.145.177>: timeout on SEND_ETA
client <10.229.145.175>: timeout on SEND_ETA
fio: client: unable to find matching tag (10a62e0)
<nc9145175.drm.lab.emc.com>
<nc9145177.drm.lab.emc.com>
8k-r: (groupid=0, jobs=1): err= 0: pid=1337063: Wed Jul 14 10:02:12 2021
read: IOPS=113k, BW=887MiB/s (930MB/s)(156GiB/180001msec)
slat (usec): min=3, max=3196, avg= 6.72, stdev= 8.14
clat (usec): min=48, max=38089, avg=2247.13, stdev=1603.41
lat (usec): min=83, max=38094, avg=2253.94, stdev=1603.50
clat percentiles (usec):
| 1.00th=[ 502], 5.00th=[ 1582], 10.00th=[ 1713], 20.00th=[ 1893],
| 30.00th=[ 2008], 40.00th=[ 2073], 50.00th=[ 2147], 60.00th=[ 2180],
| 70.00th=[ 2245], 80.00th=[ 2311], 90.00th=[ 2442], 95.00th=[ 2606],
| 99.00th=[ 9634], 99.50th=[14746], 99.90th=[27132], 99.95th=[29230],
| 99.99th=[31589]
bw ( KiB/s): min= 215, max=167648, per=0.46%, avg=4167.81, stdev=2155.49, samples=20428905
iops : min= 1, max= 1, avg= 1.00, stdev= 0.00, samples=20428905
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.78%
lat (usec) : 1000=0.58%
lat (msec) : 2=26.28%, 4=69.92%, 10=0.46%, 20=0.81%, 50=0.18%
cpu : usr=17.18%, sys=67.26%, ctx=80810, majf=0, minf=529532
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=20428905,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=887MiB/s (930MB/s), 887MiB/s-887MiB/s (930MB/s-930MB/s), io=156GiB (167GB), run=180001-180001msec
8k-r: (groupid=0, jobs=1): err= 0: pid=1294675: Wed Jul 14 10:02:12 2021
read: IOPS=115k, BW=899MiB/s (943MB/s)(158GiB/180001msec)
slat (usec): min=3, max=3141, avg= 6.59, stdev= 7.36
clat (usec): min=20, max=38629, avg=2215.40, stdev=1597.17
lat (usec): min=225, max=38634, avg=2222.08, stdev=1597.26
clat percentiles (usec):
| 1.00th=[ 502], 5.00th=[ 1549], 10.00th=[ 1680], 20.00th=[ 1860],
| 30.00th=[ 1975], 40.00th=[ 2040], 50.00th=[ 2114], 60.00th=[ 2147],
| 70.00th=[ 2212], 80.00th=[ 2311], 90.00th=[ 2442], 95.00th=[ 2606],
| 99.00th=[ 8356], 99.50th=[14746], 99.90th=[26870], 99.95th=[29230],
| 99.99th=[31851]
bw ( KiB/s): min= 212, max=390485, per=0.46%, avg=4227.59, stdev=2148.51, samples=20722210
iops : min= 1, max= 1, avg= 1.00, stdev= 0.00, samples=20722210
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.79%
lat (usec) : 1000=0.58%
lat (msec) : 2=31.16%, 4=65.10%, 10=0.41%, 20=0.80%, 50=0.18%
cpu : usr=17.23%, sys=67.63%, ctx=82714, majf=0, minf=588074
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=20722210,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=899MiB/s (943MB/s), 899MiB/s-899MiB/s (943MB/s-943MB/s), io=158GiB (170GB), run=180001-180001msec
All clients: (groupid=0, jobs=2): err= 0: pid=0: Wed Jul 14 10:02:12 2021
read: IOPS=229k, BW=1786Mi (1873M)(314GiB/180001msec)
slat (usec): min=3, max=3196, avg= 6.59, stdev= 5.22
clat (usec): min=20, max=38629, avg=2215.40, stdev=1133.39
lat (usec): min=83, max=38634, avg=2222.08, stdev=1133.45
bw ( KiB/s): min= 212, max=390485, per=0.23%, avg=4227.59, stdev=2148.51, samples=20722210
iops : min= 1, max= 1, avg= 1.00, stdev= 0.00, samples=20722210
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.98%, 750=0.78%
lat (usec) : 1000=0.58%
lat (msec) : 2=28.73%, 4=67.49%, 10=0.44%, 20=0.81%, 50=0.18%
cpu : usr=17.21%, sys=67.44%, ctx=163524, majf=0, minf=1117606
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=41151115,0,0,0 short=0,0,0,0 dropped=0,0,0,0
`
parameters I ran with:
`[global]
time_based
ioengine=libaio
bs=8K
iodepth=256
direct=1 #boolean True
runtime=180
name=raw-randread
readwrite=randread
#logging
write_iops_log=foo
write_bw_log=foo
write_lat_log=foo
#write_hist_log
[8k-r]
filename=/dev/mapper/368ccf09800bd089e4dfbc1d8596337f5
filename=/dev/mapper/368ccf098003de6422898066949156f3d
filename=/dev/mapper/368ccf09800686489a7c0a26172095c99
filename=/dev/mapper/368ccf0980068e9389e3f8aa48b75acf3
filename=/dev/mapper/368ccf09800a7e351235fcef1424096b4
filename=/dev/mapper/368ccf098003e6b13bca0c705952c64d9
filename=/dev/mapper/368ccf0980022e801a5fbc084c1f1f252
filename=/dev/mapper/368ccf098008bb33173f9bbddd181daaf
filename=/dev/mapper/368ccf0980050695ca8cfc220aa099f98
filename=/dev/mapper/368ccf098004756ead7a499a85da7b0e8
filename=/dev/mapper/368ccf0980059c9434a117cf9e244b225
filename=/dev/mapper/368ccf0980088bf10ab458c6c33f11357
filename=/dev/mapper/368ccf09800fde6432682a2edef192357
filename=/dev/mapper/368ccf0980017ff1ec98198d6edf1fe6a
`
I ran the same parameters above without write_iops_log=foo, write_bw_log=foo, write_lat_log=foo and didn't got any error
[global]
time_based
ioengine=libaio
bs=8K
iodepth=256
direct=1 #boolean True
runtime=180
name=raw-randread
readwrite=randread
[8k-r]
filename=/dev/mapper/368ccf09800bd089e4dfbc1d8596337f5
filename=/dev/mapper/368ccf098003de6422898066949156f3d
filename=/dev/mapper/368ccf09800686489a7c0a26172095c99
filename=/dev/mapper/368ccf0980068e9389e3f8aa48b75acf3
filename=/dev/mapper/368ccf09800a7e351235fcef1424096b4
filename=/dev/mapper/368ccf098003e6b13bca0c705952c64d9
filename=/dev/mapper/368ccf0980022e801a5fbc084c1f1f252
filename=/dev/mapper/368ccf098008bb33173f9bbddd181daaf
filename=/dev/mapper/368ccf0980050695ca8cfc220aa099f98
filename=/dev/mapper/368ccf098004756ead7a499a85da7b0e8
filename=/dev/mapper/368ccf0980059c9434a117cf9e244b225
filename=/dev/mapper/368ccf0980088bf10ab458c6c33f11357
filename=/dev/mapper/368ccf09800fde6432682a2edef192357
filename=/dev/mapper/368ccf0980017ff1ec98198d6edf1fe6a
@MoranSimchi If you keep just one of logging options does the error come back? Also is it there when you have just one filename line - I thought having the same option more than once just took the last one?
version: fio-3.27
I got some error messages, and running seems to be stuck
from the server:
from the client: