hongkailiu opened this issue 6 years ago
There seem to be multiple problems, but one problem might be a race in the production of the latency files: they might need to be labeled by job id to avoid multiple jobs stomping on each other. @bengland2: does that sound plausible?
@ndokos @hongkailiu @atheurer No, that doesn't seem right. The error was that there were zero histogram logs; if multiple jobs were stomping on each other, there would be at least one. I'll try to talk to you today, but Tim Wilkinson noticed some odd things about the new pbench-fio version as well, as did I. Bear with me while I explain.
When I run the pbench-fio RPM's fio in non-distributed mode, it works.
I installed pbench-fio-2.14-2 on my laptop and ran this command:
fio --name=randread bentest.job
Where the bentest.job job file contained:
[global]
time_based=1
runtime=5
write_hist_log=bentest
log_hist_msec=1000
directory=/home/bengland/ceph/fio/pbench-fio-2.14-2/run
filesize=64m
[randread]
rw=randread
bs=4k
numjobs=2
It generated these log files:
bentest_clat_hist.1.log bentest_clat_hist.2.log
All well and good. But then I try running the same job file with the --client option:
fio --client=localhost --name=randread bentest.job
And I get this output:
randread: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio: job file <../bentest.job> open: No such file or directory
[bengland@bene-laptop pbench-fio-2.14-2]$ fio --client=localhost --name=randread bentest.job
randread: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.14
<localhost> net 8324 server: got op [PROBE], pdu=136, tag=1128680
<localhost> net 8324 server: sending probe reply
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-2.14, flags=1
<bene-laptop> net 8324 server: got op [JOB], pdu=192, tag=0
<bene-laptop> randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> net 8324 server: got op [RUN], pdu=0, tag=0
<bene-laptop> net 8324 server: forked off job job (pid=8333)
<bene-laptop> Starting 2 processes
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=112868000m:02s]
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=112868000m:01s]
<bene-laptop> net 8324 server sending status
Segmentation fault (core dumped)ne] [9850M/0K/0K /s] [2463K/0/0 iops] [eta 00m:00s]
I wanted to look at this with gdb to see what was wrong, so I recompiled fio with the -O3 option in the Makefile disabled (I commented out the _FORTIFY_SOURCE flag too, because the compile complained about it). When I re-run it, it works and both log files are returned by pbench-fio. Ugh, what do I do now? I guess I can try the same experiment with upstream fio, and if that fails I can at least complain to the upstream developers.
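Roughly, the rebuild looked like this (a sketch only; the sed patterns are assumptions about how the flags appear in fio's Makefile, and editing the file by hand works just as well):
cd fio
sed -i 's/-O3/-O0/g' Makefile                      # drop optimization so gdb is usable
sed -i 's/-D_FORTIFY_SOURCE=[0-9]*//g' Makefile    # the compile complained about this once -O was off
make clean && make
gdb --args ./fio --client=localhost --name=randread bentest.job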
Back to the optimized version: when I run it with numjobs=1, I see a zero-length histogram log file. That might explain the above observation that with 1 job per host it at least didn't get a "no such file" error.
Just want to point out that when Tim and I used fio histogram logging successfully at large scale, we had to patch fio so that fio --server processes do NOT copy logs back to the fio --client process, by disabling line 1093 of iolog.c:
#if 0
        if (td->client_type == FIO_CLIENT_TYPE_GUI || is_backend)
                fio_send_iolog(td, log, log->filename);
        else
                flush_log(log, !td->o.per_job_logs);
#else
        flush_log(log, !td->o.per_job_logs);
#endif
and changing the pbench-fio bash script like this:
--- /opt/pbench-agent/bench-scripts/pbench-fio 2016-09-26 14:13:06.547000000 +0000
+++ pbench-fio.ben 2016-12-02 03:06:42.107448525 +0000
@@ -403,6 +403,7 @@
ssh $ssh_opts $client ${pbench_install_dir}/bench-scripts/$script_name --remote-only --install &
done
wait
+ which pdsh || (debug_log "[$script_name] pdsh is not installed on test driver, exiting" ; exit 1)
fi
}
@@ -587,7 +588,9 @@
chmod +x $benchmark_results_dir/fio.cmd
debug_log "$benchmark: Going to run [$bench_cmd $bench_opts $client_opts]"
pushd $benchmark_results_dir >/dev/null
+ pdsh -R ssh -S -w ^$client_file 'rm -f $benchmark_results_dir/*.log'
$benchmark_results_dir/fio.cmd >$benchmark_results_dir/fio-result.txt
+ rpdcp -R ssh -w ^$client_file $benchmark_results_dir/*.log $benchmark_results_dir
popd >/dev/null
pbench-stop-tools --group=$tool_group --iteration=$iteration --dir=$benchmark_results_dir
if [ ! -z "$clients" ]; then
@@ -602,8 +605,6 @@
ssh $ssh_opts $client /bin/rm -f "$benchmark_results_dir/*.log" &
done
wait
- else
- mv $benchmark_results_dir/fio*.log $benchmark_results_dir/clients/localhost/ 2>/dev/null
fi
pbench-postprocess-tools --group=$tool_group --iteration=$iteration --dir=$benchmark_results_dir
echo "fio job complete"
Upstream fio behaves the same way: with default compilation options it segfaults, but when you disable optimization it works. With defaults:
[bengland@bene-laptop fio]$ ./fio --client=localhost --name=randread ../../pbench-fio-2.14-2/bentest.job
randread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.2-24-gf273
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.2-24-gf273, flags=1
<bene-laptop> randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> Starting 2 processes
Segmentation fault (core dumped)9002M,w=0k][r=2251k,w=0 IOPS][eta 00m:00s]
With optimization turned off:
[bengland@bene-laptop fio]$ ./fio --client=localhost --name=randread ../../pbench-fio-2.14-2/bentest.job
randread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.2-24-gf273-dirty
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.2-24-gf273-dirty, flags=1
<bene-laptop> randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> Starting 2 processes
randread: (groupid=0, jobs=1): err= 0: pid=25115: Wed Dec 13 10:07:40 2017
read: IOPS=462k, BW=1807MiB/s (1894MB/s)(9035MiB/5001msec)
clat (nsec): min=338, max=5965.0k, avg=1371.41, stdev=8798.85
lat (nsec): min=381, max=5967.3k, avg=1431.37, stdev=8806.48
clat percentiles (nsec):
| 1.00th=[ 516], 5.00th=[ 540], 10.00th=[ 556], 20.00th=[ 588],
| 30.00th=[ 604], 40.00th=[ 620], 50.00th=[ 644], 60.00th=[ 676],
| 70.00th=[ 740], 80.00th=[ 796], 90.00th=[ 868], 95.00th=[ 996],
| 99.00th=[ 3632], 99.50th=[ 82432], 99.90th=[108032], 99.95th=[117248],
| 99.99th=[136192]
bw ( MiB/s): min= 39, max= 2780, per=45.38%, avg=1685.63, stdev=1244.67, samples=9
iops : min=10040, max=711898, avg=431520.67, stdev=318634.60, samples=9
lat (nsec) : 500=0.24%, 750=71.06%, 1000=23.79%
lat (usec) : 2=3.29%, 4=0.73%, 10=0.16%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.52%, 250=0.18%, 500=0.01%
lat (msec) : 10=0.01%
cpu : usr=42.68%, sys=28.94%, ctx=16393, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=2312941,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
randread: (groupid=0, jobs=1): err= 0: pid=25116: Wed Dec 13 10:07:40 2017
read: IOPS=489k, BW=1909MiB/s (2001MB/s)(9543MiB/5000msec)
clat (nsec): min=334, max=1117.1k, avg=1303.81, stdev=7677.98
lat (nsec): min=383, max=1118.3k, avg=1358.97, stdev=7686.19
clat percentiles (nsec):
| 1.00th=[ 510], 5.00th=[ 524], 10.00th=[ 540], 20.00th=[ 572],
| 30.00th=[ 588], 40.00th=[ 604], 50.00th=[ 620], 60.00th=[ 636],
| 70.00th=[ 668], 80.00th=[ 748], 90.00th=[ 860], 95.00th=[ 940],
| 99.00th=[ 3344], 99.50th=[ 82432], 99.90th=[107008], 99.95th=[115200],
| 99.99th=[136192]
bw ( MiB/s): min= 39, max= 2990, per=49.58%, avg=1841.99, stdev=1367.21, samples=9
iops : min=10054, max=765588, avg=471550.44, stdev=350004.44, samples=9
lat (nsec) : 500=0.37%, 750=79.56%, 1000=16.42%
lat (usec) : 2=2.09%, 4=0.72%, 10=0.13%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.50%, 250=0.17%
lat (msec) : 2=0.01%
cpu : usr=41.50%, sys=30.30%, ctx=16421, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=2442986,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
All clients: (groupid=0, jobs=2): err= 0: pid=0: Wed Dec 13 10:07:40 2017
read: IOPS=0, BW=0 (0)(0B/5001msec)
clat (nsec): min=334, max=5965.0k, avg=1303.81, stdev=5502.88
lat (nsec): min=381, max=5967.3k, avg=1358.97, stdev=5508.76
bw ( MiB/s): min= 39, max= 2990, per=24.79%, avg=1841.99, stdev=937.90, samples=18
iops : min=10040, max=765588, avg=471550.44, stdev=240101.07, samples=18
lat (nsec) : 500=0.31%, 750=75.43%, 1000=20.00%
lat (usec) : 2=2.67%, 4=0.72%, 10=0.15%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.51%, 250=0.18%, 500=0.01%
lat (msec) : 2=0.01%, 10=0.01%
cpu : usr=42.09%, sys=29.62%, ctx=32814, majf=0, minf=39
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=4755927,0,0, short=0,0,0, dropped=0,0,0
Run status group 0 (all jobs):
READ: bw=3895276343B/s (3895276343B/s), 1894382390B/s-2001294131B/s (1894382390B/s-2001294131B/s), io=19480276992B (19480276992B), run=5000-5001msec
Disk stats (read/write):
dm-3: ios=32768/3, merge=0/0, ticks=2873/13, in_queue=2894, util=31.40%
dm-0: ios=32768/6, merge=0/0, ticks=2841/16, in_queue=2863, util=30.63%
nvme0n1: ios=32768/6, merge=0/0, ticks=2597/15, in_queue=2604, util=29.92%
<bene-laptop>
[bengland@bene-laptop fio]$ ls -ltr | tail
...
-rw-rw-r--. 1 bengland bengland 24044 Dec 13 10:07 bentest_clat_hist.1.log.localhost
-rw-rw-r--. 1 bengland bengland 24015 Dec 13 10:07 bentest_clat_hist.2.log.localhost
Cross-posted to the upstream fio GitHub on an issue related to this.
I've isolated it further. If you compile just iolog.c without optimization (and link without the -O3 flag) but compile everything else in fio with default settings, the problem goes away.
If you run the build with the non-optimized iolog.c on just the fio --server end and the fully optimized fio on the client side, it still works. So this narrows the code paths considerably.
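Roughly, the selective rebuild looks like this (a sketch; the exact compile flags are whatever make prints for iolog.o, so treat these commands as an outline rather than a recipe):
cd fio
make                 # full default (optimized) build
rm -f iolog.o
make -n iolog.o      # print the compile command the Makefile would use for iolog.o
# re-run that printed command by hand with -O3 replaced by -O0, then relink:
make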
Good news @atheurer @ndokos @hongkailiu: Jens Axboe put a fix into upstream fio last night, before I ever contacted him (he must be a precog). Here is the commit, and my test case now does not break, even with numjobs=60, log_hist_msec=500, runtime=60. So I think it's fixed; we just have to pull the new upstream fio into pbench-fio and make sure it works properly with the pbench-fio script. I'll try that tomorrow, and if there is no problem there, Nick can just pull in the upstream fio and we can start using that without any patch. It took 2 years to fix this.
Thanks for the update, Ben.
@ndokos @atheurer One change affecting upstream fiologparser_hist.py is that I think the new upstream fio measures latency in nanoseconds rather than microseconds, so our pbench graph units may need to be adjusted. This is actually a good thing: NVDIMMs have latencies on the order of nanoseconds, not microseconds, so we need upstream fio to be able to measure them. I mentioned problems with measuring NVDIMM latency with older fio versions here. I think this is also why the FIO_IO_U_PLAT_GROUP_NR parameter in stat.h changed from 19 to 29: 2^10 = 1024, which is roughly how much the time resolution increased in the switch from microseconds to nanoseconds. So if you update fio to upstream in the new pbench-fio RPM, be sure to also update any analysis tools like fiologparser_hist.py. I am not sure where that comes from in the pbench-fio script; is it here?
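A quick sanity check of that arithmetic (plain shell, nothing pbench-specific; the /1000 conversion is only needed if a downstream tool still assumes microseconds, and 82432 is just an example latency taken from the output above):
# 10 more histogram groups (29 - 19) means a 2^10 = 1024x wider range,
# roughly matching the ~1000x jump from usec to nsec granularity
echo $(( 2 ** (29 - 19) ))                   # prints 1024
# converting a nanosecond latency back to microseconds for old usec-based graphs:
ns=82432; echo "scale=3; $ns / 1000" | bc    # prints 82.432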
[bengland@bene-laptop postprocess]$ diff -u fio-postprocess-viz.p{y.orig,y}
--- fio-postprocess-viz.py.orig 2017-12-14 10:25:00.378259177 -0500
+++ fio-postprocess-viz.py 2017-12-14 10:26:17.442752572 -0500
@@ -20,7 +20,7 @@
<script src="/static/js/v0.3/saveSvgAsPng.js" charset="utf-8"></script>
<div id='jschart_latency'>
<script>
- create_jschart(0, "%s", "jschart_latency", "Percentiles", "Time (msec)", "Latency (usec)",
+ create_jschart(0, "%s", "jschart_latency", "Percentiles", "Time (msec)", "Latency (nsec)",
{ plotfiles: [ "avg.log", "median.log", "p90.log", "p95.log",
"p99.log", "min.log", "max.log" ],
sort_datasets: false, x_log_scale: false
For a description of fio latency histograms, see slides 18-21 in the following attachment.
S103002-england-May-04-15-30-ceph-and-openstack-at-scale-v2.pdf
@ndokos @hongkailiu
I installed pbench-agent-0.47-91gb666083.noarch using the copr-pbench-test repo and tried this again.
The real problem in the original post was that the fio command had --max-jobs=1 while the fio.job file had numjobs=2, so the command was self-contradictory.
That's why it produced no log files. I found this out by running "bash -x fio.cmd" directly. The solution here is to not use numjobs and let pbench-fio decide what it should be.
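A hedged illustration of the mismatch (other options elided; this is not the exact generated command):
# the generated command looked roughly like this:
fio ... --max-jobs=1 ... fio.job    # while fio.job still contains numjobs=2
# tracing the generated wrapper is what exposed it:
bash -x fio.cmd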
One other unrelated bug I stumbled on: to make pbench-fio work with any value of the write_hist_log parameter in the fio job file, we need to wildcard the spot where this parameter's value appears in the log filename, i.e. glob for *_clat_hist.*.log* instead of fio_clat_hist.*.log*. But the workaround is to just use the value "fio" for this parameter, which is what the script expects.
fiologparser_hist.py --job-file=$benchmark_results_dir/fio.job $benchmark_results_dir/clients/$client/*_clat_hist.*.log* >$benchmark_results_dir/clients/$client/hist/hist.csv
Once I fixed that, it postprocessed the histogram logs but wouldn't process the JSON output file, which was in fio-result.txt, not result.json.
running fio job: /var/lib/pbench-agent/fio_RAND_IO_300s_2018.02.06T23.36.03/1-randrw-16KiB/sample1/fio.job ...
fio-postprocess: could not find any result files to process, exiting
Can't open /var/lib/pbench-agent/fio_RAND_IO_300s_2018.02.06T23.36.03/1-randrw-16KiB/sample1/result.json: No such file or directory at /opt/pbench-agent/bench-scripts/postprocess/process-iteration-samples line 174.
That's as far as I got.
Thanks, @bengland2.
Just to make sure that I understand correctly: we should NOT use numjobs=n in the job config file, nor as a parameter to pbench-fio; the pbench-fio command will handle this by itself.
Does pbench-fio always treat numjobs as 1, or is it a dynamic value?
numjobs in the fio command line depends on the number of clients you are running. Also, what I was saying is that you can't specify a max_jobs parameter lower than the numjobs parameter. I'd suggest removing max_jobs from your job file; perhaps I am missing something, but why was it there? Also, see whether you really need to specify numjobs or whether pbench-fio will generate that value for you.
I checked our source for the job config: max_jobs is not there, while numjobs=2 is. I checked the parameters used for pbench-fio: neither max_jobs nor numjobs is there. This is the same as the output in the issue description above.
So when I encountered the issue, max_jobs was not used at all. These findings do not explain the issue yet; probably I did not understand your comment in the first place.
Sorry if that wasn't clear; I was using the pbench-fio command from your initial post and just trying it out with the new pbench-fio. I'll try it again and post the exact inputs I used, to see if that lines up with what you were doing.
Could you post the fio job file you used in your initial post, or as close as you can get to it? Unfortunately I can't exactly reproduce your input from what you provided; that may be the source of the confusion. Here's what I'm trying right now on my laptop. Is this representative of the options you used?
su - -c "pbench-fio --max-failures=0 --sysinfo=none --test-types=randrw --clients=localhost --config=RAND_IO_30s --runtime=30 --samples=1 --max-stddev=20 --block-sizes=16 --file-size=64m --job-file=fio.job --target=ben"
I used this job file, which I copied from /opt/pbench-agent/bench-scripts/templates/fio.job and then edited. All workload-related parameters are pushed into the job-$target section, and all the configuration-related parameters are kept in the global section.
[global]
clocksource = clock_gettime
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 1000
# log_hist_coarseness = 4 # 76 bins
directory = /var/tmp/fio
[job-$target]
filename = $target
bs = $@
rw = $@
size = $@
rate_iops = 50
numjobs=4
runtime = 30
ioengine = libaio
iodepth = 32
direct = 1
sync = 0
time_based = 1
Right now this is failing to generate a job file in the pbench directory, but pbench doesn't tell you about the problem; I am trying to fix that first.
After changing the config file several times during the test, I am not sure about the exact content. It should be the same as or similar to the output in the description of the issue. We have the pbench-fio cases in the current run. Let me redo the case and come back here with the config file and the command.
Thank you very much, @bengland2
Hi @bengland2, I replicated the issue today. Here is the information:
https://paste.fedoraproject.org/paste/iGWX7dmZrRn81GD~5QwVtg
Another observation: the value of log_hist_msec generated by pbench-fio is changed from 1000 in the job file to 10000.
Is this intended?
If numjobs=1, it works great. When numjobs=2, we have this: And then it starts to rerun ... Did I do something wrong?