hongkailiu opened this issue 6 years ago
There seem to be multiple problems, but one problem might be a race in the production of the latency files: they might need to be labeled by job id to avoid multiple jobs stomping on each other. @bengland2: does that sound plausible?
@ndokos @hongkailiu @atheurer No, that doesn't seem right. The error was that there were zero histogram logs; if multiple jobs were stomping on each other, there would be at least one. I'll try to talk to you today, but Tim Wilkinson noticed some odd things about the new pbench-fio version as well, as did I. Bear with me while I explain.
When I run the pbench-fio RPM's fio in non-distributed mode, it works.
I installed pbench-fio-2.14-2 on my laptop and ran this command:
fio --name=randread bentest.job
Where the bentest.job job file contained:
[global]
time_based=1
runtime=5
write_hist_log=bentest
log_hist_msec=1000
directory=/home/bengland/ceph/fio/pbench-fio-2.14-2/run
filesize=64m
[randread]
rw=randread
bs=4k
numjobs=2
It generated these log files:
bentest_clat_hist.1.log bentest_clat_hist.2.log
All well and good. But then I try running the same job file with the --client option:
fio --client=localhost --name=randread bentest.job
And I get this output:
randread: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio: job file <../bentest.job> open: No such file or directory
[bengland@bene-laptop pbench-fio-2.14-2]$ fio --client=localhost --name=randread bentest.job
randread: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.14
<localhost> net 8324 server: got op [PROBE], pdu=136, tag=1128680
<localhost> net 8324 server: sending probe reply
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-2.14, flags=1
<bene-laptop> net 8324 server: got op [JOB], pdu=192, tag=0
<bene-laptop> randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> net 8324 server: got op [RUN], pdu=0, tag=0
<bene-laptop> net 8324 server: forked off job job (pid=8333)
<bene-laptop> Starting 2 processes
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=1128680
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=112868000m:02s]
<bene-laptop> net 8324 server sending status
<bene-laptop> net 8324 server: got op [SEND_ETA], pdu=0, tag=112868000m:01s]
<bene-laptop> net 8324 server sending status
Segmentation fault (core dumped)ne] [9850M/0K/0K /s] [2463K/0/0 iops] [eta 00m:00s]
I wanted to look at this with gdb to see what was wrong, so I recompiled fio with the -O3 option in the Makefile disabled (I commented out the _FORTIFY_SOURCE flag too, because the compile complained about it). When I re-run it, it works and both log files are returned by pbench-fio. Ugh, what do I do now? I guess I can try the same experiment with upstream fio, and if that fails I can at least complain to the upstream developers.
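Roughly, the rebuild looked like this (a sketch only; the sed patterns are assumptions about how the flags appear in fio's Makefile, and editing the file by hand works just as well):
cd fio
sed -i 's/-O3/-O0/g' Makefile                      # drop optimization so gdb is usable
sed -i 's/-D_FORTIFY_SOURCE=[0-9]*//g' Makefile    # the compile complained about this once -O was off
make clean && make
gdb --args ./fio --client=localhost --name=randread bentest.job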
Back to the optimized version: when I run it with numjobs=1, I see a zero-length histogram log file. That might explain the above observation that with 1 job per host it at least didn't get a "no such file" error.
Just want to point out that when Tim and I used fio histogram logging successfully at large scale, we had to patch fio so that fio --server processes do NOT copy logs back to the fio --client process, by disabling line 1093 of iolog.c:
#if 0
        if (td->client_type == FIO_CLIENT_TYPE_GUI || is_backend)
                fio_send_iolog(td, log, log->filename);
        else
                flush_log(log, !td->o.per_job_logs);
#else
        flush_log(log, !td->o.per_job_logs);
#endif
and changing the pbench-fio bash script like this:
--- /opt/pbench-agent/bench-scripts/pbench-fio 2016-09-26 14:13:06.547000000 +0000
+++ pbench-fio.ben 2016-12-02 03:06:42.107448525 +0000
@@ -403,6 +403,7 @@
ssh $ssh_opts $client ${pbench_install_dir}/bench-scripts/$script_name --remote-only --install &
done
wait
+ which pdsh || (debug_log "[$script_name] pdsh is not installed on test driver, exiting" ; exit 1)
fi
}
@@ -587,7 +588,9 @@
chmod +x $benchmark_results_dir/fio.cmd
debug_log "$benchmark: Going to run [$bench_cmd $bench_opts $client_opts]"
pushd $benchmark_results_dir >/dev/null
+ pdsh -R ssh -S -w ^$client_file 'rm -f $benchmark_results_dir/*.log'
$benchmark_results_dir/fio.cmd >$benchmark_results_dir/fio-result.txt
+ rpdcp -R ssh -w ^$client_file $benchmark_results_dir/*.log $benchmark_results_dir
popd >/dev/null
pbench-stop-tools --group=$tool_group --iteration=$iteration --dir=$benchmark_results_dir
if [ ! -z "$clients" ]; then
@@ -602,8 +605,6 @@
ssh $ssh_opts $client /bin/rm -f "$benchmark_results_dir/*.log" &
done
wait
- else
- mv $benchmark_results_dir/fio*.log $benchmark_results_dir/clients/localhost/ 2>/dev/null
fi
pbench-postprocess-tools --group=$tool_group --iteration=$iteration --dir=$benchmark_results_dir
echo "fio job complete"
Upstream fio behaves the same way: with default compilation options it segfaults, but when you disable optimization it works. With defaults:
[bengland@bene-laptop fio]$ ./fio --client=localhost --name=randread ../../pbench-fio-2.14-2/bentest.job
randread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.2-24-gf273
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.2-24-gf273, flags=1
<bene-laptop> randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> Starting 2 processes
Segmentation fault (core dumped)9002M,w=0k][r=2251k,w=0 IOPS][eta 00m:00s]
With optimization turned off:
[bengland@bene-laptop fio]$ ./fio --client=localhost --name=randread ../../pbench-fio-2.14-2/bentest.job
randread: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.2-24-gf273-dirty
hostname=bene-laptop, be=0, 64-bit, os=Linux, arch=x86-64, fio=fio-3.2-24-gf273-dirty, flags=1
<bene-laptop> randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
<bene-laptop> ...
<bene-laptop> Starting 2 processes
randread: (groupid=0, jobs=1): err= 0: pid=25115: Wed Dec 13 10:07:40 2017
read: IOPS=462k, BW=1807MiB/s (1894MB/s)(9035MiB/5001msec)
clat (nsec): min=338, max=5965.0k, avg=1371.41, stdev=8798.85
lat (nsec): min=381, max=5967.3k, avg=1431.37, stdev=8806.48
clat percentiles (nsec):
| 1.00th=[ 516], 5.00th=[ 540], 10.00th=[ 556], 20.00th=[ 588],
| 30.00th=[ 604], 40.00th=[ 620], 50.00th=[ 644], 60.00th=[ 676],
| 70.00th=[ 740], 80.00th=[ 796], 90.00th=[ 868], 95.00th=[ 996],
| 99.00th=[ 3632], 99.50th=[ 82432], 99.90th=[108032], 99.95th=[117248],
| 99.99th=[136192]
bw ( MiB/s): min= 39, max= 2780, per=45.38%, avg=1685.63, stdev=1244.67, samples=9
iops : min=10040, max=711898, avg=431520.67, stdev=318634.60, samples=9
lat (nsec) : 500=0.24%, 750=71.06%, 1000=23.79%
lat (usec) : 2=3.29%, 4=0.73%, 10=0.16%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.52%, 250=0.18%, 500=0.01%
lat (msec) : 10=0.01%
cpu : usr=42.68%, sys=28.94%, ctx=16393, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=2312941,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
randread: (groupid=0, jobs=1): err= 0: pid=25116: Wed Dec 13 10:07:40 2017
read: IOPS=489k, BW=1909MiB/s (2001MB/s)(9543MiB/5000msec)
clat (nsec): min=334, max=1117.1k, avg=1303.81, stdev=7677.98
lat (nsec): min=383, max=1118.3k, avg=1358.97, stdev=7686.19
clat percentiles (nsec):
| 1.00th=[ 510], 5.00th=[ 524], 10.00th=[ 540], 20.00th=[ 572],
| 30.00th=[ 588], 40.00th=[ 604], 50.00th=[ 620], 60.00th=[ 636],
| 70.00th=[ 668], 80.00th=[ 748], 90.00th=[ 860], 95.00th=[ 940],
| 99.00th=[ 3344], 99.50th=[ 82432], 99.90th=[107008], 99.95th=[115200],
| 99.99th=[136192]
bw ( MiB/s): min= 39, max= 2990, per=49.58%, avg=1841.99, stdev=1367.21, samples=9
iops : min=10054, max=765588, avg=471550.44, stdev=350004.44, samples=9
lat (nsec) : 500=0.37%, 750=79.56%, 1000=16.42%
lat (usec) : 2=2.09%, 4=0.72%, 10=0.13%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.50%, 250=0.17%
lat (msec) : 2=0.01%
cpu : usr=41.50%, sys=30.30%, ctx=16421, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=2442986,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
All clients: (groupid=0, jobs=2): err= 0: pid=0: Wed Dec 13 10:07:40 2017
read: IOPS=0, BW=0 (0)(0B/5001msec)
clat (nsec): min=334, max=5965.0k, avg=1303.81, stdev=5502.88
lat (nsec): min=381, max=5967.3k, avg=1358.97, stdev=5508.76
bw ( MiB/s): min= 39, max= 2990, per=24.79%, avg=1841.99, stdev=937.90, samples=18
iops : min=10040, max=765588, avg=471550.44, stdev=240101.07, samples=18
lat (nsec) : 500=0.31%, 750=75.43%, 1000=20.00%
lat (usec) : 2=2.67%, 4=0.72%, 10=0.15%, 20=0.03%, 50=0.01%
lat (usec) : 100=0.51%, 250=0.18%, 500=0.01%
lat (msec) : 2=0.01%, 10=0.01%
cpu : usr=42.09%, sys=29.62%, ctx=32814, majf=0, minf=39
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=4755927,0,0, short=0,0,0, dropped=0,0,0
Run status group 0 (all jobs):
READ: bw=3895276343B/s (3895276343B/s), 1894382390B/s-2001294131B/s (1894382390B/s-2001294131B/s), io=19480276992B (19480276992B), run=5000-5001msec
Disk stats (read/write):
dm-3: ios=32768/3, merge=0/0, ticks=2873/13, in_queue=2894, util=31.40%
dm-0: ios=32768/6, merge=0/0, ticks=2841/16, in_queue=2863, util=30.63%
nvme0n1: ios=32768/6, merge=0/0, ticks=2597/15, in_queue=2604, util=29.92%
<bene-laptop>
[bengland@bene-laptop fio]$ ls -ltr | tail
...
-rw-rw-r--. 1 bengland bengland 24044 Dec 13 10:07 bentest_clat_hist.1.log.localhost
-rw-rw-r--. 1 bengland bengland 24015 Dec 13 10:07 bentest_clat_hist.2.log.localhost
Cross-posted to the upstream fio GitHub on an issue related to this.
I've isolated it further. If you compile just iolog.c without optimization (and link without the -O3 flag) but compile everything else in fio with default settings, the problem goes away.
If you run the build with the non-optimized iolog.c on just the fio --server end and the fully optimized fio on the client side, it still works. So this narrows the code paths considerably.
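Roughly, the selective rebuild looks like this (a sketch; the exact compile flags are whatever make prints for iolog.o, so treat these commands as an outline rather than a recipe):
cd fio
make                 # full default (optimized) build
rm -f iolog.o
make -n iolog.o      # print the compile command the Makefile would use for iolog.o
# re-run that printed command by hand with -O3 replaced by -O0, then relink:
make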
Good news @atheurer @ndokos @hongkailiu: Jens Axboe put a fix into upstream fio last night, before I ever contacted him (he must be a precog). Here is the commit, and my test case now does not break, even with numjobs=60, log_hist_msec=500, runtime=60. So I think it's fixed; we just have to pull the new upstream fio into pbench-fio and make sure it works properly with the pbench-fio script. I'll try that tomorrow, and if there is no problem there, Nick can just pull in the upstream fio and we can start using that without any patch. It took 2 years to fix this.
Thanks for the update, Ben.
@ndokos @atheurer One change affecting upstream fiologparser_hist.py is that I think the new upstream fio measures latency in nanoseconds rather than microseconds, so our pbench graph units may need to be adjusted. This is actually a good thing: NVDIMMs have latencies on the order of nanoseconds, not microseconds, so we need upstream fio to be able to measure them. I mentioned problems with measuring NVDIMM latency with older fio versions here. I think this is also why the FIO_IO_U_PLAT_GROUP_NR parameter in stat.h changed from 19 to 29: 2^10 = 1024, which is roughly how much the time resolution increased in the switch from microseconds to nanoseconds. So if you update fio to upstream in the new pbench-fio RPM, be sure to also update any analysis tools like fiologparser_hist.py. I am not sure where that comes from in the pbench-fio script; is it here?
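A quick sanity check of that arithmetic (plain shell, nothing pbench-specific; the /1000 conversion is only needed if a downstream tool still assumes microseconds, and 82432 is just an example latency taken from the output above):
# 10 more histogram groups (29 - 19) means a 2^10 = 1024x wider range,
# roughly matching the ~1000x jump from usec to nsec granularity
echo $(( 2 ** (29 - 19) ))                   # prints 1024
# converting a nanosecond latency back to microseconds for old usec-based graphs:
ns=82432; echo "scale=3; $ns / 1000" | bc    # prints 82.432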
[bengland@bene-laptop postprocess]$ diff -u fio-postprocess-viz.p{y.orig,y}
--- fio-postprocess-viz.py.orig 2017-12-14 10:25:00.378259177 -0500
+++ fio-postprocess-viz.py 2017-12-14 10:26:17.442752572 -0500
@@ -20,7 +20,7 @@
<script src="/static/js/v0.3/saveSvgAsPng.js" charset="utf-8"></script>
<div id='jschart_latency'>
<script>
- create_jschart(0, "%s", "jschart_latency", "Percentiles", "Time (msec)", "Latency (usec)",
+ create_jschart(0, "%s", "jschart_latency", "Percentiles", "Time (msec)", "Latency (nsec)",
{ plotfiles: [ "avg.log", "median.log", "p90.log", "p95.log",
"p99.log", "min.log", "max.log" ],
sort_datasets: false, x_log_scale: false
For a description of fio latency histograms, see slides 18-21 in the following attachment.
S103002-england-May-04-15-30-ceph-and-openstack-at-scale-v2.pdf
@ndokos @hongkailiu
I installed pbench-agent-0.47-91gb666083.noarch using the copr-pbench-test repo and tried this again.
The real problem in the original post was that the fio command had --max-jobs=1 while the fio.job file had numjobs=2, so the command was self-contradictory.
That's why it produced no log files. I found this out by running "bash -x fio.cmd" directly. The solution here is to not use numjobs and let pbench-fio decide what it should be.
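A hedged illustration of the mismatch (other options elided; this is not the exact generated command):
# the generated command looked roughly like this:
fio ... --max-jobs=1 ... fio.job    # while fio.job still contains numjobs=2
# tracing the generated wrapper is what exposed it:
bash -x fio.cmd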
One other unrelated bug I stumbled on: to make pbench-fio work with any value of the write_hist_log parameter in the fio job file, we need to wildcard the spot where this parameter's value appears in the log filename, i.e. glob for *_clat_hist.*.log* instead of fio_clat_hist.*.log*. But the workaround is to just use the value "fio" for this parameter, which is what the script expects.
fiologparser_hist.py --job-file=$benchmark_results_dir/fio.job $benchmark_results_dir/clients/$client/*_clat_hist.*.log* >$benchmark_results_dir/clients/$client/hist/hist.csv
Once I fixed that, it postprocessed the histogram logs but wouldn't process the JSON output file, which was in fio-result.txt, not result.json.
running fio job: /var/lib/pbench-agent/fio_RAND_IO_300s_2018.02.06T23.36.03/1-randrw-16KiB/sample1/fio.job ...
fio-postprocess: could not find any result files to process, exiting
Can't open /var/lib/pbench-agent/fio_RAND_IO_300s_2018.02.06T23.36.03/1-randrw-16KiB/sample1/result.json: No such file or directory at /opt/pbench-agent/bench-scripts/postprocess/process-iteration-samples line 174.
That's as far as I got.
Thanks, @bengland2.
Just to make sure that I understand correctly: we should NOT use numjobs=n in the job config file, nor as a parameter to pbench-fio; the pbench-fio command will handle this by itself.
Does pbench-fio always treat numjobs as 1, or is it a dynamic value?
numjobs in the fio command line depends on the number of clients you are running. Also, what I was saying is that you can't specify a max_jobs parameter lower than the numjobs parameter. I'd suggest removing max_jobs from your job file; perhaps I am missing something, but why was it there? Also, see whether you really need to specify numjobs or whether pbench-fio will generate that value for you.
I checked our source for the job config: max_jobs is not there, while numjobs=2 is. I checked the parameters used for pbench-fio: neither max_jobs nor numjobs is there. This is the same as the output in the issue description above.
So when I encountered the issue, max_jobs was not used at all. These findings do not explain the issue yet; probably I did not understand your comment in the first place.
Sorry if that wasn't clear; I was using the pbench-fio command from your initial post and just trying it out with the new pbench-fio. I'll try it again and post the exact inputs I used, to see if that lines up with what you were doing.
Could you post the fio job file you used in your initial post, or as close as you can get to it? Unfortunately I can't exactly reproduce your input from what you provided; that may be the source of the confusion. Here's what I'm trying right now on my laptop. Is this representative of the options you used?
su - -c "pbench-fio --max-failures=0 --sysinfo=none --test-types=randrw --clients=localhost --config=RAND_IO_30s --runtime=30 --samples=1 --max-stddev=20 --block-sizes=16 --file-size=64m --job-file=fio.job --target=ben"
I used this job file, which I copied from /opt/pbench-agent/bench-scripts/templates/fio.job and then edited. All workload-related parameters are pushed into the job-$target section, and all the configuration-related parameters are kept in the global section.
[global]
clocksource = clock_gettime
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 1000
# log_hist_coarseness = 4 # 76 bins
directory = /var/tmp/fio
[job-$target]
filename = $target
bs = $@
rw = $@
size = $@
rate_iops = 50
numjobs=4
runtime = 30
ioengine = libaio
iodepth = 32
direct = 1
sync = 0
time_based = 1
Right now this is failing to generate a job file in the pbench directory, but pbench doesn't tell you about the problem; I am trying to fix that first.
After changing the config file several times during the test, I am not sure about the exact content. It should be the same as or similar to the output in the description of the issue. We have the pbench-fio cases in the current run. Let me redo the case and come back here with the config file and the command.
Thank you very much, @bengland2
Hi @bengland2, I replicated the issue today. Here is the information:
https://paste.fedoraproject.org/paste/iGWX7dmZrRn81GD~5QwVtg
Another observation: the value of log_hist_msec generated by pbench-fio is changed from 1000 in the job file to 10000.
Is this intended?
If numjobs=1, it works great. When numjobs=2, we have this: And then it starts to rerun ... Did I do something wrong?