IO500 / io500

IO500 Storage Benchmark source code

Error while running tests under Extended mode #25

Closed: mrashid2 closed this issue 3 years ago

mrashid2 commented 3 years ago

I am running the IO500 benchmark (https://github.com/VI4IO/io500.git) in extended mode. When it reaches the ior-rnd-read phase, it aborts with the following error:

ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node1.mrashid2-qv98443.dirr-pg0.wisc.cloudlab.us:27115] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[node1.mrashid2-qv98443.dirr-pg0.wisc.cloudlab.us:27115] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The check that raises this error is at ior.c:1450 (https://github.com/hpc/ior/blob/8475c7d30025dd5e39147c251bf84e1ed24b9858/src/ior.c#L1449):

        if (test->deadlineForStonewalling == 0 && test->stoneWallingWearOut > 0)
          ERR("the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D");

Investigating further, the command that io500 uses to run the ior-rnd-read phase is the following:

./ior -Q=1 -g -G=-1313584709 -z --random-offset-seed=11 -e -o=./datafiles/2021.05.10-01.26.15/ior-rnd/file -O stoneWallingStatusFile=./results/2021.05.10-01.26.15/ior-rnd.stonewall -O stoneWallingWearOut=1 -k -t=4096 -b=1073741824 -s=10000000 -r -R -a POSIX -O saveRankPerformanceDetailsCSV=./results/2021.05.10-01.26.15/ior-rnd-read.csv

The command sets stoneWallingWearOut to 1 but contains no option (such as -D) that sets deadlineForStonewalling. That combination fails the check in ior.c, so the run is aborted.
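To make the failing combination concrete, here is a minimal sketch of the same condition as a standalone function. The struct and function names are hypothetical stand-ins, not IOR's actual types, and it assumes deadlineForStonewalling stays at 0 when -D is not given, which is what the check in ior.c suggests.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two IOR test parameters involved here;
 * the real IOR parameter struct has many more fields. */
typedef struct {
    int deadlineForStonewalling; /* set by -D; assumed to be 0 when -D is absent */
    int stoneWallingWearOut;     /* set by -O stoneWallingWearOut=1 */
} stonewall_params_t;

/* Returns NULL when the combination is accepted, otherwise the error text
 * quoted in this report. The condition mirrors the one from ior.c. */
const char *check_stonewall_params(const stonewall_params_t *p)
{
    if (p->deadlineForStonewalling == 0 && p->stoneWallingWearOut > 0)
        return "the stoneWallingWearOut is only sensible when setting "
               "a stonewall deadline with -D";
    return NULL;
}
```

With the values from the command above (no -D, stoneWallingWearOut=1), the function returns the error text. With any nonzero deadline, or with wear-out disabled, it returns NULL.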

JulianKunkel commented 3 years ago

Thank you for testing. This issue has been fixed in the io500 testing branch, which is also tagged for ISC: https://github.com/IO500/io500/releases/tag/io500-isc21 Please check!

mrashid2 commented 3 years ago

Thanks. The issue doesn't occur in that branch.