Closed mrashid2 closed 3 years ago
Thanks for bug reporting. As we will release the new version soon, I won't backport the fix. I believe the following will resolve the issue (and will be part of the new release).
diff --git a/src/main.c b/src/main.c
index 93c2dca..dd53b42 100644
--- a/src/main.c
+++ b/src/main.c
@@ -139,8 +139,12 @@ static double calc_score(double scores[IO500_SCORE_LAST], int extended){
p += sprintf(p, "%.8f", t);
}
}
- DEBUG_INFO("%s)^%f\n", score_string, 1.0/numbers);
- score = pow(score, 1.0/numbers);
+ if(numbers == 0){
+ score = 0;
+ }else{
+ DEBUG_INFO("%s)^%f\n", score_string, 1.0/numbers);
+ score = pow(score, 1.0/numbers);
+ }
scores[g] = score;
PRINT_PAIR(io500_phase_str[g], "%f\n", score);
The issue seems not to be producing anymore. Thanks for looking into the issue. It seems that https://github.com/IO500/io500 is not the most the up-to-date repo. However, in the IO500 project page (https://www.vi4io.org/std/io500/start#communication_contribution), it points to the IO500/io500 repo as the source code. Whereas one can get the most up-to-date code from VI4IO/io500. In my opinion, this is kind of inconsistent.
One request, if you have the most updated summary of what the new phase of the benchmark do (i.e. ior-rnd, mdworkbench), please share with me.
I also ran into the "single-phase, segfault" issue, and tracked it slightly further. For me, it was exhibiting classic memory corruption symptoms - sometimes segfaulting, sometimes hanging, etc.
After running through valgind, I discovered that logging is causing a call to fprintf()
and passing a stale file pointer.
Basically, in this case, ior-easy runs and creates a log file. It passes the FILE*
for this log file to ior_run()
, which sets a global variable out_logfile
within IOR to this file pointer.
After ior_run()
finishes, ior-easy
then closes that file handle when it is done, but IOR still has a copy of the FILE*
in it's out_logfile
global variable.
During cleanup, all phases get their cleanup called.
During ior-hard's cleanup, it tries to delete files which were never created (because the phase never ran).
This causes IOR's POSIX_Delete()
method to attempt to print a warning to the out_logfile
file handle, which is no longer valid, causing the memory corruption.
Below is some valgrind output:
==23302== Invalid read of size 4
==23302== at 0x87BB2BB: __fprintf_chk (fprintf_chk.c:30)
==23302== by 0x17F33D: fprintf (stdio2.h:97)
==23302== by 0x17F33D: POSIX_Delete (aiori-POSIX.c:645)
==23302== by 0x1109ED: u_purge_file (util.c:68)
==23302== by 0x114D62: cleanup (phase_ior_hard.c:39)
==23302== by 0x1105DB: main (main.c:414)
==23302== Address 0xbaeb120 is 0 bytes inside a block of size 552 free'd
==23302== at 0x4C32F78: free (vg_replace_malloc.c:530)
==23302== by 0x87073DD: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==23302== by 0x111506: u_res_file_close (util.c:244)
==23302== by 0x114930: ior_process_read (phase_ior.c:31)
==23302== by 0x114610: run (phase_ior_easy_read.c:50)
==23302== by 0x10FBF2: main (main.c:333)
==23302== Block was alloc'd at
==23302== at 0x4C31DCB: malloc (vg_replace_malloc.c:299)
==23302== by 0x8707F29: __fopen_internal (iofopen.c:65)
==23302== by 0x8707F29: fopen@@GLIBC_2.2.5 (iofopen.c:89)
==23302== by 0x11146A: u_res_file_prep (util.c:234)
==23302== by 0x1145F2: run (phase_ior_easy_read.c:49)
==23302== by 0x10FBF2: main (main.c:333)
Hi, it appears you are running the io500-sc20 branch (or older), as this would fit the line numbers you are reporting. The bug you mentioned had been fixed in the current main. Is there a reason to run the old code?
My apologies, you are correct. It does appear to be fixed from the main branch.
I'm using the io500-sc20 branch, as that is the branch specifically called out to use from the io500 website's Submission Rules page: https://io500.org/rules-submission. The prose and the example line are in conflict.
Submissions are made using the latest version of the IO500 application in GitHub and all binaries should be built according to the included build instructions.
$ git clone https://github.com/IO500/io500.git -b io500-sc20
I am facing this error in the scenario where I tried to run only a single phase of the test. For example, when I tried to run only ior-easy phase(both read and write) what I did is that for other phases of the benchmark test except ior-easy, I have set the noRun variables to TRUE. After executing the command (stonewall-time set to 90 seconds)to run the benchmark, I got the following error:
After setting the verbosity level to 10 in the configuration, following logs were reported: