IO500 / io500

IO500 Storage Benchmark source code

aiori driver initialization called without initializing the MPI rank #59

Closed mchaarawi closed 1 year ago

mchaarawi commented 1 year ago

If one runs ior manually (outside of io500), ior's main function sets the global rank variable here: https://github.com/hpc/ior/blob/main/src/ior.c#L205 before calling the init function for each backend driver (POSIX, DFS, etc.). With the io500 app, however, this global rank variable is not set until later, after the backend driver initialization has already been called. As a result, drivers like DFS, which use collective operations to initialize the pool and container for the workflow, see every process as rank 0, so every process performs the expensive pool and container connect individually instead of one rank connecting and sharing the handles.
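
For context, the optimization the DFS driver relies on is roughly the following connect-once-and-share pattern (a minimal sketch, not the actual aiori-DFS.c code; the DAOS calls are elided into comments). With the global rank stuck at 0 on every process, the `rank == 0` gate is true everywhere, so every process takes the expensive connect path:

```c
#include <mpi.h>

extern int rank;  /* ior's global rank; stays 0 until someone sets it */

static void connect_pool_and_container(void) {
    char handles[4096];  /* serialized pool/container handles (sketch) */

    if (rank == 0) {
        /* expensive: connect to the pool and open the container, then
         * serialize the resulting handles into `handles` -- elided */
    }
    /* one broadcast shares the serialized handles with everyone else */
    MPI_Bcast(handles, sizeof(handles), MPI_BYTE, 0, MPI_COMM_WORLD);
    if (rank != 0) {
        /* cheap: rebuild local handles from the broadcast buffer -- elided */
    }
}
```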

We can get around this in the DFS driver with a small patch:

diff --git a/src/aiori-DFS.c b/src/aiori-DFS.c
index 23741e1..8c204e6 100755
--- a/src/aiori-DFS.c
+++ b/src/aiori-DFS.c
@@ -199,8 +199,10 @@ static int DFS_check_params(aiori_mod_opt_t * options){
         if (o->pool == NULL || o->cont == NULL)
                 ERR("Invalid pool or container options\n");

-        if (testComm == MPI_COMM_NULL)
+        if (testComm == MPI_COMM_NULL) {
                 testComm = MPI_COMM_WORLD;
+                MPI_CHECK(MPI_Comm_rank(testComm, &rank), "cannot get rank");
+        }

         return 0;
 }
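
To see why the added MPI_Comm_rank call matters, here is a minimal standalone program (hypothetical, not part of ior) showing that a zero-initialized global rank looks like rank 0 on every process until it is set from the communicator:

```c
#include <mpi.h>
#include <stdio.h>

int rank;  /* zero-initialized global, like ior's */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    /* Before the fix: every process still reports rank 0 here, which is
     * what the DFS driver sees when its init runs too early. */
    printf("before: global rank = %d\n", rank);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* the call the patch adds */
    printf("after:  global rank = %d\n", rank);
    MPI_Finalize();
    return 0;
}
```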

or we can fix this in the io500 app with:

diff --git a/src/main.c b/src/main.c
index ad23285..389f507 100644
--- a/src/main.c
+++ b/src/main.c
@@ -24,6 +24,8 @@ static char const * io500_phase_str[IO500_SCORE_LAST] = {
   "MD",
   "BW"};

+extern int rank;
+
 static void prepare_aiori(void){
   // check selected API, might be followed by API options
   char * api = strdup(opt.api);
@@ -204,6 +206,7 @@ int main(int argc, char ** argv){
   MPI_Init(& argc, & argv);
   MPI_Comm_rank(MPI_COMM_WORLD, & opt.rank);
   MPI_Comm_size(MPI_COMM_WORLD, & opt.mpi_size);
+  rank = opt.rank;

   int verbosity_override = -1;
   int print_help = 0;
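
This mirrors what ior's own main function does at src/ior.c#L205: the global rank is set right after MPI_Init, before any backend driver initialization runs, so every driver sees the correct rank from the start.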

I pushed a PR for io500 with the latter approach: https://github.com/IO500/io500/pull/60. If that is not acceptable, I can push the other patch to ior/DFS instead.