GMRES() contains the following loop:

const double av = normP(w, xkrylovlen);   // parallel norm (one MPI_Allreduce)
for (register int j = 0; j <= k; j++) {
  v = V[j];
  H[j][k] = dotP(w, v, xkrylovlen);       // parallel dot product (one MPI_Allreduce)
  addscale(-H[j][k], w, v, xkrylovlen);   // w -= H[j][k] * v  (modified Gram-Schmidt)
}
This is an inner loop within an outer loop in which k runs from 0 to m-1 (the code chooses m = 19). Each call to normP() or dotP() performs its own reduction, so 210 calls are made to MPI_Allreduce.
In fact, the vectors v stored in V are mutually orthogonal, so subtracting the projection of w onto one of them does not change w's dot products with the others. All of the dot products can therefore be computed against the original, unmodified w and reduced in parallel with a single call to MPI_Allreduce():
for (register int j = 0; j <= k; j++) {
  y[j] = dot(w, V[j], xkrylovlen);          // local (unreduced) dot products
}
y[k+1] = norm2(w, xkrylovlen);              // local squared norm of w
MPI_Allreduce(MPI_IN_PLACE, y, (k + 2),
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
for (register int j = 0; j <= k; j++) {
  H[j][k] = y[j];
  addscale(-H[j][k], V[k+1], V[j], xkrylovlen);   // V[k+1] (which holds w) -= H[j][k] * V[j]
}
double av = sqrt(y[k+1]);                   // norm of w from the reduced sum
This reduces the number of MPI_Allreduce calls per restart of the GMRES algorithm from 250 to 40.
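
For reference, the same batched-reduction pattern can be written as a small self-contained MPI program. The sketch below is only an illustration of the technique under simplifying assumptions (each rank holds a contiguous block of every vector); the names local_dot, axpy, and orthogonalize_step are invented for the example and are not the solver's actual routines.

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Local (unreduced) dot product of two length-n blocks.
static double local_dot(const double *a, const double *b, int n) {
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}

// y += c * x
static void axpy(double c, const double *x, double *y, int n) {
  for (int i = 0; i < n; i++)
    y[i] += c * x[i];
}

// One classical Gram-Schmidt step: orthogonalize w against V[0..k] using a
// single MPI_Allreduce over k+2 partial sums (k+1 dot products plus the
// squared norm of w).  Returns ||w|| before orthogonalization.
static double orthogonalize_step(std::vector<std::vector<double> > &V,
                                 std::vector<double> &w,
                                 std::vector<std::vector<double> > &H,
                                 int k, int n) {
  std::vector<double> y(k + 2);
  for (int j = 0; j <= k; j++)
    y[j] = local_dot(w.data(), V[j].data(), n);   // local dot products
  y[k + 1] = local_dot(w.data(), w.data(), n);    // local squared norm of w
  MPI_Allreduce(MPI_IN_PLACE, y.data(), k + 2,
                MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  for (int j = 0; j <= k; j++) {
    H[j][k] = y[j];
    axpy(-H[j][k], V[j].data(), w.data(), n);     // w -= H[j][k] * V[j]
  }
  return std::sqrt(y[k + 1]);
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  const int n = 4;   // length of the local block on each rank
  const int k = 1;   // orthogonalize against V[0] and V[1]
  std::vector<std::vector<double> > V(2, std::vector<double>(n, 0.0));
  V[0][0] = 1.0;     // V[0] and V[1] are orthogonal on every rank
  V[1][1] = 1.0;
  std::vector<double> w(n, 0.0);
  w[0] = 0.5; w[1] = 0.25; w[2] = 1.0;
  std::vector<std::vector<double> > H(k + 2, std::vector<double>(k + 1, 0.0));
  const double av = orthogonalize_step(V, w, H, k, n);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
    std::printf("||w|| before orthogonalization = %g\n", av);
  MPI_Finalize();
  return 0;
}

Run with any number of ranks (e.g. mpirun -np 4); regardless of the rank count, each orthogonalization step issues exactly one MPI_Allreduce.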