GMRES() contains the following loop:

const double av = normP(w, xkrylovlen);   // parallel norm (one MPI_Allreduce)
for (register int j = 0; j <= k; j++) {
  v = V[j];
  H[j][k] = dotP(w, v, xkrylovlen);       // parallel dot product (one MPI_Allreduce)
  addscale(-H[j][k], w, v, xkrylovlen);   // w -= H[j][k] * v  (modified Gram-Schmidt)
}
This is an inner loop within an outer loop in which k runs from 0 to m-1 (the code chooses m = 19). Each call to normP() or dotP() performs its own reduction, so 210 calls are made to MPI_Allreduce.
In fact, the vectors v stored in V are mutually orthogonal, so subtracting the projection of w onto one of them does not change w's dot products with the others. All of the dot products can therefore be computed against the original, unmodified w and reduced in parallel with a single call to MPI_Allreduce():
for (register int j = 0; j <= k; j++) {
  y[j] = dot(w, V[j], xkrylovlen);          // local (unreduced) dot products
}
y[k+1] = norm2(w, xkrylovlen);              // local squared norm of w
MPI_Allreduce(MPI_IN_PLACE, y, (k + 2),
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
for (register int j = 0; j <= k; j++) {
  H[j][k] = y[j];
  addscale(-H[j][k], V[k+1], V[j], xkrylovlen);   // V[k+1] (which holds w) -= H[j][k] * V[j]
}
double av = sqrt(y[k+1]);                   // norm of w from the reduced sum
This reduces the number of MPI_Allreduce calls per restart of the GMRES algorithm from 250 to 40.
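
For reference, the same batched-reduction pattern can be written as a small self-contained MPI program. The sketch below is only an illustration of the technique under simplifying assumptions (each rank holds a contiguous block of every vector); the names local_dot, axpy, and orthogonalize_step are invented for the example and are not the solver's actual routines.

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Local (unreduced) dot product of two length-n blocks.
static double local_dot(const double *a, const double *b, int n) {
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}

// y += c * x
static void axpy(double c, const double *x, double *y, int n) {
  for (int i = 0; i < n; i++)
    y[i] += c * x[i];
}

// One classical Gram-Schmidt step: orthogonalize w against V[0..k] using a
// single MPI_Allreduce over k+2 partial sums (k+1 dot products plus the
// squared norm of w).  Returns ||w|| before orthogonalization.
static double orthogonalize_step(std::vector<std::vector<double> > &V,
                                 std::vector<double> &w,
                                 std::vector<std::vector<double> > &H,
                                 int k, int n) {
  std::vector<double> y(k + 2);
  for (int j = 0; j <= k; j++)
    y[j] = local_dot(w.data(), V[j].data(), n);   // local dot products
  y[k + 1] = local_dot(w.data(), w.data(), n);    // local squared norm of w
  MPI_Allreduce(MPI_IN_PLACE, y.data(), k + 2,
                MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  for (int j = 0; j <= k; j++) {
    H[j][k] = y[j];
    axpy(-H[j][k], V[j].data(), w.data(), n);     // w -= H[j][k] * V[j]
  }
  return std::sqrt(y[k + 1]);
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  const int n = 4;   // length of the local block on each rank
  const int k = 1;   // orthogonalize against V[0] and V[1]
  std::vector<std::vector<double> > V(2, std::vector<double>(n, 0.0));
  V[0][0] = 1.0;     // V[0] and V[1] are orthogonal on every rank
  V[1][1] = 1.0;
  std::vector<double> w(n, 0.0);
  w[0] = 0.5; w[1] = 0.25; w[2] = 1.0;
  std::vector<std::vector<double> > H(k + 2, std::vector<double>(k + 1, 0.0));
  const double av = orthogonalize_step(V, w, H, k, n);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
    std::printf("||w|| before orthogonalization = %g\n", av);
  MPI_Finalize();
  return 0;
}

Run with any number of ranks (e.g. mpirun -np 4); regardless of the rank count, each orthogonalization step issues exactly one MPI_Allreduce.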