Closed lee218llnl closed 4 years ago
Also, when we set PSM2_KASSIST_MODE=auto, this small reproducer appears to run OK. We have asked our users to test this on a larger application to see if this env var works for them too.
You might want to pursue this through Intel support.
That said, you might want to check the value of errno after this error occurs. I suspect that this unusual use of permission bits is setting up a condition where the processes do not have permission to share their memory spaces with each other.
Looking at the code https://github.com/intel/opa-psm2/blob/be116611c7a7206bf056d7419313df3a3d137616/ptl_am/am_reqrep_shmem.c#L2263, it appears that setting PSM2_KASSIST_MODE=auto is no different from setting it to "none" or any other string that is not "cma-put" or "cma-get". I may escalate this with Intel as you suggest.
Closing this because the root issue is the Linux security model, not PSM2.
Some of our users build executables for their group with the execute bit set (+x) but the read bit unset (-r). This has been found to cause errors in the PSM2 layer:
[lee218@opal186:20200605_psm2]$ cat test.c
#include "mpi.h"
#include "stdio.h"

/* This does a transpose-cum-accumulate operation. Uses vector and hvector
   datatypes (Example 3.32 from MPI 1.1 Standard). Run on 2 processes */

#define NROWS 100
#define NCOLS 100

int main(int argc, char *argv[])
{
    int rank, nprocs, A[NROWS][NCOLS], i, j;
    MPI_Win win;
    MPI_Datatype column, xpose;
    int errs = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (nprocs != 2) {
        printf("Run this program with 2 processes\n"); fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        for (i = 0; i < NROWS; i++)
            for (j = 0; j < NCOLS; j++)
                A[i][j] = i * NCOLS + j;

        /* create datatype for one column */
        MPI_Type_vector(NROWS, 1, NCOLS, MPI_INT, &column);
        /* create datatype for matrix in column-major order */
        MPI_Type_hvector(NCOLS, 1, sizeof(int), column, &xpose);
        MPI_Type_commit(&xpose);

        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);

        MPI_Accumulate(A, NROWS * NCOLS, MPI_INT, 1, 0, 1, xpose, MPI_SUM, win);

        MPI_Type_free(&column);
        MPI_Type_free(&xpose);

        MPI_Win_fence(0, win);
    }
    else { /* rank = 1 */
        for (i = 0; i < NROWS; i++)
            for (j = 0; j < NCOLS; j++)
                A[i][j] = i * NCOLS + j;

        MPI_Win_create(A, NROWS * NCOLS * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);
        MPI_Win_fence(0, win);

        for (j = 0; j < NCOLS; j++) {
            for (i = 0; i < NROWS; i++) {
                if (A[j][i] != i * NCOLS + j + j * NCOLS + i) {
                    if (errs < 50) {
                        printf("Error: A[%d][%d]=%d should be %d\n", j, i,
                               A[j][i], i * NCOLS + j + j * NCOLS + i);
                        fflush(stdout);
                    }
                    errs++;
                }
            }
        }
        if (errs >= 50) {
            printf("Total number of errors: %d\n", errs); fflush(stdout);
        }
    }
    MPI_Win_free(&win);
    printf("Done %d: %d %d!\n", rank, A[0][0], A[NROWS-1][NCOLS-1]);
    MPI_Finalize();
    return 0;
}
[lee218@opal186:20200605_psm2]$ mpicc -g test.c
[lee218@opal186:20200605_psm2]$ cp a.out /usr/global/tools/mpi
[lee218@opal186:20200605_psm2]$ chmod g+x,o+x /usr/global/tools/mpi/a.out
And here's what happens when run by another user:
[testnewm@opal68 ~]$ srun -n 2 /usr/global/tools/mpi/a.out
opal68.60650a.out: Reading from remote process' memory failed. Disabling CMA support
opal68.60650Assertion failure at /collab/usr/global/tools/mpi/psm2/opa-psm2-PSM2_11.2.173/ptl_am/ptl.c:153: nbytes == req->req_data.recv_msglen
[opal68:mpi_rank_1][error_sighandler] Caught error: Aborted (signal 6)
We are aware of a workaround, but we would prefer not to have to ask our users to set these:
[testnewm@opal68 ~]$ setenv MV2_SMP_USE_CMA 0
[testnewm@opal68 ~]$ setenv PSM2_KASSIST_MODE none
[testnewm@opal68 ~]$ srun -n 2 /usr/global/tools/mpi/a.out
Done 0: 0 9999!
Done 1: 0 19998!