cornelisnetworks / opa-psm2

CMA fails for group executable without group read #54

Closed lee218llnl closed 4 years ago

lee218llnl commented 4 years ago

Some of our users build executables for their group with the execute bit set (+x) but without the read bit (-r). This has been found to cause errors in the PSM2 layer:

    [lee218@opal186:20200605_psm2]$ cat test.c
    #include "mpi.h"
    #include "stdio.h"

    /* This does a transpose-cum-accumulate operation. Uses vector and
       hvector datatypes (Example 3.32 from MPI 1.1 Standard). Run on
       2 processes */

    #define NROWS 100
    #define NCOLS 100

    int main(int argc, char *argv[])
    {
        int rank, nprocs, A[NROWS][NCOLS], i, j;
        MPI_Win win;
        MPI_Datatype column, xpose;
        int errs = 0;

        MPI_Init(&argc,&argv);
        MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        if (nprocs != 2) {
            printf("Run this program with 2 processes\n");fflush(stdout);
            MPI_Abort(MPI_COMM_WORLD,1);
        }

        if (rank == 0) {
            for (i=0; i<NROWS; i++)
                for (j=0; j<NCOLS; j++)
                    A[i][j] = i*NCOLS + j;

            /* create datatype for one column */
            MPI_Type_vector(NROWS, 1, NCOLS, MPI_INT, &column);
            /* create datatype for matrix in column-major order */
            MPI_Type_hvector(NCOLS, 1, sizeof(int), column, &xpose);
            MPI_Type_commit(&xpose);

            MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
            MPI_Win_fence(0, win);
            MPI_Accumulate(A, NROWS*NCOLS, MPI_INT, 1, 0, 1, xpose, MPI_SUM, win);

            MPI_Type_free(&column);
            MPI_Type_free(&xpose);
            MPI_Win_fence(0, win);
        }
        else { /* rank = 1 */
            for (i=0; i<NROWS; i++)
                for (j=0; j<NCOLS; j++)
                    A[i][j] = i*NCOLS + j;
            MPI_Win_create(A, NROWS*NCOLS*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
            MPI_Win_fence(0, win);
            MPI_Win_fence(0, win);
            for (j=0; j<NCOLS; j++) {
                for (i=0; i<NROWS; i++) {
                    if (A[j][i] != i*NCOLS + j + j*NCOLS + i) {
                        if (errs < 50) {
                            printf("Error: A[%d][%d]=%d should be %d\n", j, i, A[j][i], i*NCOLS + j + j*NCOLS + i);fflush(stdout);
                        }
                        errs++;
                    }
                }
            }
            if (errs >= 50) {
                printf("Total number of errors: %d\n", errs);fflush(stdout);
            }
        }

        MPI_Win_free(&win);
        printf("Done %d: %d %d!\n", rank, A[0][0], A[NROWS-1][NCOLS-1]);
        MPI_Finalize();
        return 0;
    }

    [lee218@opal186:20200605_psm2]$ mpicc -g test.c
    [lee218@opal186:20200605_psm2]$ cp a.out /usr/global/tools/mpi
    [lee218@opal186:20200605_psm2]$ chmod g+x,o+x /usr/global/tools/mpi/a.out
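For anyone who wants to find binaries installed this way, the permission pattern described in this report (execute granted, read withheld) can be detected from the file mode. The following is only a minimal sketch: the function names are made up for illustration and are not part of PSM2 or any site tooling.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the mode grants group execute but withholds group read. */
int group_exec_without_read(mode_t mode)
{
    return (mode & S_IXGRP) != 0 && (mode & S_IRGRP) == 0;
}

/* Same test for the "other" permission class. */
int other_exec_without_read(mode_t mode)
{
    return (mode & S_IXOTH) != 0 && (mode & S_IROTH) == 0;
}

/*
 * Hypothetical helper: returns 1 if the file at path is executable but
 * not readable for group or other, 0 if not, and -1 if stat() fails.
 */
int is_execute_only_binary(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return group_exec_without_read(st.st_mode) ||
           other_exec_without_read(st.st_mode);
}
```

Calling `is_execute_only_binary("/usr/global/tools/mpi/a.out")` would be one way to flag the reproducer above before it is handed to other users.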

And here's what happens when run by another user:

    [testnewm@opal68 ~]$ srun -n 2 /usr/global/tools/mpi/a.out
    opal68.60650a.out: Reading from remote process' memory failed. Disabling CMA support
    opal68.60650Assertion failure at /collab/usr/global/tools/mpi/psm2/opa-psm2-PSM2_11.2.173/ptl_am/ptl.c:153: nbytes == req->req_data.recv_msglen
    [opal68:mpi_rank_1][error_sighandler] Caught error: Aborted (signal 6)

We are aware of a workaround, but we would prefer not to have to ask our users to set these variables:

    [testnewm@opal68 ~]$ setenv MV2_SMP_USE_CMA 0
    [testnewm@opal68 ~]$ setenv PSM2_KASSIST_MODE none
    [testnewm@opal68 ~]$ srun -n 2 /usr/global/tools/mpi/a.out
    Done 0: 0 9999!
    Done 1: 0 19998!
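One way a site might avoid asking each user to set these variables is a small launcher that sets them and then executes the real binary. This is a rough sketch under that assumption: the launcher and its function names are hypothetical, and only the two variable names and values come from the workaround above.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Set the two workaround variables unless the user has already chosen a
 * value (overwrite = 0). Returns 0 on success, -1 if setenv() fails.
 */
int apply_cma_workaround_env(void)
{
    if (setenv("MV2_SMP_USE_CMA", "0", 0) != 0)
        return -1;
    if (setenv("PSM2_KASSIST_MODE", "none", 0) != 0)
        return -1;
    return 0;
}

/*
 * Hypothetical launcher body: argv[0] is the path of the real program and
 * argv must be NULL-terminated. On success execv() does not return; the
 * return value is -1 if the environment or the exec could not be set up.
 */
int run_with_cma_workaround(char *const argv[])
{
    if (argv == NULL || argv[0] == NULL)
        return -1;
    if (apply_cma_workaround_env() != 0)
        return -1;
    execv(argv[0], argv);
    return -1; /* reached only if execv() failed */
}
```

A launcher whose `main` passes `argv + 1` to `run_with_cma_workaround` could then be started as `srun -n 2 launcher /usr/global/tools/mpi/a.out`, so each rank inherits the variables through `execv()`. Note that this turns CMA off rather than making it work for execute-only binaries.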

lee218llnl commented 4 years ago

Also, when we set PSM2_KASSIST_MODE=auto, this small reproducer appears to run OK. We have asked our users to test this on a larger application to see if this env var works for them too.

mwheinz commented 4 years ago

You might want to pursue this through Intel support.

That said, you might want to check the value of errno after this error occurs. I suspect that this unusual use of permission bits is setting up a condition where the processes do not have permission to share their memory spaces with each other.
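A minimal sketch of how the errno check could be done outside of PSM2, assuming Linux with glibc. CMA (cross memory attach) is the `process_vm_readv()`/`process_vm_writev()` mechanism, so the probe below attempts the same kind of read directly. Per prctl(2), executing a program the user cannot read resets the process's "dumpable" attribute to the value of `/proc/sys/fs/suid_dumpable` (0 by default), and per ptrace(2) a non-dumpable target is off limits to callers without `CAP_SYS_PTRACE`. The child here clears the attribute with `prctl()` to imitate that state; the function names and the `EIO` convention for short reads are choices made for this illustration only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable sits at the same address in parent and child. */
int shared_value = 42;

/*
 * Try to copy len bytes from remote_addr in process pid into local_buf.
 * Returns 0 on success, otherwise the errno left by process_vm_readv()
 * (a short read is reported as EIO, which is this sketch's own choice).
 */
int probe_cma_read(pid_t pid, void *remote_addr, void *local_buf, size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);

    if (n < 0)
        return errno;
    return ((size_t)n == len) ? 0 : EIO;
}

/*
 * Fork a child, optionally mark it non-dumpable (imitating what execve()
 * does for an unreadable binary when suid_dumpable is 0), and probe it.
 * Returns 0 or an errno value from the probe, or -1 if setup failed.
 */
int probe_child(int make_nondumpable)
{
    int ready[2];
    char token = 1;
    int value = 0;
    int rc = -1;
    pid_t pid;

    if (pipe(ready) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: adjust the flag, tell the parent, then wait to be killed. */
        close(ready[0]);
        if (make_nondumpable)
            prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
        if (write(ready[1], &token, 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }
    /* Parent: wait until the child is ready, probe it, then clean up. */
    close(ready[1]);
    if (read(ready[0], &token, 1) == 1)
        rc = probe_cma_read(pid, &shared_value, &value, sizeof value);
    close(ready[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return rc;
}
```

Printing `strerror()` of `probe_child(0)` and `probe_child(1)` from a small `main` should show whether the kernel refuses the read only for the non-dumpable child. For an unprivileged caller one would expect `EPERM` in that case, although Yama `ptrace_scope` settings or a seccomp filter can change the outcome of either call.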

lee218llnl commented 4 years ago

Looking at the code at https://github.com/intel/opa-psm2/blob/be116611c7a7206bf056d7419313df3a3d137616/ptl_am/am_reqrep_shmem.c#L2263, it appears that setting PSM2_KASSIST_MODE=auto is no different from setting it to "none" or any other random string that is not "cma-put" or "cma-get". I may escalate this with Intel as you suggest.
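The behavior described here can be pictured with a simplified parser. This is an illustration of the reading above, not the PSM2 source: the enum, the function name, and the handling of a NULL argument are all invented for the sketch, and only the strings "cma-put" and "cma-get" come from the linked code as described.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode values; PSM2's own identifiers may differ. */
enum kassist_mode {
    KASSIST_MODE_OFF,
    KASSIST_MODE_CMA_GET,
    KASSIST_MODE_CMA_PUT
};

/*
 * Map a PSM2_KASSIST_MODE-style string to a mode. Only the two CMA
 * strings are recognized; anything else ("none", "auto", a typo) falls
 * through to OFF. Treating NULL as OFF is an arbitrary choice here: how
 * PSM2 handles an unset variable is not covered in this thread.
 */
enum kassist_mode parse_kassist_mode(const char *value)
{
    if (value != NULL && strcmp(value, "cma-put") == 0)
        return KASSIST_MODE_CMA_PUT;
    if (value != NULL && strcmp(value, "cma-get") == 0)
        return KASSIST_MODE_CMA_GET;
    return KASSIST_MODE_OFF;
}
```

With a parser of this shape, `parse_kassist_mode("auto")` and `parse_kassist_mode("none")` give the same result, which would explain why the reproducer ran cleanly with "auto".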

mwheinz commented 4 years ago

Closing this because the root issue is the Linux security model, not PSM2.