cea-hpc / wi4mpi

Wrapper interface for MPI
BSD 3-Clause "New" or "Revised" License
80 stars 15 forks source link

Invalid rank in sendrecv communication with MPI_Cart_shift #21

Closed marcjoos-cea closed 11 months ago

marcjoos-cea commented 2 years ago

When source and destination ranks are obtained with MPI_Cart_shift, MPI_Sendrecv communications fail if going between two different MPI implementation (for example from Open MPI to MPICH). There is no issue using different versions of a given implementation (for example from Open MPI 2 to Open MPI 4).

If fails with the following error stack (from Open MPI to Intel MPI):

Fatal error in MPI_Sendrecv: Invalid argument, error stack:
MPI_Sendrecv(259): MPI_Sendrecv(sbuf=0x7ffcc92e8824, scount=1, MPI_INTEGER, dest=4, stag=10, rbuf=0x6be240, rcount=1, MPI_INTEGER, src=MPI_ANY_SOURCE, rtag=10, comm=0x84000007, status=(nil)) failed
MPI_Sendrecv(129): Null pointer in parameter status

This can be reproduced with the following code:


program mpicart_sndrcv
  use mpi
  implicit none

  integer :: nproc, rank, nproc_cart, rank_cart
  integer :: nx, ny
  integer :: comm, comm_cart
  integer, dimension(4) :: srcdst
  integer, dimension(2) :: rcvd=0
  integer :: ierr

  integer :: i
  integer :: averbose=1
  integer :: lverbose

  call MPI_Init(ierr)
  call MPI_Comm_dup(MPI_COMM_WORLD, comm, ierr)
  call MPI_Comm_size(comm, nproc, ierr)
  call MPI_Comm_rank(comm, rank, ierr)

  lverbose = averbose
  if (rank /= 0) averbose = 0

  call decomp_2d(nproc, nx, ny)
  if (averbose >= 1) print '("Domain decomposition:")'
  if (averbose >= 1) print '("nproc: ", I2, ", nx: ", I2, ", ny: ", I2)', nproc, nx, ny

  call MPI_Cart_create(comm, 2, (/ nx, ny /), (/ .false., .false. /), .true., comm_cart, ierr)
  call MPI_Comm_size(comm_cart, nproc_cart, ierr)
  call MPI_Comm_rank(comm_cart, rank_cart, ierr)
  call MPI_Cart_shift(comm_cart, 0, 1, srcdst(1), srcdst(2), ierr)
  call MPI_Cart_shift(comm_cart, 1, 1, srcdst(3), srcdst(4), ierr)

  if (lverbose >= 1) then
     do i = 0, nproc_cart-1
    call MPI_Barrier(comm_cart, ierr)
    if (i == rank_cart) then
           print '("rank_cart ", I2, " communications:")', rank_cart
           print '(" - x direction, source: ", I2, ", destination: ", I2)', srcdst(1), srcdst(2)
           print '(" - y direction, source: ", I2, ", destination: ", I2)', srcdst(3), srcdst(4)
    end if
     end do
  end if

  call MPI_Sendrecv(rank_cart, 1, MPI_INTEGER, srcdst(2), 10, &
                    rcvd(1), 1, MPI_INTEGER, srcdst(1), 10, &
                    comm_cart, MPI_STATUS_IGNORE, ierr)
  call MPI_Sendrecv(rank_cart+1, 1, MPI_INTEGER, srcdst(4), 11, &
                    rcvd(2), 1, MPI_INTEGER, srcdst(3), 11, &
                    comm_cart, MPI_STATUS_IGNORE, ierr)

  call MPI_Finalize(ierr)

end program mpicart_sndrcv

subroutine decomp_2d(n, nx, ny)
  implicit none

  integer, intent(in) :: n
  integer, intent(out) :: nx, ny
  integer :: sq, i

  nx = int(sqrt(n*1.d0))
  ny = 1
  i = nx
  do while (i > 0)
     if (mod(n, i) == 0) then
        ny = n/i
        exit
     end if
  end do

end subroutine decomp_2d
marcjoos-cea commented 1 year ago

It seems that the issue is not fixed in the current (3.6.4) version

kevin-juilly commented 11 months ago

The test case doesn't work because of MPI_STATUS_IGNORE uses. I opened #61 and my PR for it fixes the problem.