cea-hpc / wi4mpi

Wrapper interface for MPI
BSD 3-Clause "New" or "Revised" License
MPI_MAX_LIBRARY_VERSION_STRING can be different between two MPI libraries #50

Closed laurent-nguyen closed 7 months ago

laurent-nguyen commented 1 year ago

Hello,

To reproduce: OS = RHEL 8, OpenMPI 4.1.5, MPICH 3.4.2

Here is the code to reproduce:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);

  int size = 1, rank = 0;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char processor_name[MPI_MAX_PROCESSOR_NAME] = "";
  int name_len = 0;
  MPI_Get_processor_name(processor_name, &name_len);

  char library_version[MPI_MAX_LIBRARY_VERSION_STRING] = "";
  MPI_Get_library_version(library_version, &name_len);

  printf("size of MPI_MAX_LIBRARY_VERSION_STRING = %d\n", MPI_MAX_LIBRARY_VERSION_STRING);
  printf("Hello world from processor %s, rank %d out of %d processors with MPI library %s\n",
     processor_name, rank, size, library_version);

  MPI_Finalize();
  return EXIT_SUCCESS;
}

When compiling with OpenMPI:

$ ./hello
size of MPI_MAX_LIBRARY_VERSION_STRING = 256
Hello world from processor node1, rank 0 out of 1 processors with MPI library Open MPI v4.1.5, package: Open MPI cloud-user@node1 Distribution, ident: 4.1.5, repo rev: v4.1.5, Feb 23, 2023

When compiling with MPICH:

size of MPI_MAX_LIBRARY_VERSION_STRING = 8192
Hello world from processor node1, rank 0 out of 1 processors with MPI library MPICH Version:    3.4.2
MPICH Release date: Wed May 26 15:51:40 CDT 2021
MPICH ABI:  13:11:1
MPICH Device:   ch3:nemesis
MPICH configure:    --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc --enable-shared --enable-static=no --enable-lib-depend --disable-rpath --disable-silent-rules --enable-fortran --with-gnu-ld --with-device=ch3:nemesis --with-pm=hydra:gforker --includedir=/usr/include/mpich-x86_64 --bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib --datadir=/usr/share/mpich --mandir=/usr/share/man/mpich-x86_64 --docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc --with-hwloc-prefix=system
MPICH CC:   gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection   -O2
MPICH CXX:  g++ -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  -O2
MPICH F77:  gfortran -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules  -O2
MPICH FC:   gfortran -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules  -O2

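We can see that MPI_MAX_LIBRARY_VERSION_STRING is larger in MPICH (8192) than in OpenMPI (256). The application, compiled against OpenMPI, allocates a 256-byte buffer, so a buffer overflow occurs when Wi4MPI translates from OpenMPI to MPICH and MPICH writes its longer version string: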

$ wi4mpi -f openmpi -t mpich ./hello.exe 
./hello.exe: Symbol `ompi_mpi_comm_world' has different size in shared object, consider re-linking
You are using Wi4MPI-3.6.4 with the mode preload From OMPI To MPICH
size of MPI_MAX_LIBRARY_VERSION_STRING = 256
Hello world from processor usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc --enable-shared --enable-static=no --enable-lib-depend --disable-rpath --disable-silent-rules --enable-fortran --with-gnu-ld --with-device=ch3:nemesis --with-pm=hydra:gforker --includedir=/usr/include/mpich-x86_64 --bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib --datadir=/usr/share/mpich --mandir=/usr/share/man/mpich-x86_64 --docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc --with-hwloc-prefix=system
MPICH CC:   gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection   -O2
MPICH CXX:  g++ -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  -O2
MPICH F77:  gfortran -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules  -O2
MPICH FC:   gfortran -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules  -O2
, rank 1651076196 out of 1667710323 processors with MPI library MPICH Version:  3.4.2
MPICH Release date: Wed May 26 15:51:40 CDT 2021
MPICH ABI:  13:11:1
MPICH Device:   ch3:nemesis
MPICH configure:    --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-pref�   
Segmentation fault (core dumped)

MPICH writes a version string that is longer than the 256-byte buffer sized with OpenMPI's constant.

Thanks,

kevin-juilly commented 9 months ago

Here is the list I made of constants that can cause the same kind of incompatibility:

MPI_MAX_PROCESSOR_NAME
MPI_MAX_LIBRARY_VERSION_STRING
MPI_MAX_ERROR_STRING
MPI_MAX_PORT_NAME
MPI_MAX_OBJECT_NAME
MPI_MAX_INFO_KEY
MPI_MAX_INFO_VAL
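
For any constant in this list, the overflow pattern is the same whenever the target library's limit is larger than the one the application was compiled with. One possible way to avoid it, shown here only as a minimal sketch and not as the fix actually applied in Wi4MPI, is to let the target library write into a temporary buffer sized with its own limit and then copy a truncated, NUL-terminated string into the application's buffer. Everything below is hypothetical: the APP_ and TARGET_ macros simply reuse the two values printed earlier in this issue, fake_target_get_library_version stands in for the real MPICH call, and bounded_copy and wrapped_get_library_version are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits for illustration: the values printed earlier in this
 * issue for OpenMPI 4.1.5 (application side) and MPICH 3.4.2 (target side). */
#define APP_MAX_LIBRARY_VERSION_STRING    256
#define TARGET_MAX_LIBRARY_VERSION_STRING 8192

/* Copy src into dst (capacity dst_cap bytes), truncating if necessary.
 * The result is always NUL-terminated. The number of characters copied,
 * excluding the terminator, is stored in *out_len.
 * Returns 1 if the string was truncated, 0 otherwise. */
int bounded_copy(char *dst, size_t dst_cap, const char *src, int *out_len)
{
    size_t src_len, n;

    assert(dst != NULL && src != NULL && out_len != NULL);
    assert(dst_cap > 0);

    src_len = strlen(src);
    n = (src_len < dst_cap - 1) ? src_len : dst_cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    *out_len = (int)n;
    return src_len > n;
}

/* Stand-in for the target library's MPI_Get_library_version (assumption:
 * it produces a 1000-character string, longer than the application limit). */
void fake_target_get_library_version(char *version, int *resultlen)
{
    memset(version, 'x', 1000);
    version[1000] = '\0';
    *resultlen = 1000;
}

/* Sketch of a translating wrapper: the target writes into a buffer sized
 * with the target's limit, then the result is copied into the caller's
 * buffer using the application's (smaller) limit. */
int wrapped_get_library_version(char *app_buf, int *resultlen)
{
    char tmp[TARGET_MAX_LIBRARY_VERSION_STRING];
    int target_len = 0;

    fake_target_get_library_version(tmp, &target_len);
    (void)target_len; /* the length reported to the caller is recomputed */
    return bounded_copy(app_buf, APP_MAX_LIBRARY_VERSION_STRING, tmp, resultlen);
}
```

Calling wrapped_get_library_version with a 256-byte buffer leaves the caller with at most 255 characters plus the terminator, and the return value tells whether the text was cut. Truncation loses information, so whether that is acceptable for names such as MPI_MAX_PORT_NAME, where the full string may be needed later, is a separate design question.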