eschnett / MPItrampoline

A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI
MIT License

Error building on macOS with OpenMPI 5 #46

Open PhilipVinc opened 2 weeks ago

PhilipVinc commented 2 weeks ago

As the title says, using Homebrew-installed OpenMPI:

python-3.11.2 ❯ brew link openmpi
Linking /opt/homebrew/Cellar/open-mpi/5.0.3_1... 561 symlinks created.

Compilation fails with the following errors:

git clone https://github.com/eschnett/MPIwrapper.git
mkdir MPIwrapper/build
cd MPIwrapper/build

cmake ..
make VERBOSE=1

/opt/homebrew/Cellar/cmake/3.30.2/bin/cmake -S/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper -B/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build --check-build-system CMakeFiles/Makefile.cmake 0
/opt/homebrew/Cellar/cmake/3.30.2/bin/cmake -E cmake_progress_start /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build//CMakeFiles/progress.marks
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/Makefile2 all
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/mpiwrapper.dir/build.make CMakeFiles/mpiwrapper.dir/depend
[ 20%] Generating src/mpiabi_defn_constants_c.h, src/mpiabi_defn_functions_c.h, src/mpiabi_defn_constants_fortran.h, src/mpiabi_defn_functions_fortran.h
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/gen/gen_defn.py
[ 40%] Generating src/mpiabi_decl_constants_c.h, src/mpiabi_decl_functions_c.h, src/mpiabi_decl_constants_fortran.h, src/mpiabi_decl_functions_fortran.h
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/gen/gen_decl.py
cd /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build && /opt/homebrew/Cellar/cmake/3.30.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles/mpiwrapper.dir/DependInfo.cmake "--color="
Dependee "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles/mpiwrapper.dir/DependInfo.cmake" is newer than depender "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles/mpiwrapper.dir/depend.internal".
Dependee "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/CMakeFiles/mpiwrapper.dir/depend.internal".
Scanning dependencies of target mpiwrapper
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/mpiwrapper.dir/build.make CMakeFiles/mpiwrapper.dir/build
[ 60%] Building CXX object CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o
/Library/Developer/CommandLineTools/usr/bin/c++ -Dmpiwrapper_EXPORTS -I/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/mpiabi -I/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/mpiabi -I/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src -isystem /opt/homebrew/Cellar/open-mpi/5.0.3_1/include -isystem /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib -std=gnu++11 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk -fPIC -MD -MT CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o -MF CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o.d -o CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o -c /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/src/mpiwrapper.cxx
In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/src/mpiwrapper.cxx:121:
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src/mpiabi_defn_constants_c.h:156:70: warning: 'OMPI_C_MPI_DUP_FN' is deprecated: MPI_DUP_FN was deprecated in MPI-2.0; use MPI_COMM_DUP_FN instead. [-Wdeprecated-declarations]
MPIABI_Copy_function * const MPIABI_DUP_FN = (MPIABI_Copy_function *)MPI_DUP_FN;
                                                                     ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3072:20: note: expanded from macro 'MPI_DUP_FN'
#define MPI_DUP_FN OMPI_C_MPI_DUP_FN
                   ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3079:13: note: 'OMPI_C_MPI_DUP_FN' has been explicitly marked deprecated here
            __mpi_interface_deprecated__("MPI_DUP_FN was deprecated in MPI-2.0; use MPI_COMM_DUP_FN instead.");
            ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:302:78: note: expanded from macro '__mpi_interface_deprecated__'
#                    define __mpi_interface_deprecated__(msg) __attribute__((__deprecated__(msg)))
                                                                             ^
In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/src/mpiwrapper.cxx:121:
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src/mpiabi_defn_constants_c.h:157:76: warning: 'OMPI_C_MPI_NULL_COPY_FN' is deprecated: MPI_NULL_COPY_FN was deprecated in MPI-2.0; use MPI_COMM_NULL_COPY_FN instead. [-Wdeprecated-declarations]
MPIABI_Copy_function * const MPIABI_NULL_COPY_FN = (MPIABI_Copy_function *)MPI_NULL_COPY_FN;
                                                                           ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3082:26: note: expanded from macro 'MPI_NULL_COPY_FN'
#define MPI_NULL_COPY_FN OMPI_C_MPI_NULL_COPY_FN
                         ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3089:13: note: 'OMPI_C_MPI_NULL_COPY_FN' has been explicitly marked deprecated here
            __mpi_interface_deprecated__("MPI_NULL_COPY_FN was deprecated in MPI-2.0; use MPI_COMM_NULL_COPY_FN instead.");
            ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:302:78: note: expanded from macro '__mpi_interface_deprecated__'
#                    define __mpi_interface_deprecated__(msg) __attribute__((__deprecated__(msg)))
                                                                             ^
In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/src/mpiwrapper.cxx:121:
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src/mpiabi_defn_constants_c.h:169:59: error: use of undeclared identifier 'MPI_COMPLEX32'
MPIABI_Datatype const MPIABI_COMPLEX32 = (MPIABI_Datatype)MPI_COMPLEX32;
                                                          ^
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src/mpiabi_defn_constants_c.h:208:56: error: use of undeclared identifier 'MPI_REAL16'
MPIABI_Datatype const MPIABI_REAL16 = (MPIABI_Datatype)MPI_REAL16;
                                                       ^
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/MPI/MPIwrapper/build/src/mpiabi_defn_constants_c.h:224:82: warning: 'OMPI_C_MPI_NULL_DELETE_FN' is deprecated: MPI_NULL_DELETE_FN was deprecated in MPI-2.0; use MPI_COMM_NULL_DELETE_FN instead. [-Wdeprecated-declarations]
MPIABI_Delete_function * const MPIABI_NULL_DELETE_FN = (MPIABI_Delete_function *)MPI_NULL_DELETE_FN;
                                                                                 ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3092:28: note: expanded from macro 'MPI_NULL_DELETE_FN'
#define MPI_NULL_DELETE_FN OMPI_C_MPI_NULL_DELETE_FN
                           ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:3097:13: note: 'OMPI_C_MPI_NULL_DELETE_FN' has been explicitly marked deprecated here
            __mpi_interface_deprecated__("MPI_NULL_DELETE_FN was deprecated in MPI-2.0; use MPI_COMM_NULL_DELETE_FN instead.");
            ^
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:302:78: note: expanded from macro '__mpi_interface_deprecated__'
#                    define __mpi_interface_deprecated__(msg) __attribute__((__deprecated__(msg)))
                                                                             ^
3 warnings and 2 errors generated.
make[2]: *** [CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o] Error 1
make[1]: *** [CMakeFiles/mpiwrapper.dir/all] Error 2
make: *** [all] Error 2

PhilipVinc commented 2 weeks ago

It was working with OpenMPI 4.x, so could it be that MPItrampoline is not compatible with OpenMPI 5?

The problem seems to stem from MPI_REAL16 and MPI_COMPLEX32.

eschnett commented 2 weeks ago

I have tested MPItrampoline with OpenMPI 5, so this should be working. It seems that the autodetection of whether the Fortran type complex*32 is supported is failing.

Which C and Fortran compilers are you using?

PhilipVinc commented 2 weeks ago

The C compiler is Apple's default (clang):

python-3.11.2 ❯ cc --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

The Fortran compiler is gfortran, installed from Homebrew:

-- Check for working Fortran compiler: /opt/homebrew/bin/gfortran - skipped
python-3.11.2 ❯ gfortran --version
GNU Fortran (Homebrew GCC 14.1.0_2) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

eschnett commented 2 weeks ago

I cannot reproduce this; for me, complex*32 exists.

I notice that you are using the arm64 architecture while I am using x86_64. I will need to test this architecture as well, and then likely add code to disable complex*32 if it isn't supported.

PhilipVinc commented 2 weeks ago

Yes, sorry. Indeed I'm on an Arm M1 Mac... If you need me to run anything let me know.

eschnett commented 2 weeks ago

Can you try the following work-around?

First, you delete the directory MPIwrapper/build to start the build from scratch. Then, instead of calling cmake .., you use this command:

env CFLAGS='-DMPI_REAL16=MPI_TYPE_NULL -DMPI_COMPLEX32=MPI_TYPE_NULL' cmake ..

This defines the two missing MPI types, and the build should succeed.

If this works then I can look at automating this.

eschnett commented 2 weeks ago

As a side note: I did test on arm64 (more than a year ago), and the test did succeed. This was on Debian. I assume that macOS (or Homebrew) is making a different decision as to whether real*16 and complex*32 are supported.

PhilipVinc commented 2 weeks ago

This is a recent thing. I had no issues before OpenMPI 5 was released, I think (though I usually use MPICH, so I'm not sure...)

EDIT: or maybe it was a macOS update. I'm not sure... I haven't managed to install older OpenMPI versions from Homebrew.

Anyhow, your proposed fix does not work; see https://gist.github.com/PhilipVinc/f6ec19a28e5255d9c93a9446eb8ed8e6

PhilipVinc commented 2 weeks ago

I tried with

env CXXFLAGS='-DMPI_REAL16=MPI_TYPE_NULL -DMPI_COMPLEX32=MPI_TYPE_NULL' cmake .. && make

and I believe it does what you intended, but now compilation fails with:

In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/src/mpiwrapper.cxx:121:
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/src/mpiabi_defn_constants_c.h:169:59: error: use of undeclared identifier 'MPI_TYPE_NULL'
MPIABI_Datatype const MPIABI_COMPLEX32 = (MPIABI_Datatype)MPI_COMPLEX32;
                                                          ^
<command line>:3:23: note: expanded from macro 'MPI_COMPLEX32'
#define MPI_COMPLEX32 MPI_TYPE_NULL
                      ^
In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/src/mpiwrapper.cxx:121:
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/src/mpiabi_defn_constants_c.h:208:56: error: use of undeclared identifier 'MPI_TYPE_NULL'
MPIABI_Datatype const MPIABI_REAL16 = (MPIABI_Datatype)MPI_REAL16;
                                                       ^
<command line>:2:20: note: expanded from macro 'MPI_REAL16'
#define MPI_REAL16 MPI_TYPE_NULL
                   ^
In file included from /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/src/mpiwrapper.cxx:121:

eschnett commented 2 weeks ago

Thanks! Yes, the C++ flags should be set. And I also misremembered the name of the constant; it should be MPI_DATATYPE_NULL instead. Can you try this?

PhilipVinc commented 2 weeks ago

Yup, this works and fixes the compilation problem!

env CXXFLAGS='-DMPI_REAL16=MPI_DATATYPE_NULL -DMPI_COMPLEX32=MPI_DATATYPE_NULL' cmake .. && make
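
A minimal sketch of how the flags in the command above could be generated for an arbitrary list of missing constants. The helper name `null_type_flags` is hypothetical and not part of MPIwrapper; it only assembles the same `-D` definitions used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPIwrapper): build -D flags that map
# each named MPI datatype constant to MPI_DATATYPE_NULL.
null_type_flags() {
    flags=''
    for name in "$@"; do
        flags="$flags -D${name}=MPI_DATATYPE_NULL"
    done
    # Strip the leading space left by the first iteration.
    printf '%s\n' "${flags# }"
}

cxxflags=$(null_type_flags MPI_REAL16 MPI_COMPLEX32)
echo "$cxxflags"
# Typical use, from a freshly created build directory:
#   env CXXFLAGS="$cxxflags" cmake .. && make
```

The flags go into CXXFLAGS rather than CFLAGS because the failing file, mpiwrapper.cxx, is compiled by the C++ compiler.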

However, there is now a link-time problem that I do not understand. The full log is here: https://gist.github.com/PhilipVinc/9a990dcdcb4659bcb7b06cdeeff36087

/Library/Developer/CommandLineTools/usr/bin/c++ -DMPI_REAL16=MPI_DATATYPE_NULL -DMPI_COMPLEX32=MPI_DATATYPE_NULL -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk -bundle -Wl,-headerpad_max_install_names -Wl,-flat_namespace -Wl,-commons,use_dylibs -Wl,-ld_classic -o libmpiwrapper.so CMakeFiles/mpiwrapper.dir/src/mpiwrapper.cxx.o CMakeFiles/mpiwrapper.dir/src/mpiwrapper.f.o   -L/opt/homebrew/Cellar/gcc/14.1.0_2/lib/gcc/current/gcc/aarch64-apple-darwin23/14  -L/opt/homebrew/Cellar/gcc/14.1.0_2/lib/gcc/current/gcc  -L/opt/homebrew/Cellar/gcc/14.1.0_2/lib/gcc/current  /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib/libmpi_usempif08.dylib /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib/libmpi_usempi_ignore_tkr.dylib /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib/libmpi_mpifh.dylib /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib/libmpi.dylib -lemutls_w -lheapt_w -lgfortran -lgcc -lquadmath
ld: warning: -commons use_dylibs is no longer supported, using error treatment instead
ld: warning: could not create compact unwind for _allocate_tramp_ctrl: does not use standard frame
ld: warning: could not create compact unwind for ___gcc_nested_func_ptr_created: does not use standard frame
ld: warning: could not create compact unwind for ___gcc_nested_func_ptr_deleted: does not use standard frame
ld: warning: could not create compact unwind for ___emutls_get_address: does not use standard frame
ld: warning: could not create compact unwind for ___powitf2: does not use standard frame
ld: warning: could not create compact unwind for ___multc3: does not use standard frame
ld: warning: could not create compact unwind for ___divtc3: does not use standard frame
ld: warning: could not create compact unwind for ___mulbitint3: registers 27 and 28 not saved contiguously in frame
...
...
ld: warning: could not create compact unwind for ___bid_floatbitinttd: does not use standard frame
Checking whether libmpiwrapper.so plugin uses a two-level namespace...
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/check_twolevel.sh /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/libmpiwrapper.so
*** ERROR: plugin /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/libmpiwrapper.so does not use a two-level namespace
otool -hV /Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/libmpiwrapper.so
/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/mm/MPIwrapper/build/libmpiwrapper.so:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64    ARM64        ALL  0x00      BUNDLE    22       2040 DYLDLINK WEAK_DEFINES BINDS_TO_WEAK
make[2]: *** [libmpiwrapper.so] Error 1
make[2]: *** Deleting file `libmpiwrapper.so'
make[1]: *** [CMakeFiles/mpiwrapper.dir/all] Error 2
make: *** [all] Error 2

eschnett commented 2 weeks ago

That error is caused by a failing consistency check in MPIwrapper. It means that this OpenMPI library cannot be used.

When you use MPItrampoline then there will be two different functions called e.g. MPI_Send in your executable: The one provided by OpenMPI, and the one provided by MPItrampoline. It is important that they are kept separate because they are not compatible.

On Linux, where the ELF format is usually used for binaries, this is usually not a problem. macOS uses a different format, and shared libraries there can be created in two different variants: "flat namespace" and "two-level namespace". In a flat namespace there can be only one function with a given name, and this won't work with MPIwrapper. A two-level namespace is the default setting, but unfortunately OpenMPI explicitly requests a flat namespace when building on macOS.
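
A minimal sketch of how the namespace variant can be read off a Mach-O header: `otool -hV` lists `TWOLEVEL` among the flags when a two-level namespace is in use. The function name `namespace_kind` is hypothetical, and this is not necessarily how MPIwrapper's check_twolevel.sh performs its check.

```shell
#!/bin/sh
# Hypothetical helper: classify Mach-O header text (as printed by
# `otool -hV <file>`) by whether the TWOLEVEL flag is listed.
namespace_kind() {
    case "$1" in
        *TWOLEVEL*) echo "two-level" ;;
        *)          echo "flat" ;;
    esac
}

# Header line copied from the failing build log in this thread.
header='MH_MAGIC_64    ARM64        ALL  0x00      BUNDLE    22       2040 DYLDLINK WEAK_DEFINES BINDS_TO_WEAK'
namespace_kind "$header"
# On a Mac, the real input would be: namespace_kind "$(otool -hV libmpiwrapper.so)"
```

For the header line from the log this prints `flat`, which matches the error reported by the consistency check.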

The macOS MacPorts developers switched back to a two-level namespace after my request. Maybe Homebrew would be willing to do the same? (I'm not using Homebrew myself.) The only other way out is to build OpenMPI from source yourself after patching it to remove the build option that requests a flat namespace. (This may or may not also fix the MPI_REAL16 problem for you.)

Some time ago I used these commands for building OpenMPI with a two-level namespace:

cd $HOME/src/openmpi-4.1.4
find . -type f -print0 | xargs -0 perl -pi -e 's/-Wl,-flat_namespace//g;s/\$\{wl\}-flat_namespace//g'
./autogen.pl --force
./configure --prefix=$HOME/openmpi-4.1.4 --disable-shared --enable-static --enable-mpi-fortran=usempif08 CC=gcc CXX=g++ FC=gfortran
make -j$(nproc) && make -j$(nproc) install
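
One portability note on the commands above: `nproc` comes from GNU coreutils and may not be present on a stock macOS install. A minimal sketch of a fallback, assuming the `hw.ncpu` sysctl key is available on macOS; the function name `job_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a job count suitable for `make -j`.
job_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        # hw.ncpu is the macOS/BSD sysctl key for the CPU count;
        # fall back to a single job if it cannot be read.
        sysctl -n hw.ncpu 2>/dev/null || echo 1
    fi
}

njobs=$(job_count)
echo "building with $njobs jobs"
# e.g. make -j"$njobs" && make -j"$njobs" install
```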

eschnett commented 2 weeks ago

I just noticed that the options above build a static library and no shared library. That's fine, but in this case there would be no distinction between flat and two-level namespaces because these apply only to shared libraries. The patch applied in the line starting with find and the subsequent autogen.pl should then not be necessary.

eschnett commented 2 weeks ago

I created a branch https://github.com/eschnett/MPIwrapper/tree/eschnett/real16-apple-arm64. Can you try this branch? This should correct the compile error without requiring special compiler options.

It will not correct the linker error; that can only be corrected by rebuilding OpenMPI with different linker options.

PhilipVinc commented 2 weeks ago

The branch compiles fine. I'll try to get in touch with the Homebrew maintainers to change the formula.

PhilipVinc commented 2 weeks ago

thanks!

PhilipVinc commented 2 weeks ago

Do you have a link to an issue in the MacPorts repository on GitHub that I could reference?