SWIFTSIM / SWIFT

Modern astrophysics and cosmology particle-based code. Mirror of gitlab developments at https://gitlab.cosma.dur.ac.uk/swift/swiftsim
http://www.swiftsim.com
GNU Lesser General Public License v3.0

Can't compile swift with MPI VELOCIraptor #29

Closed Findus23 closed 2 years ago

Findus23 commented 2 years ago

Hello,

I am trying to compile SWIFT with VELOCIraptor, and following https://swift.dur.ac.uk/docs/VELOCIraptorInterface/stfwithswift.html it seems rather straightforward. Nevertheless, the build fails no matter which combination of options I try. I am on a regular desktop PC running Debian Testing.

The steps to reproduce from fresh clones are:

git clone https://github.com/ICRAR/VELOCIraptor-STF.git
cd VELOCIraptor-STF
git rev-parse HEAD # returns dc6d330eef60b7ca10e029d9a9af434454575daa
mkdir build-sp
cd build-sp
cmake ../ -DVR_USE_HYDRO=ON -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DVR_MPI=OFF
make
cd ..
mkdir build-mp
cd build-mp
cmake ../ -DVR_USE_HYDRO=ON -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DVR_MPI=ON
make
cd ../../
git clone https://gitlab.cosma.dur.ac.uk/swift/swiftsim.git
cd swiftsim
git rev-parse HEAD # returns 25a7aaa4cb35c42cbee9e7ae78c48eb10a7844c5
./autogen.sh
autoreconf --version # returns autoreconf (GNU Autoconf) 2.71
./configure --enable-fof --with-velociraptor=/home/lukas/git/VELOCIraptor-STF/build-sp/src --with-velociraptor-mpi=/home/lukas/git/VELOCIraptor-STF/build-mp/src
make

The compilation then halts with this MPI error in libvelociraptor.a:

libtool: link: mpicc -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial -fopenmp -DWITH_MPI "-DENGINE_POLICY=engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=amdfam10 -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -o swift_mpi swift_mpi-main.o  -L/usr/lib/x86_64-linux-gnu/hdf5/serial ../src/.libs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a -L/home/lukas/git/VELOCIraptor-STF/build-mp/src -lvelociraptor -lmpi -lstdc++ -lgsl -lgslcblas -lhdf5_hl -lhdf5 -lcrypto -lcurl -lsz -lz -ldl -lfftw3_threads -lfftw3 -lnuma -lpthread -lm -pthread -fopenmp
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
swiftinterface.cxx:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x19): undefined reference to `ompi_mpi_cxx_op_intercept'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Intracomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x2c): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Graphcomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x27): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Cartcomm::Sub(bool const*) const':
swiftinterface.cxx:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x7e): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
swiftinterface.cxx:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x2e): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Cartcomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x27): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):swiftinterface.cxx:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0x93): more undefined references to `MPI::Comm::Comm()' follow
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:800: swift_mpi] Error 1
make[2]: Leaving directory '/home/lukas/tmp/swiftsim/examples'
make[1]: *** [Makefile:525: all-recursive] Error 1
make[1]: Leaving directory '/home/lukas/tmp/swiftsim'
make: *** [Makefile:457: all] Error 2

(in case you need the full log or any other additional information, I can share it too)

I'm not the biggest expert on OpenMPI, but everything obvious seems correct to me.

Still, I may well be missing something obvious (which could then perhaps also be added to the docs).

I also found https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/780, so I am wondering if maybe something broke with that PR in the setup explained in the docs.

pwdraper commented 2 years ago

Hi, that all looks like issues with linking against C++ using a C compiler. Not sure why that part has changed (nothing to do with #780). Try the following:

cd examples
make MPICC=mpicxx CC=mpicxx

I expect that will work. Not sure how we can fix this permanently.

Findus23 commented 2 years ago

Many thanks for the response. I assume you mean setting MPICC=mpicxx CC=mpicxx for the swift build (not VELOCIraptor).

In that case (no matter if inside of examples or not) I get a lot of errors that look like even more C++/C mixup:

➜  ~/swiftsim/examples LANG=C make -j MPICC=mpicxx CC=mpicxx
mpicxx -DHAVE_CONFIG_H -I. -I..     -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial     -fopenmp  -DENGINE_POLICY="engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=amdfam10 -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -MT swift-main.o -MD -MP -MF .deps/swift-main.Tpo -c -o swift-main.o `test -f 'main.c' || echo './'`main.c
make: *** No rule to make target '../src/.libs/libswiftsim.a', needed by 'swift'.  Stop.
make: *** Waiting for unfinished jobs....
cc1plus: error: command-line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ [-Werror]
In file included from ../src/kernel_hydro.h:38,
                 from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
../src/dimension.h: In function 'vector pow_dimension_vec(vector)':
../src/dimension.h:355:50: error: no matching function for call to 'vector::vector(__m256)'
  355 |   return (vector)(vec_mul(vec_mul(x.v, x.v), x.v));
      |                                                  ^
In file included from ../src/dimension.h:33,
                 from ../src/kernel_hydro.h:38,
                 from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
[...]
../src/cache.h:1010:34: error: narrowing conversion of '-((2.0e+0 * ((double)((const cell*)cj)->cell::width[2])) + ((double)max_dx))' from 'double' to 'float' [-Werror=narrowing]
 1010 |                                  -(2. * cj->width[2] + max_dx)};
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
../src/kernel_hydro.h: At global scope:
../src/kernel_hydro.h:479:21: error: 'cubic_1_dwdx_const_c2' defined but not used [-Werror=unused-variable]
  479 | static const vector cubic_1_dwdx_const_c2 = FILL_VEC(0.f);
      |                     ^~~~~~~~~~~~~~~~~~~~~
../src/kernel_hydro.h:475:21: error: 'cubic_1_const_c2' defined but not used [-Werror=unused-variable]
  475 | static const vector cubic_1_const_c2 = FILL_VEC(0.f);
      |                     ^~~~~~~~~~~~~~~~
../src/kernel_hydro.h:447:21: error: 'kernel_ivals_vec' defined but not used [-Werror=unused-variable]
  447 | static const vector kernel_ivals_vec = FILL_VEC((float)kernel_ivals);
      |                     ^~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [Makefile:872: swift-main.o] Error 1

pwdraper commented 2 years ago

Yes, just in the examples directory of SWIFT. So build normally until you get to the error you report above, then cd into examples and run make with CC and MPICC re-defined.

It looks like you haven't built the rest of SWIFT first. That part doesn't build with C++ as the compiler unless you disable the hand vectorization and stop compiler warnings from being treated as errors, and even then it may not be 100% happy.

pwdraper commented 2 years ago

BTW, this is all I see when I do this:

> make MPICC=mpicxx CC=mpicxx
/bin/bash ../libtool  --tag=CC   --mode=link mpicxx  -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial   E_POLICY="engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffalake -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -L/usr/llpthread -lpthread -lm  -o swift_mpi swift_mpi-main.o ../src/.libs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a    t-tests/temp/VELOCIraptor-STF/build-mp/src -lvelociraptor -lmpi -lstdc++ -lhdf5 -lgsl -lgslcblas -lhdf5_hl -lhdf5  -lpthreads -lfftw3 -lnuma        -lpthread -lm 
libtool: link: mpicxx -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial -fopenmp -DWITH_MPI "-DENGINE_POLolicy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=skylake --Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -o swift_mpi swift_mpi-main.o  -L/usr/lib/x86_6ibs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a -L../..//swift-tests/temp/VELOCIraptor-STF/builtdc++ -lgsl -lgslcblas -lhdf5_hl -lhdf5 -lsz -lz -ldl -lfftw3_threads -lfftw3 -lnuma -lpthread -lm -pthread -fopenmp

so only the linking part is done using C++.

Findus23 commented 2 years ago

Ah, I misunderstood your comment. Indeed, if I run

cd examples
make MPICC=mpicxx CC=mpicxx

after the aborted main make run (so everything from the first post), it works and looks as it does for you.

So this workaround works fine for me, thanks.

pwdraper commented 2 years ago

Good. It seems this is all caused by pulling in the C++ interface of OpenMPI, which doesn't have C linkage and so requires that the C++ compiler does the linking.

@jchelly you are the most likely to have tried this before. Is this new or have we never built against the MPI version of VR before?
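
As a minimal sketch of how to recognise this class of failure, a saved build log can be scanned for undefined references to `MPI::` symbols (or to Open MPI's `ompi_mpi_cxx_` helpers), which point at the MPI C++ bindings rather than at SWIFT itself. The function name `has_missing_mpi_cxx_symbols` is hypothetical, and the two sample lines are copied from the linker output in the first post.

```shell
#!/bin/sh
# Sketch only: has_missing_mpi_cxx_symbols is a hypothetical helper name.

# Returns 0 if the given log file contains undefined references to the
# MPI C++ bindings (MPI:: symbols or ompi_mpi_cxx_ helpers), 1 otherwise.
has_missing_mpi_cxx_symbols() {
    grep -Eq 'undefined reference to `(MPI::|ompi_mpi_cxx_)' "$1"
}

# Demo input: two lines copied from the linker output in the first post.
log=$(mktemp)
cat > "$log" <<'EOF'
swiftinterface.cxx:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x2c): undefined reference to `MPI::Comm::Comm()'
collect2: error: ld returned 1 exit status
EOF

if has_missing_mpi_cxx_symbols "$log"; then
    echo "MPI C++ symbols are missing: link with mpicxx or add the MPI C++ library to LIBS"
else
    echo "no missing MPI C++ symbols found"
fi
rm -f "$log"
```

On a real build the log could be captured with something like `make 2>&1 | tee build.log` and passed to the function in place of the temporary file.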

jchelly commented 2 years ago

This has definitely worked in the past on Cosma with the Intel 2018 compiler because I have an EAGLE-XL L0075N1128 run with velociraptor which completed. I haven't tried it since the separate MPI/no MPI configure options were added.

When we were trying to run EAGLE-XL on Irene I had to use a few extra flags. From the notes I put on the gitlab wiki:

# Ensure we get the right compiler run time libraries
export LDFLAGS="-L${MPI_ROOT}/lib/ -L${C_INTEL_ROOT}/lib/intel64/ -cxxlib"

# If IPO is enabled we need to link all dependencies explicitly
export LIBS="-lopen-rte -lopen-pal -lmpi_cxx"

So it did need a bit of help finding the C++ MPI library.

pwdraper commented 2 years ago

Thanks. I expect the most important part is the -lmpi_cxx part. So this also works:

export LIBS="-lmpi++"
./configure ...

pwdraper commented 2 years ago

I've updated the documentation to include advice to also include the C++ MPI library if symbols like these are reported as missing. Please close this issue if you are happy now. Thanks for the report.

Findus23 commented 2 years ago

Indeed, with LIBS=-lmpi++ it's working exactly as expected. Thanks for the help!
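
For reference, the fix from this thread combined with the configure flags from the first post might look like the sketch below. `VR_SP`, `VR_MP`, `MPI_CXX_LIB`, `RUN` and `swift_configure` are hypothetical names introduced here for illustration, and the default paths are the reporter's own build directories. The library name is an assumption that depends on the MPI installation: the thread reports `-lmpi++` working on Debian Testing, while the notes for Irene use `-lmpi_cxx`. By default the script only prints the command.

```shell
#!/bin/sh
# Sketch only: all variable and function names here are hypothetical.
# Default paths are the VELOCIraptor build directories from the first post.
VR_SP=${VR_SP:-/home/lukas/git/VELOCIraptor-STF/build-sp/src}
VR_MP=${VR_MP:-/home/lukas/git/VELOCIraptor-STF/build-mp/src}

# MPI C++ bindings library; the name depends on the MPI installation.
MPI_CXX_LIB=${MPI_CXX_LIB:--lmpi++}

# Prints the configure command; runs it only when RUN=1 is set and the
# script is started from the top of a swiftsim checkout.
swift_configure() {
    if [ "${RUN:-0}" = 1 ]; then
        (
            export LIBS="$MPI_CXX_LIB"
            ./configure --enable-fof \
                --with-velociraptor="$VR_SP" \
                --with-velociraptor-mpi="$VR_MP"
        )
    else
        echo "LIBS=\"$MPI_CXX_LIB\" ./configure --enable-fof --with-velociraptor=$VR_SP --with-velociraptor-mpi=$VR_MP"
    fi
}

swift_configure
```

Setting `LIBS` before `./configure` means the extra library ends up on every link line, so the separate relink inside `examples` with `mpicxx` should no longer be needed.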