geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] Error during compile: /usr/bin/ld: cannot find -lmpi_cxx. What is libmpi_cxx.so? #1

Closed LiamBindle closed 6 years ago

LiamBindle commented 6 years ago

Hi,

I'm trying to compile GCHP in a singularity container that I made (similar to @JiaweiZhuang's singularity container but with Open MPI 2.1.2 instead of MPICH) and I am running into trouble during my "make compile_clean". Specifically, during compilation in <GEOS-Chem source code>/GCHP/ESMF/src/apps/ESMF_Info, I receive the following error:

$  CODE_DIR=<abs path to my CodeDir symlink>
$  mpif90  -fno-second-underscore -m64 -mcmodel=small -pthread -L$CODE_DIR/GCHP/ESMF/Linux/lib  -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/ -Wl,-rpath,$CODE_DIR/GCHP/ESMF/Linux/lib  -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5/ -o $CODE_DIR/GCHP/ESMF/Linux/bin/binO/Linux.gfortran.64.openmpi.default/ESMF_Info $CODE_DIR/GCHP/ESMF/obj/objO/Linux.gfortran.64.openmpi.default/src/apps/ESMF_Info/ESMF_Info.o -lesmf  -lmpi_cxx -lrt -lstdc++ -ldl
/usr/bin/ld: cannot find -lmpi_cxx
collect2: error: ld returned 1 exit status

My Open MPI install

I have installed Open MPI 2.1.2 to /usr/local (which is my $MPI_ROOT) and my /usr/local/lib has the following libraries:

$  ls /usr/local/lib
libmca_common_sm.la          libmpi_usempi.la          libopen-rte.la
libmca_common_sm.so          libmpi_usempi.so          libopen-rte.so
libmca_common_sm.so.20       libmpi_usempi.so.20       libopen-rte.so.20
libmca_common_sm.so.20.10.1  libmpi_usempi.so.20.10.1  libopen-rte.so.20.10.2
libmpi.la                    libompitrace.la           liboshmem.la
libmpi_mpifh.la              libompitrace.so           liboshmem.so
libmpi_mpifh.so              libompitrace.so.20        liboshmem.so.20
libmpi_mpifh.so.20           libompitrace.so.20.10.0   liboshmem.so.20.10.2
libmpi_mpifh.so.20.11.1      libopen-pal.la            mpi.mod
libmpi.so                    libopen-pal.so            openmpi
libmpi.so.20                 libopen-pal.so.20         pkgconfig
libmpi.so.20.10.2            libopen-pal.so.20.10.2

My Question

Do you know what libmpi_cxx.so is or where I should be able to find it? Is it simply an MPI runtime library that I could create a symlink to?

The reason I am posting this question to GCHP, rather than Open MPI, is because I noticed that changing $ESMF_COMM to the generic "mpi" changes the library in question to "libmpic++.so". This makes me think that there is some difference between what GCHP's Makefile is expecting, and what my system actually looks like.

I have attached my Singularity file for reference.

Thanks in advance,

Liam

LiamBindle commented 6 years ago

I believe that I figured it out.

The file <GCHP>/ESMF/build_config/*.*.default/build_rules.mk adds the -lmpi_cxx flag to both $ESMF_F90LINKLIBS and $ESMF_CXXLINKLIBS. The documentation (source: Building and Installing ESMF) for $ESMF_F90LINKLIBS and $ESMF_CXXLINKLIBS says

ESMF_F90LINKLIBS
Possible value: list of libraries, each prepended with -l
Prepend libraries to the list of libraries the ESMF build system determines.

To my eyes, it appears that <GCHP>/ESMF/build_config/<platform>.<compiler>.default/build_rules.mk is trying to define the MPI runtime as libmpi_cxx.so.

If I understand correctly, my MPI runtime library is libmpi.so, which is what the wrapper compiler reports:

$  mpif90 --showme:libs
mpi

This also makes sense since libmpi.so was listed in my /usr/local/lib directory (see my opening comment).
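
The check described above can be sketched as a small shell helper. This is a minimal sketch, not part of GCHP or ESMF: the function name lib_in_dirs is hypothetical, and the usage line assumes an Open MPI wrapper compiler on PATH (the --showme:libdirs option is specific to Open MPI wrappers).

```shell
#!/bin/sh
# lib_in_dirs <name> <dir>...
# Hypothetical helper: print the first lib<name>.so or lib<name>.a found in
# the given directories and return 0; return 1 if none of them has it.
lib_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        for ext in so a; do
            if [ -e "$dir/lib$name.$ext" ]; then
                echo "$dir/lib$name.$ext"
                return 0
            fi
        done
    done
    return 1
}

# Example usage (assumes Open MPI's mpif90 is on PATH):
#   lib_in_dirs mpi_cxx $(mpif90 --showme:libdirs) || echo "libmpi_cxx not found"
```

If the helper finds libmpi but not libmpi_cxx, the linker error is explained by the library simply not being part of the install, rather than by a wrong search path.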


Solution

For my system the ESMF build_rules.mk file was <GCHP>/ESMF/build_config/Linux.gfortran.default/build_rules.mk. In this file, I simply changed -lmpi_cxx to -lmpi. After doing this GCHP compiled successfully.

Below is a diff of this file:

@@ -68,12 +68,12 @@ ifeq ($(ESMF_COMM),openmpi)
 # OpenMPI --------------------------------------------------
 ESMF_CXXCOMPILECPPFLAGS+= -DESMF_NO_SIGUSR2
 ESMF_F90DEFAULT         = mpif90
-ESMF_F90LINKLIBS       += -lmpi_cxx
+ESMF_F90LINKLIBS       += -lmpi
 ESMF_CXXDEFAULT         = mpicxx
 # Need to change -lmpi_f77 to -lmpi_cxx to get ESMF to compile w/ OpenMPI
 # (ewl, 6/18/2018)
 #ESMF_CXXLINKLIBS       += -lmpi_f77
-ESMF_CXXLINKLIBS       += -lmpi_cxx
+ESMF_CXXLINKLIBS       += -lmpi
 ESMF_MPIRUNDEFAULT      = mpirun $(ESMF_MPILAUNCHOPTIONS)
 ESMF_MPIMPMDRUNDEFAULT  = mpiexec $(ESMF_MPILAUNCHOPTIONS)
 else
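
The same change could be applied from the command line. The following is a sketch under stated assumptions: the function name swap_mpi_cxx is hypothetical, and it assumes a sed that accepts -i with an attached backup suffix (GNU and BSD sed both do). Comment lines are skipped so that the "(ewl, 6/18/2018)" note in the file stays untouched.

```shell
#!/bin/sh
# swap_mpi_cxx <path to build_rules.mk>
# Hypothetical helper: replace -lmpi_cxx with -lmpi on every line that is not
# a comment, keeping the original file as <path>.bak.
swap_mpi_cxx() {
    sed -i.bak '/^[[:space:]]*#/!s/-lmpi_cxx/-lmpi/g' "$1"
}

# Example usage (path as given in this thread, relative to the GCHP directory):
#   swap_mpi_cxx ESMF/build_config/Linux.gfortran.default/build_rules.mk
```

Whether -lmpi is the right replacement depends on the MPI install, so it is worth comparing the .bak copy against the edited file before rebuilding.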

JiaweiZhuang commented 6 years ago

Thanks for reporting and glad that you have figured out the issue!

A side point: from your Singularity file, it looks like you are installing gfortran 4.x. I found that old versions of gfortran cause run-time errors (related to string processing in MAPL). gfortran 7.x works fine.
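
A container build could guard against an old compiler with a check along these lines. This is only a sketch: the function names are hypothetical, and the threshold of 7 is an assumption taken from the observation above that 7.x works (versions between 4.x and 7.x are not covered by this thread).

```shell
#!/bin/sh
# gfortran_major <version string>
# Print the major version, e.g. "4" for "4.8.5" and "7" for "7".
gfortran_major() {
    echo "${1%%.*}"
}

# is_gfortran_new_enough <version string>
# Succeed if the major version is at least 7 (assumed threshold, see above).
is_gfortran_new_enough() {
    [ "$(gfortran_major "$1")" -ge 7 ]
}

# Example usage (assumes gfortran is on PATH):
#   is_gfortran_new_enough "$(gfortran -dumpversion)" || echo "gfortran may be too old for MAPL"
```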

JiaweiZhuang commented 6 years ago

It looks like -lmpi_cxx was already removed in a recent commit (fdd8afe59c9c37dff70351887da87be8ad8ad629) by @lizziel.

Are you using an up-to-date version of the GCHP code?

LiamBindle commented 6 years ago

Thanks for the tip! I did have a runtime error from MAPL when I launched GCHP (I don't remember what it said), but I left debugging that for tomorrow—I'll start with trying version 7.x!

Regarding my GCHP version, I am using the most up-to-date commit (205f019). It appears that fdd8afe updated $ESMF_CXXLINKLIBS and $ESMF_F90LINKLIBS for Linux.intel.default/build_rules.mk, but Linux.gfortran.default/build_rules.mk is the file that is used for my container.

https://github.com/geoschem/gchp/blob/205f019e978500bf238f8876dc3e79bafd38e880/ESMF/build_config/Linux.gfortran.default/build_rules.mk#L68-L80

Lines 71 and 76 were the ones that I changed to just -lmpi. I have only tried to compile GCHP with Open MPI 2.1.2, however, so I can't speak to the consistency of the library name (libmpi.so) across different OMPI versions and other MPI implementations.

Cheers,

Liam

lizziel commented 6 years ago

Hi Liam,

Unfortunately, those build_rules.mk files have a lot of hard-coded flags that may or may not work on different systems. Adjusting them may be necessary for different versions of the OS, compiler, and MPI. I commit what is necessary to get things working for our testing, but there is no one-size-fits-all solution.

LiamBindle commented 6 years ago

Hi Lizzie,

Okay, I see, that makes sense. I have GCHP compiling now, so I will close this issue.

Thanks,

Liam

JiaweiZhuang commented 5 years ago

Hit the same issue when trying gcc 7.3.0 + openmpi 3.1.3.

To be consistent with fdd8afe, I suggest removing -lmpi_cxx from ESMF/build_config/Linux.gfortran.default/build_rules.mk.