fortran-lang / fpm

Fortran Package Manager (fpm)
https://fpm.fortran-lang.org
MIT License

MPI-code can not be built with Intel compiler when using standard flags #1034

Open aradi opened 1 month ago

aradi commented 1 month ago

Description

The compiler flags used by fpm when building with the Intel oneAPI compiler prevent the successful build of MPI-parallelized code using IntelMPI.

To reproduce: Create a standard fpm project with

fpm new test

modify fpm.toml to contain

[fortran]
implicit-typing = true
implicit-external = true
source-form = "free"

[dependencies]
mpi = "*"

and modify the library source file as follows

module test
  use mpi_f08, only : MPI_COMM_WORLD, mpi_barrier, mpi_init, mpi_finalize
  implicit none
  private

  public :: say_hello
contains
  subroutine say_hello
    call mpi_init()
    call MPI_Barrier(MPI_COMM_WORLD)
    print *, "Hello, test!"
    call mpi_finalize()
  end subroutine say_hello
end module test

and build it with

FPM_FC=ifx fpm build

Using ifx (IFX) 2024.1.0 20240308 I obtain the error message:

ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o): in function `test_MP_say_hello_':
/media/aradi/ramdisk/test/././src/test.f90:10:(.text+0x2c): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o):(.debug_info+0x3e): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'

Expected Behaviour

It would be desirable, especially for less experienced programmers, that MPI-parallel code compiles with the oneAPI compiler automatically.

Version of fpm

0.10.1, alpha

Platform and Architecture

Linux

Additional Information

The culprit is the -standard-semantics option. This option changes the name mangling, and the Intel MPI Fortran modules apparently use the "original" name mangling. Issuing

FPM_FC=ifx FPM_FFLAGS="-standard-semantics" fpm build

generates the same error. Using

FPM_FC=ifx FPM_FFLAGS="-stand f18" fpm build

instead leads to a successful build.

perazz commented 1 month ago

Thanks @aradi for finding this issue. What is your output with fpm build --verbose?

All compilation flags are taken from the oneAPI MPI wrapper via mpiifort -show, so I wonder how that could return flags that cannot produce a successful build.

https://github.com/fortran-lang/fpm/blob/88ebb0adec1566b324616f5adcd13a51359245a0/src/fpm_meta.f90#L1429

aradi commented 1 month ago

The relevant part of the output is the following:

 + mkdir -p build/ifx_F21CDDED5FBAB7F2/app/
 + ifx    -warn all -check all,nouninit -error-limit 1 -O0 -g -assume byterecl -standard-semantics -traceback -I/opt/intel/oneapi/mpi/2021.12/include/mpi -I/opt/intel/oneapi/mpi/2021.12/include -I/opt/intel/oneapi/mpi/2021.12/include/mpi  -L/opt/intel/oneapi/mpi/2021.12/lib -L/opt/intel/oneapi/mpi/2021.12/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /opt/intel/oneapi/mpi/2021.12/lib -Xlinker -rpath -Xlinker /opt/intel/oneapi/mpi/2021.12/lib -lmpifort -lmpi -ldl -lrt -lpthread build/ifx_343A946A5DEE85E7/test/app_main.f90.o build/ifx_434F714D4D7C6A1A/test/libtest.a -o build/ifx_F21CDDED5FBAB7F2/app/test
ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o): in function `test_MP_say_hello_':
/home/aradi/ramdisk/test/././src/test.f90:10:(.text+0x2c): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o):(.debug_info+0x3e): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
[100%]                           test  done.

ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o): in function `test_MP_say_hello_':
/home/aradi/ramdisk/test/././src/test.f90:10:(.text+0x2c): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
ld: build/ifx_434F714D4D7C6A1A/test/libtest.a(src_test.f90.o):(.debug_info+0x3e): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
<ERROR> Compilation failed for object " test "
<ERROR> stopping due to failed compilation
STOP 1

The problem is not the linking of the MPI libraries or any MPI-related flag. The problem is the -standard-semantics flag, which I guess comes from fpm, maybe from the get_release_compile_flags() routine in fpm_compiler.f90.

As far as I understand, compiling a source with -standard-semantics changes how module names, module procedure names and module variable names are mangled into linker symbols (watch for *_mp_* vs. *_MP_* in the generated symbol names). Consequently, module-based libraries can only be linked together if either all components were compiled with -standard-semantics or all without it. Apparently, Intel decided to compile its mpi_f08 interface without that flag, so it can only be linked to Fortran sources that do not use the flag either. (We had a lot of trouble with that flag in DFTB+ for exactly this reason, and dropped it at some point.)

I just double-checked: the MPI_COMM_WORLD symbol is turned either into mpi_f08_compile_constants_mp_mpi_comm_world_ (without -standard-semantics) or into mpi_f08_compile_constants_MP_mpi_comm_world_ (with -standard-semantics). The Intel MPI library apparently contains only the former...
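The symbol names quoted above can be reproduced with a small model. The following Python sketch only illustrates the pattern observed in this thread (lowercased module name, an `_mp_` or `_MP_` separator, lowercased entity name, trailing underscore). It is not Intel's actual mangling algorithm, and the helper names `mangle` and `explain_undefined` are hypothetical.

```python
def mangle(module, entity, standard_semantics):
    """Model the linker symbol of a module entity, as observed in this thread."""
    sep = "_MP_" if standard_semantics else "_mp_"
    return f"{module.lower()}{sep}{entity.lower()}_"


def explain_undefined(symbol, provided):
    """Return a provided symbol that differs from `symbol` only in letter
    case, or None if there is no such near-match."""
    for candidate in provided:
        if candidate != symbol and candidate.lower() == symbol.lower():
            return candidate
    return None


# Symbol the user code references when built with -standard-semantics.
wanted = mangle("mpi_f08_compile_constants", "MPI_COMM_WORLD", True)
# Symbol the Intel MPI module provides, per the thread (built without the flag).
provided = [mangle("mpi_f08_compile_constants", "MPI_COMM_WORLD", False)]

print(wanted)
print(explain_undefined(wanted, provided))
```

A near-match that differs only in the case of the separator is a strong hint that the two objects were compiled with different -standard-semantics settings.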

perazz commented 1 month ago

Thanks @aradi for the explanation.

Going back to when the -standard-semantics flag was added (https://github.com/fortran-lang/fpm/pull/901): it was a necessary step for introducing the Intel compiler in the fpm CI, because fpm itself could not be compiled without it and several of the simple tests were failing.

Right now I can't recall the page, but I believe it was suggested on the Fortran Discourse as a solution for it. (I'm not an Intel compiler user myself, unfortunately.)

So I guess there should be an Intel-specific check that removes that flag when MPI is requested. It could be made part of the "Fortran language features" tree, although this is a bit of a "magic" flag and it does not really correspond to any version of the Fortran standard. I'm thinking of something like:

[fortran]
implicit-typing=true
intel-semantics=false

I know it's not really a Fortran language feature, but it's going to impact a lot of projects anyway.
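As a rough illustration of how such a switch might behave, here is a minimal Python sketch. It is not fpm code (fpm is written in Fortran): the function `select_flags` is hypothetical, `intel-semantics` is only the key proposed in this comment, and its default of true (keeping the current behaviour) is an assumption.

```python
INTEL_COMPILERS = {"ifx", "ifort"}


def select_flags(compiler, default_flags, fortran_table):
    """Drop -standard-semantics for Intel compilers when the manifest opts out.

    `fortran_table` stands for the parsed [fortran] table of fpm.toml.
    The `intel-semantics` key is only the proposal from this thread; it is
    assumed here to default to true, i.e. to keep the current behaviour.
    """
    keep = fortran_table.get("intel-semantics", True)
    if compiler in INTEL_COMPILERS and not keep:
        return [f for f in default_flags if f != "-standard-semantics"]
    return list(default_flags)


defaults = ["-O0", "-g", "-assume", "byterecl", "-standard-semantics", "-traceback"]
manifest = {"implicit-typing": True, "intel-semantics": False}

print(select_flags("ifx", defaults, manifest))  # flag removed
print(select_flags("ifx", defaults, {}))        # key absent: flags unchanged
```

Other compilers are left untouched, so the key would be a no-op for GFortran builds in this sketch.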

aradi commented 1 month ago

I'd first suggest checking whether -standard-semantics is really needed at all. In earlier times it was needed because automatic left-hand-side allocation did not work otherwise. But one could work around that by using -assume lhs_realloc instead. The latter does not change the name mangling and still ensures correct automatic allocation. Actually, in recent ifx/ifort compilers -assume lhs_realloc is already the default, so no explicit arguments are needed for automatic LHS reallocation any more.

So, unless some other weird feature covered by -standard-semantics is needed to build fpm, I'd rather suggest using -assume lhs_realloc instead. The resulting object files could then be linked against IntelMPI, and we wouldn't need any additional compiler-specific entry in fpm.toml.
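To make the suggested substitution concrete, the Python sketch below rewrites a flag string like the one in the verbose output earlier in this thread. It is only an illustration: the function name `swap_standard_semantics` is hypothetical, and the shortened flag string is taken from the log rather than from fpm's source.

```python
import shlex


def swap_standard_semantics(flag_string):
    """Replace -standard-semantics with -assume lhs_realloc in a flag string."""
    swapped = []
    for token in shlex.split(flag_string):
        if token == "-standard-semantics":
            swapped.extend(["-assume", "lhs_realloc"])
        else:
            swapped.append(token)
    return shlex.join(swapped)


flags = ("-warn all -check all,nouninit -error-limit 1 -O0 -g "
         "-assume byterecl -standard-semantics -traceback")
print(swap_standard_semantics(flags))
```

The rewritten string could then be tried through FPM_FFLAGS, in the same way as the -stand f18 experiment in the issue description.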

aradi commented 1 month ago

Just as a side note, I've tried to build current fpm main (88ebb0adec15) with current ifx (2024.1.0 20240308):

FPM_FC=ifx fpm build --verbose

but got the error:

[ 68%]                      build.f90  done.

/tmp/ifx0048564392KIfruS/ifxbRdxCg.i90: error #6405: The same named entity from different modules and/or program units cannot be referenced.   [TOML_TABLE]
/tmp/ifx0048564392KIfruS/ifxbRdxCg.i90(246): catastrophic error: Too many errors, exiting
compilation aborted for ././src/fpm_settings.f90 (code 1)
/tmp/ifx20974836679VCRft/ifx1IfEb6.i90: error #6405: The same named entity from different modules and/or program units cannot be referenced.   [TOML_TABLE]
/tmp/ifx20974836679VCRft/ifx1IfEb6.i90(457): catastrophic error: Too many errors, exiting
compilation aborted for ././src/fpm/git.f90 (code 1)
/tmp/ifx0628665936seswGF/ifxBOsQlb.i90: error #6405: The same named entity from different modules and/or program units cannot be referenced.   [TOML_TABLE]
/tmp/ifx0628665936seswGF/ifxBOsQlb.i90(223): catastrophic error: Too many errors, exiting
compilation aborted for ././src/fpm/manifest/meta.f90 (code 1)
/tmp/ifx1612729881nXa6tf/ifxSni8sL.i90: error #6405: The same named entity from different modules and/or program units cannot be referenced.   [TOML_TABLE]
/tmp/ifx1612729881nXa6tf/ifxSni8sL.i90(329): catastrophic error: Too many errors, exiting
compilation aborted for ././src/fpm/manifest/build.f90 (code 1)
<ERROR> Compilation failed for object " src_fpm_settings.f90.o "
<ERROR> Compilation failed for object " src_fpm_git.f90.o "
<ERROR> Compilation failed for object " src_fpm_manifest_meta.f90.o "
<ERROR> Compilation failed for object " src_fpm_manifest_build.f90.o "
<ERROR> stopping due to failed compilation
STOP 1

I might have messed something up, as I have never built fpm before, so this is not an official bug report :smile: Also, GFortran 14.1.1 works without problems. It could easily be an Intel compiler bug; we have various issues with 2024.1.0 in other projects...

perazz commented 1 month ago

We also found that recently, and it turns out to be a bug in ifx, so currently we never build fpm with the Intel compiler in the CI (we only use it to build fpm test packages).