LLNL / Caliper

Caliper is an instrumentation and performance profiling library
http://software.llnl.gov/Caliper/
BSD 3-Clause "New" or "Revised" License

undefined reference to `PMPI_Accumulate_c' when using Intel oneAPI 2024.1 #548

Open adam-sim-dev opened 5 months ago

adam-sim-dev commented 5 months ago

cmake -S . -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWITH_MPI=On -DMPI_C_COMPILER=mpiicx -DMPI_CXX_COMPILER=mpiicpx -DBUILD_SHARED_LIBS=Off -DWITH_GOTCHA=Off -DCMAKE_INSTALL_PREFIX=./caliper -DCMAKE_INSTALL_LIBDIR=lib

Attachments: cmake.log, make.log

adam-sim-dev commented 5 months ago

Building Caliper with Intel oneAPI 2024.0 works without problems. With oneAPI 2024.1, the only declaration I can find in /opt/intel/oneapi/mpi/latest/include/mpi.h is the one for MPI_Accumulate_c; I do not see a matching PMPI_Accumulate_c:

int MPI_Accumulate_c(const void *origin_addr, MPI_Count origin_count, MPI_Datatype origin_datatype,
                     int target_rank, MPI_Aint target_disp, MPI_Count target_count,
                     MPI_Datatype target_datatype, MPI_Op op, MPI_Win win)
                     MPICH_ATTR_POINTER_WITH_TYPE_TAG(1,3) MPICH_API_PUBLIC;
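
The manual header inspection above can be automated. The following is a minimal sketch, not part of Caliper or oneAPI: the helper name `missing_pmpi` and the sample text are made up for illustration, and the regular expression assumes that each prototype has its function name directly before the opening parenthesis.

```python
import re

# Matches a function name that is immediately followed by "(" in a prototype.
# Type names such as MPI_Count are skipped because no "(" follows them.
_NAME = re.compile(r"\b(P?MPI_[A-Za-z0-9_]+)\s*\(")


def missing_pmpi(header_text):
    """Return the sorted MPI_ names in header_text that have no PMPI_ twin."""
    names = set(_NAME.findall(header_text))
    mpi = {n for n in names if n.startswith("MPI_")}
    pmpi = {n for n in names if n.startswith("PMPI_")}
    return sorted(n for n in mpi if "P" + n not in pmpi)


# Hypothetical header excerpt used only to exercise the helper.
sample = """
int MPI_Accumulate(const void *origin_addr, int origin_count);
int PMPI_Accumulate(const void *origin_addr, int origin_count);
int MPI_Accumulate_c(const void *origin_addr, MPI_Count origin_count);
"""

print(missing_pmpi(sample))  # prints ['MPI_Accumulate_c']
```

To check a real installation, pass the contents of the header, for example `missing_pmpi(open("/opt/intel/oneapi/mpi/latest/include/mpi.h").read())`. A declaration in the header does not guarantee that the library exports the symbol, so a clean result here does not rule out the link error.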

daboehme commented 5 months ago

Hi @adam-sim-dev, interesting, thanks for the report! What system are you on?

adam-sim-dev commented 5 months ago

> Hi @adam-sim-dev, interesting, thanks for the report! What system are you on?

Ubuntu 22.04

daboehme commented 5 months ago

Okay, thanks. It sounds like a bug in oneAPI since every MPI function is supposed to have a corresponding PMPI function, but I'll see if I can work around it.
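
Whether the PMPI counterpart is really missing can also be checked against the MPI shared library rather than the header. The sketch below uses Python's standard `ctypes` module; the helper name `has_symbol` is made up for illustration, and the library path in the usage note is an assumption that depends on the local oneAPI installation.

```python
import ctypes


def has_symbol(library, name):
    """Return True if the shared library at `library` exports `name`.

    Passing None inspects the symbols already loaded into the current
    process, which is handy for trying the helper without MPI installed.
    """
    lib = ctypes.CDLL(library)
    # ctypes raises AttributeError for unknown symbols, so hasattr works here.
    return hasattr(lib, name)


# Sanity check against the C library that the interpreter already links.
print(has_symbol(None, "printf"))
```

On a system with oneAPI, one would call it as `has_symbol("/path/to/libmpi.so", "PMPI_Accumulate_c")` with the actual location of the Intel MPI library substituted for the placeholder path. If `MPI_Accumulate_c` is found but `PMPI_Accumulate_c` is not, that would match the undefined reference reported at link time.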

adam-sim-dev commented 2 months ago

This issue still exists with oneAPI 2024.2. If it is a bug in oneAPI, can we report it to Intel? (Sorry, I do not know what the specific problem is.) It would also be good to have a workaround in Caliper.

adam-sim-dev commented 2 weeks ago

Any progress on this issue? @daboehme

daboehme commented 2 weeks ago

Hi @adam-sim-dev, apologies for not getting back to this earlier. I do think it's an issue with oneAPI and it would be good to report it. Every MPI function should have an equivalent PMPI function but apparently they forgot to add one for MPI_Accumulate_c.

Any particular reason you're disabling Gotcha? The PMPI_ issue won't happen if you use Gotcha for wrapping MPI functions. Gotcha used to have some issues in particular with the Intel software stack, but there were several improvements in the latest versions that should fix these. Might be worth giving it a try again. Requires you to link MPI as a shared library though.

Generally a fix will probably require a manual workaround. If you're feeling adventurous you can hack src/services/mpiwrap/wrap.py and add MPI_Accumulate_c to the exclude_strings list.
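
The idea behind the exclude list can be sketched as follows. This is an illustration only and not the actual contents of wrap.py: apart from the name `exclude_strings`, everything here is hypothetical, including the function `functions_to_wrap` and the assumption that entries are matched as substrings of the function name.

```python
# Hypothetical stand-in for the list mentioned above; the real entries differ.
exclude_strings = ["MPI_Accumulate_c"]


def functions_to_wrap(declared, excluded=exclude_strings):
    """Keep only functions whose name contains none of the excluded strings."""
    return [name for name in declared if not any(s in name for s in excluded)]


# Made-up list of declared functions, used only to show the filtering.
declared = ["MPI_Accumulate", "MPI_Accumulate_c", "MPI_Send"]

print(functions_to_wrap(declared))  # prints ['MPI_Accumulate', 'MPI_Send']
```

Under this assumption, no wrapper is generated for `MPI_Accumulate_c`, so nothing references `PMPI_Accumulate_c` at link time, while the regular `MPI_Accumulate` wrapper stays in place. The trade-off is that calls to the excluded function would no longer be intercepted.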

adam-sim-dev commented 2 weeks ago

> Any particular reason you're disabling Gotcha? The PMPI_ issue won't happen if you use Gotcha for wrapping MPI functions. Gotcha used to have some issues in particular with the Intel software stack, but there were several improvements in the latest versions that should fix these. Might be worth giving it a try again. Requires you to link MPI as a shared library though.

I cannot remember why I set -DWITH_GOTCHA=Off, but I will give it a try.