cea-hpc / wi4mpi

Wrapper interface for MPI
BSD 3-Clause "New" or "Revised" License
80 stars 15 forks source link

Parallel build error fix #33

Closed spoutn1k closed 1 year ago

spoutn1k commented 1 year ago

Hey everyone,

This pull request is meant to allow safe parallel building of the project.

What happened

We encountered a few issues compiling Wi4MPI in the lab in a parallel environment. For a high number of jobs, compilation would systematically fail, but would succeed after retrying once or twice. The errors would look like such:

cmake --build . --parallel 96
[...]
CMake Error: failed to create symbolic link 'libmpi_mpifh.so': file already exists
CMake Error: cmake_symlink_library: System Error: File exists
CMake Error: failed to create symbolic link 'libmpicxx.so': file already exists
CMake Error: cmake_symlink_library: System Error: File exists
[...]
gmake: *** [Makefile:146: all] Error 2

Why it happened

Turns out CMake is having a hard time dealing with the multiple fake libraries built by Wi4MPI. Setting the SOVERSION flag in set_target_properties will trigger the creation of 'helper' symlinks to allow ease of access to the library.

https://github.com/cea-hpc/wi4mpi/blob/60bee2a51659a922aae6d0f101dd8a7261cc6fb2/src/CMakeLists.txt#L469-L474

From https://cmake.org/cmake/help/latest/prop_tgt/SOVERSION.html:

For shared libraries VERSION and SOVERSION can be used to specify the build version and API version respectively. When building or installing appropriate symlinks are created if the platform supports symlinks and the linker supports so-names

The created symlinks omit the major:

lrwxrwxrwx.  1 jskutnik prl_collab   15 Sep 21 10:42 libmpi_cxx.so -> libmpi_cxx.so.5
-rwxr-xr-x.  1 jskutnik prl_collab 7.8K Sep 21 10:42 libmpi_cxx.so.5

For single version library creations, that is fine, but for this specific use case, the symlinks conflict with each other, as multiple links with the same name will be created at the same time.

What this changes

I introduced the SOVERSION flag into the project, as it influences the ELF header of the resulting files by setting the version number in it. One may remove it to make this go away, but this will result in inaccurate ELF headers.

This pull request introduces build directories based on the version number, using the LIBRARY_OUTPUT_DIRECTORY target property. The install directive then reflects this path change, but installs the file at the same location as usual.

set_target_properties(${LIB}${MAJOR} PROPERTIES
            LIBRARY_OUTPUT_NAME ${LIB}
            LIBRARY_OUTPUT_DIRECTORY MAJOR_${MAJOR}
            SOVERSION ${MAJOR})
install(FILES ${CMAKE_BINARY_DIR}/src/MAJOR_${MAJOR}/lib${LIB}.so.${MAJOR}
            DESTINATION libexec/wi4mpi/fakelibCXX
            RENAME lib${LIB}.so.${MAJOR}
            PERMISSIONS WORLD_READ WORLD_EXECUTE
                OWNER_READ OWNER_EXECUTE OWNER_WRITE
                GROUP_WRITE GROUP_READ GROUP_EXECUTE
        )
adrien-cotte commented 1 year ago

Amazing job, this was in my "backlog that I have no time to deal with". I can't check before next week, but maybe @Clement-Fontenaille or @marcjoos-cea are able to approve this PR.

Thanks a lot, carry on!

spoutn1k commented 1 year ago

No worries. I felt a little bit responsible for this one, as I introduced the cause and did not realize what it broke. Hopefully this fix helps !

Clement-Fontenaille commented 1 year ago

This works like a charm. Thanks for your contribution !