conda / conda-build

Commands and tools for building conda packages
https://docs.conda.io/projects/conda-build/

Is cmake sysroot issue still a thing with modern cmake (>=3)? #5399

Open link89 opened 1 month ago

link89 commented 1 month ago

What happened?

I am building a C++ module in a conda environment with CMake. The compile step works fine, but when I run the binary I get the following segmentation fault.

[cu390:52361] *** Process received signal ***
[cu390:52361] Signal: Segmentation fault (11)
[cu390:52361] Signal code:  (-6)
[cu390:52361] Failing at address: 0x7d870000cc89
[cu390:52361] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2ba447ba96d0]
[cu390:52361] [ 1] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/lib/python3.11/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0(_ZN4absl12lts_2024011614flags_internal12FlagRegistry12RegisterFlagERNS0_15CommandLineFlagEPKc+0x99)[0x2ba463da6e09]
[cu390:52361] [ 2] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/lib/python3.11/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0(_ZN4absl12lts_2024011614flags_internal23RegisterCommandLineFlagERNS0_15CommandLineFlagEPKc+0x21)[0x2ba463da85c1]
[cu390:52361] [ 3] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/lib/python3.11/site-packages/tensorflow/../../../libabsl_log_flags.so.2401.0.0(+0x3079)[0x2ba463d86079]
[cu390:52361] [ 4] /lib64/ld-linux-x86-64.so.2(+0xfb03)[0x2ba444c3fb03]
[cu390:52361] [ 5] /lib64/ld-linux-x86-64.so.2(+0x146de)[0x2ba444c446de]
[cu390:52361] [ 6] /lib64/ld-linux-x86-64.so.2(+0xf914)[0x2ba444c3f914]
[cu390:52361] [ 7] /lib64/ld-linux-x86-64.so.2(+0x13ccb)[0x2ba444c43ccb]
[cu390:52361] [ 8] /lib64/libdl.so.2(+0xfbb)[0x2ba447890fbb]
[cu390:52361] [ 9] /lib64/ld-linux-x86-64.so.2(+0xf914)[0x2ba444c3f914]
[cu390:52361] [10] /lib64/libdl.so.2(+0x15bd)[0x2ba4478915bd]
[cu390:52361] [11] /lib64/libdl.so.2(dlopen+0x31)[0x2ba447891051]
[cu390:52361] [12] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/opt/lammps/lib/liblammps.so.0(_ZN9LAMMPS_NS11plugin_loadEPKcPNS_6LAMMPSE+0x82)[0x2ba4458f1812]
[cu390:52361] [13] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/opt/lammps/lib/liblammps.so.0(_ZN9LAMMPS_NS16plugin_auto_loadEPNS_6LAMMPSE+0x153)[0x2ba4458f1cd3]
[cu390:52361] [14] /public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda12/opt/lammps/lib/liblammps.so.0(_ZN9LAMMPS_NS6LAMMPSC2EiPPcP19ompi_communicator_t+0xdde)[0x2ba44561d8de]
[cu390:52361] [15] lmp(main+0x47)[0x401117]
[cu390:52361] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ba446c57445]
[cu390:52361] [17] lmp[0x401198]
[cu390:52361] *** End of error message ***

I tried to find out what is happening, and my guess is that it is a sysroot issue. I then found the following document: https://docs.conda.io/projects/conda-build/en/stable/resources/compiler-tools.html#an-aside-on-cmake-and-sysroots

An aside on CMake and sysroots

Anaconda's compilers for Linux are built with something called crosstool-ng. They include not only GCC, but also a "sysroot" with glibc, as well as the rest of the toolchain (binutils). Ordinarily, the sysroot is something that your system provides, and it is what establishes the libc compatibility bound for your compiled code. Any compilation that uses a sysroot other than the system sysroot is said to be "cross-compiling." When the target OS and the build OS are the same, it is called a "pseudo-cross-compiler." This is the case for normal builds with Anaconda's compilers on Linux.

Unfortunately, some software tools do not handle sysroots in intuitive ways. CMake is especially bad for this. Even though the compiler itself understands its own sysroot, CMake insists on ignoring that. We've filed issues at:

https://gitlab.kitware.com/cmake/cmake/issues/17483

Additionally, this Stack Overflow issue has some more information: https://stackoverflow.com/questions/36195791/cmake-missing-sysroot-when-cross-compiling

In order to teach CMake about the sysroot, you must do additional work. As an example, please see our recipe for libnetcdf at AnacondaRecipes/libnetcdf-feedstock

In particular, you'll need to copy the cross-linux.cmake file there, and reference it in your build.sh file:

CMAKE_PLATFORM_FLAGS+=(-DCMAKE_TOOLCHAIN_FILE="${RECIPE_DIR}/cross-linux.cmake")

cmake -DCMAKE_INSTALL_PREFIX=${PREFIX} \
  ${CMAKE_PLATFORM_FLAGS[@]} \
  ${SRC_DIR}

It turns out the referenced issue is quite old, so I wonder whether there is now an easier way to fix this. I am not a compiler expert, so I am not sure whether CMAKE_SYSROOT can be used to solve this problem, or whether I still need to follow the complex steps above to work around it.
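To make the CMAKE_SYSROOT question concrete, here is a minimal sketch of what that route might look like. It rests on several assumptions: `write_sysroot_toolchain` and the file name `sysroot.cmake` are hypothetical names chosen for illustration, the compiler is assumed to support GCC's `-print-sysroot` option, and the variable is written into a toolchain file because the CMake documentation says `CMAKE_SYSROOT` is meant to be set there. Whether this has any effect on the segmentation fault above is untested.

```shell
# Sketch only: write_sysroot_toolchain is a hypothetical helper, not part of
# conda-build or CMake. It asks the compiler for its sysroot and writes a
# one-line CMake toolchain file that sets CMAKE_SYSROOT to that path.
#   $1: path of the toolchain file to write
#   $2: compiler command (defaults to $CC, then gcc)
write_sysroot_toolchain() {
  _out="$1"
  _cc="${2:-${CC:-gcc}}"
  # GCC prints nothing when it has no configured sysroot.
  _sysroot="$("$_cc" -print-sysroot 2>/dev/null)" || _sysroot=""
  # Fail without writing anything if no sysroot was reported.
  [ -n "$_sysroot" ] || return 1
  cat > "$_out" <<EOF
# Generated sketch: point CMake at the compiler's own sysroot.
set(CMAKE_SYSROOT "${_sysroot}")
EOF
}

# Possible use in a build.sh (PREFIX and SRC_DIR are provided by conda-build):
#   write_sysroot_toolchain "${SRC_DIR}/sysroot.cmake" \
#     && CMAKE_PLATFORM_FLAGS+=(-DCMAKE_TOOLCHAIN_FILE="${SRC_DIR}/sysroot.cmake")
```

This is far less thorough than the cross-linux.cmake file from the libnetcdf recipe, which is the approach the documentation describes; the sketch only shows the single variable asked about here.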

Additional Context

No response

travishathaway commented 1 month ago

@link89,

I transferred this issue to the conda-build project because it is related to its documentation.