StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

HTR build failure in CI #1756

Closed: elliottslaughter closed this issue 3 weeks ago

elliottslaughter commented 2 months ago

Reported by @mariodirenzo:

All my CIs based on the CMake build of Legion have been failing with errors like:

/home/gitlab-runner/legion-debug-cmake/runtime/realm/gasnet1/gasnetmsg.cc:1665: undefined reference to `gasneti_thunk_tm'

The error appeared after https://gitlab.com/StanfordLegion/legion/-/merge_requests/1427 was merged.
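When triaging a CI link failure like this, it can help to pull every unresolved symbol out of the log at once instead of reading it line by line. Below is a minimal Python sketch, assuming GNU ld style "undefined reference to" messages; the function name `undefined_symbols` is hypothetical and not part of Legion's tooling.

```python
import re
from typing import Dict, List

# Matches GNU ld diagnostics such as
#   gasnetmsg.cc:1665: undefined reference to `gasneti_thunk_tm'
# Accepts either a backtick or a straight quote before the name,
# since the quoting style varies between toolchain versions.
UNDEFINED_REF = re.compile(r"undefined reference to [`']([^`']+)'")


def undefined_symbols(log_text: str) -> List[str]:
    """Return each unresolved symbol once, in the order it first appears."""
    seen: Dict[str, None] = {}
    for match in UNDEFINED_REF.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Because the regex only anchors on the message text, it still works when the log wraps the file name and the message onto separate lines, as in the report above.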

Mario, can you provide any additional details on the specific CMake command line, and the system (OS, CUDA, CMake, etc. versions)?

elliottslaughter commented 2 months ago

@mariodirenzo Please retest this on latest master, we've merged various fixes to the CMake build that I think are likely to resolve this.

elliottslaughter commented 1 month ago

I've been told that CI is passing now, so closing. (If this is not true, feel free to reopen.)

mariodirenzo commented 1 month ago

Unfortunately, I haven't been able to get a working CMake build of Legion since https://gitlab.com/StanfordLegion/legion/-/merge_requests/1427 was merged. When the runtime does build, I get this error

[0 - 7fd0f5de7c40]    0.000000 {5}{gex}: Failed to load gex wrapper at librealm_gex_wrapper.so

when executing the code. Other configurations fail while compiling either my code or the runtime itself. I won't have time to debug this until mid-October; I'll get in touch when I have more information.

seemamirch commented 1 month ago

@mariodirenzo - if you can provide details to reproduce this, I can debug further. I don't see any CMake builds in the CI (the link I have been using for it is https://lc.llnl.gov/gitlab/stanford-psaap/ci/-/pipelines).

seemamirch commented 1 month ago

One configuration that fails with CMake:

  1. Build failure with CMake + gasnet1 on lassen (different from what's reported above)
    • the same options work on sapling
    • the same options but without CMake work on lassen
    • the scripts and output from the builds (with/without CMake) are on sapling -> /scratch2/seemah/lassen_cmake_issue/
    • build.sh is the build script for both; bad.txt is the build output with CMake, good.txt is without CMake
    • CMake version 3.23.1

muraj commented 1 month ago

@seemamirch I don't have access to the internal CI link you posted. Can you provide the logs of the issue? If you're seeing the following error, it is likely because your build enables the GASNet-EX wrapper somehow. The wrapper library then has to be locatable at runtime, either through an environment variable or by being on a library search path (e.g. LD_LIBRARY_PATH).

[0 - 7fd0f5de7c40] 0.000000 {5}{gex}: Failed to load gex wrapper at librealm_gex_wrapper.so

If you do not want to use the GASNet-EX wrapper, you need to disable its use. (I believe it is not used by default, so you may be enabling it yourself or through another part of the build system, maybe Legate?)
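Before launching, one way to check the LD_LIBRARY_PATH route described above is to look for the wrapper library in each listed directory. This is a minimal sketch: the helper name `find_on_search_path` is hypothetical, it only approximates the dynamic loader's search, and it does not cover the environment variable Realm may read for the wrapper location (its name is not given in this thread).

```python
import os
from typing import Optional


def find_on_search_path(libname: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the first path on a colon-separated search path that holds libname.

    This only approximates the dynamic loader: RPATH/RUNPATH, the ld.so cache,
    and the default system directories are not consulted.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(":"):
        if not directory:
            # Skipped here for simplicity; glibc treats an empty entry as the cwd.
            continue
        candidate = os.path.join(directory, libname)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    hit = find_on_search_path("librealm_gex_wrapper.so")
    print(hit or "librealm_gex_wrapper.so not found on LD_LIBRARY_PATH")
```

If this reports nothing while the "Failed to load gex wrapper" message appears, either the wrapper is being enabled unintentionally or its directory needs to be added to the search path.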

If you can provide the full log including the cmake command line used to configure Realm, and the build command and build output, I might be able to reproduce the issue and help resolve it for you. Thanks!

seemamirch commented 1 month ago

I've moved the files to https://sapling2.stanford.edu/~seemah/lassen_cmake_issue/ so you can view them. My issue does not involve GASNet-EX, and the CI link is not relevant to it. @mariodirenzo may benefit from your comments above about GASNet-EX.

muraj commented 4 weeks ago

Okay, looking at "bad.txt", I see the following line:

/usr/WS1/mirchandaney1/legion_rc/runtime/realm/realm_config.h:90:2: error: #error Shared memory not supported on GASNET1

#error Shared memory not supported on GASNET1
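Since bad.txt and good.txt come from the same options built with and without CMake, diffing their error lines is a quick way to surface a line like the one above. Below is a minimal sketch; the function name `new_errors` is hypothetical, and the pattern assumes GCC/Clang style "file:line[:col]: error:" diagnostics plus GNU ld "undefined reference" lines.

```python
import re
from typing import List

# Compiler diagnostics ("file:line[:col]: error: ...") and linker failures.
ERROR_LINE = re.compile(r":\d+(?::\d+)?: (?:fatal )?error: |undefined reference to ")


def new_errors(bad_log: str, good_log: str) -> List[str]:
    """Return error lines found in bad_log but not in good_log, first-seen order."""
    known = {ln.strip() for ln in good_log.splitlines() if ERROR_LINE.search(ln)}
    found: List[str] = []
    for line in bad_log.splitlines():
        line = line.strip()
        if ERROR_LINE.search(line) and line not in known and line not in found:
            found.append(line)
    return found
```

Lines are compared verbatim, so errors that differ only in an absolute build path will both be reported; that is usually acceptable when the good log has no errors at all.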

This has to do with the shared memory support added several releases ago; the logic that enables it is in the CMake build system here: https://github.com/StanfordLegion/legion/blob/4a03402467547b99530042cfe234ceec2cd31b2e/CMakeLists.txt#L822

If gasnet1 is enabled, this feature should be disabled. So either your configure command is enabling it and overriding the default, or something is wrong with the logic here.
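The rule described above (shared memory off by default with gasnet1, and an explicit override as the likely culprit) can be modeled in a few lines. This is a sketch of the decision logic only, not the actual CMakeLists.txt code: the function name and parameters are hypothetical stand-ins for the real cache variables, and the "on for other networks" default is an assumption made for illustration.

```python
from typing import Optional, Sequence


def resolve_shared_memory(networks: Sequence[str], requested: Optional[bool] = None) -> bool:
    """Return whether shared-memory support should be compiled in.

    networks  -- hypothetical stand-in for the configured network list
    requested -- None to take the default, True/False for an explicit override
    """
    uses_gasnet1 = "gasnet1" in networks
    if requested is None:
        # Assumed default for illustration: on, unless gasnet1 is selected.
        return not uses_gasnet1
    if requested and uses_gasnet1:
        # Fail at configure time instead of at the #error in realm_config.h.
        raise ValueError("shared memory was requested, but gasnet1 does not support it")
    return requested
```

Rejecting the conflicting override during configuration gives a clearer message than the compile-time `#error` seen in bad.txt.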

mariodirenzo commented 3 weeks ago

As an update, I've been able to get a successful CI run of HTR++ by building the latest master from scratch and setting GEX_BUILD_SHARED and Legion_USE_GASNETEX_WRAPPER to OFF. From my point of view, we can close the issue.
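For CI scripts that generate the configure step, the two options above can be passed as ordinary `-D` cache entries. Below is a minimal sketch; the function name and the source/build directory names are hypothetical, and any other options an HTR++ build needs (network selection, CUDA, and so on) are deliberately left out.

```python
import shlex
from typing import List


def configure_command(source_dir: str, build_dir: str) -> List[str]:
    """Build a CMake configure command with the two options reported to fix the CI."""
    options = {
        "GEX_BUILD_SHARED": "OFF",
        "Legion_USE_GASNETEX_WRAPPER": "OFF",
    }
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    cmd += [f"-D{name}={value}" for name, value in options.items()]
    return cmd


if __name__ == "__main__":
    print(shlex.join(configure_command("legion", "build")))
```

Returning an argument list rather than a shell string lets the command be passed straight to `subprocess.run` without quoting issues.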

seemamirch commented 3 weeks ago

It's possible there was a configuration issue with my CMake build on lassen; I can't reproduce it now. Since @mariodirenzo is also OK with the CMake build of HTR++, I'm closing this issue.