Open geimer opened 4 years ago
@geimer I'm looking into this again now from the position that flang-new is at least good enough to get started on.
Looking at your first point, how about we do everything except the front end compilers Clang and Flang in an LLVM build, and then couple these two into a new LLVM-compilers as a toolchain. I would hope that building the frontends based on an existing installation should be doable, looking at their docs:
cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Release -G "Unix Makefiles" ../llvm
make
Note: For subsequent Clang development, you can just run make clang.
so I wonder if we might get away with just not running the make step and jumping straight to make clang.
As regards OpenMP, we could disable the creation of the symlinks in the LLVM build, and create the appropriate symlink with the LLVM-compilers build (which I guess would be just the libgomp one). For libstdc++, we will likely have to make this the default... not sure what the performance implications of that would be.
Now that I see all that written down, I wonder if it doesn't just look like your second solution, just with LLVM and LLVM-compilers instead of LLVM-Clang and Clang. Indeed, is it really a problem that LLVM ships the compilers? They only have meaning for us when used in a toolchain. We can use LLVM-compilers to restore the OpenMP symlink based on whether we want the Intel or GCC versions.
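A minimal sketch of what "restoring the symlink" from an LLVM-compilers installation could look like, assuming the LLVM build was configured without the aliases (I believe the relevant CMake option is LIBOMP_INSTALL_ALIASES, but treat that as an assumption). The function name, the directory layout and the alias file names are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Assumption: file names under which the GCC and Intel OpenMP runtimes are
# looked up at run time; adjust to whatever the real installations use.
OPENMP_ALIASES = {"gcc": "libgomp.so.1", "intel": "libiomp5.so"}


def restore_openmp_alias(compilers_libdir, llvm_libdir, flavour):
    """Create an alias in the LLVM-compilers lib dir pointing at LLVM's libomp.so."""
    if flavour not in OPENMP_ALIASES:
        raise ValueError(f"unknown OpenMP flavour: {flavour}")
    target = Path(llvm_libdir) / "libomp.so"
    if not target.exists():
        raise FileNotFoundError(f"LLVM OpenMP runtime not found: {target}")
    link = Path(compilers_libdir) / OPENMP_ALIASES[flavour]
    # Only an existing symlink is replaced; a regular file with the same
    # name makes symlink_to() fail, so it is never silently overwritten.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target)
    return link
```

Calling `restore_openmp_alias(compilers_lib, llvm_lib, "gcc")` would make codes linked against libgomp pick up the LLVM runtime whenever the LLVM-compilers module is loaded; whether that is desirable is exactly the open question in this thread.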
Hmm, the issue with OpenMP and GCCcore is actually a general one. We can probably side-step the whole thing by making libgomp a banned library when using GCCcore (https://github.com/easybuilders/easybuild-framework/issues/4535).
Initial indications are that we should be able to separate out clang/flang, as @Crivella has found that their CMake files indicate support for this:
# If we are not building as a part of LLVM, build Clang as an
# standalone project, using LLVM as an external library:
if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
project(Clang)
set(CLANG_BUILT_STANDALONE TRUE)
endif()
# Check for a standalone build and configure as appropriate from
# there.
if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
message("Building Flang as a standalone project.")
project(Flang)
set(FLANG_STANDALONE_BUILD ON)
else()
set(FLANG_STANDALONE_BUILD OFF)
endif()
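To make the standalone idea concrete, here is a rough sketch of how the configure arguments for such a build might be assembled against an existing LLVM installation. The helper name and the paths are hypothetical. LLVM_DIR is the usual way to point CMake's find_package at an installed LLVM, and the extra MLIR_DIR and CLANG_DIR entries for Flang are an assumption based on its standalone build instructions, so this has not been verified against a real build.

```python
from pathlib import Path


def standalone_frontend_cmake_args(frontend, llvm_prefix, src_root, install_prefix):
    """Return a cmake argument list for building one frontend against an installed LLVM."""
    if frontend not in ("clang", "flang"):
        raise ValueError(f"unsupported frontend: {frontend}")
    cmake_dir = Path(llvm_prefix) / "lib" / "cmake"
    args = [
        "cmake",
        "-G", "Unix Makefiles",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
        # Point find_package(LLVM) at the existing installation.
        f"-DLLVM_DIR={cmake_dir / 'llvm'}",
    ]
    if frontend == "flang":
        # Assumption: Flang also needs the MLIR and Clang CMake packages,
        # and both are present in the same installation prefix.
        args.append(f"-DMLIR_DIR={cmake_dir / 'mlir'}")
        args.append(f"-DCLANG_DIR={cmake_dir / 'clang'}")
    # Using the frontend directory itself as the source directory is what
    # makes CMAKE_SOURCE_DIR equal CMAKE_CURRENT_SOURCE_DIR in the checks above.
    args.append(str(Path(src_root) / frontend))
    return args
```

For example, `standalone_frontend_cmake_args("clang", "/opt/llvm", "/src/llvm-project", "/opt/clang")` yields a command line whose last element is the clang source directory, which is the condition the CMake snippets test for.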
@geimer I'm looking into this again now from the position that flang-new is at least good enough to get started on.
I agree that the Fortran compiler is slowly approaching a usable state. With LLVM 18, there are still very simple OpenMP tests which fail to compile, but those are fixed in the current trunk version. This means that LLVM 19 might be a version where one can realistically try to build a toolchain.
Looking at your first point, how about we do everything except the front end compilers Clang and Flang in an LLVM build, and then couple these two into a new LLVM-compilers as a toolchain. I would hope that building the frontends based on an existing installation should be do-able, looking at their docs:
Just to understand correctly, the idea would be to build OpenMP and so on with GCC first and afterwards build the Clang & Flang with this LLVM? We could do this, but should only build the minimal set of things we need for this to work or else users may miss features. The most important one I can think of right now would be support for OpenMP offloading, which may not work if OpenMP is built with GCC.
See (source):
Note: The compiler that generates the offload code should be the same (version) as the compiler that builds the OpenMP device runtimes. The OpenMP host runtime can be built by a different compiler.
As regards OpenMP, we could disable the creation of the symlinks in the LLVM build, and create the appropriate symlink with the LLVM-compilers build (which I guess would be just the libgomp one). For libstdc++, we will likely have to make this the default...not sure what the performance implications of that would be.
We would not only need to take libstdc++ into consideration but also libc. Both are available in the LLVM repo and libc might get more important in the future with continued work on the offloading infrastructure. As for the symlink, I would advocate against only offering libgomp as this may leave out the LLVM OpenMP runtime, which means no offloading and no OpenMP Tools Interface. The last one would be an issue for performance tools like Score-P. We need to make sure that -fopenmp/-fopenmp=libomp work correctly.
See this example with Clang/trunk:
Maybe it would be not unhelpful if I provide an atypical(?) perspective for why I'm interested in a[1] Flang/Clang toolchain, in fact why I'm interested even if the compiler is not bug-free.
Since I'm developing and maintaining a Fortran application, and one which also offloads to GPU, I want to be able to test the compilers (also without offloading), and find out if I need to i) work around compiler issues, or ii) ask a vendor to prioritize a feature for us. At the moment, it's kind of tricky to get an environment in which I can test the code, so I can't tell e.g. AMD if I can even get ready to start trying to port the offloading (e.g. if the compiler can compile our code on CPU). This environment for testing is what interests me in a Flang/Clang toolchain. For this, I need at least Compiler, MPI, HDF5, LAPACK. I think everything else I could deal with later.
[1] actually, I'd like to have ~3 toolchains, but that's maybe getting a bit far ahead of ourselves: upstream flang-new, upstream flang-classic, and vendored (e.g.) amd-flang
Maybe it would be not unhelpful if I provide an atypical(?) perspective for why I'm interested in a[1] Flang/Clang toolchain, in fact why I'm interested even if the compiler is not bug-free.
I'm absolutely with you on this one. The Fortran compiler is getting mature enough that building a toolchain is feasible and, even if not entirely bug-free, might have a large interest for users.
Since I'm developing and maintaining a Fortran application, and one which also offloads to GPU, I want to be able to test the compilers (also without offloading), and find out if I need to i) workaround compiler issues, or ii) ask a vendor to prioritize a feature for us.
While I agree that this is a very interesting scenario, having an up-to-date latest and greatest version available all the time might be difficult, especially if a whole toolchain is built with that compiler. That's one reason why I chose to build a very small toolchain manually (basically only LLVM/Clang + OpenMPI), even if it's more painful to do so. This allows me to test a daily Clang (and sometimes also AOMP) build for issues.
[1] actually, I'd like to have ~3 toolchains, but that's maybe getting a bit far ahead of ourselves: upstream flang-new, upstream flang-classic, and vendored (e.g.) amd-flang
I would guess that flang-classic will fade out once flang-new is ready. Building amdflang is more complicated though, as you would probably also want to have their entire LLVM toolchain. At that point, you're basically building AOMP (or the equivalent ROCm components) from source. That's possible, but as you said, we should focus on LLVM/Clang first.
I'm absolutely with you on this one. The Fortran compiler is getting mature enough that building a toolchain is feasible and, even if not entirely bug-free, might have a large interest for users.
I think this is the general idea here, we move ahead in the C/C++ space and see where things break with Fortran, as things improve we will get more and more Fortran applications building. I'm particularly interested in pushing some Fortran applications we are connected to to start working on OpenMP device offloading.
While I agree that this is a very interesting scenario, having an up-to-date latest and greatest version available all the time might be difficult, especially if a whole toolchain is built with that compiler. That's one reason why I chose to build a very small toolchain manually (basically only LLVM/Clang + OpenMPI), even if it's more painful to do so. This allows me to test a daily Clang (and sometimes also AOMP) build for issues.
So, there is a subtle issue here. EasyBuild itself updates toolchains twice a year (e.g., 2023a, 2023b). That means that you can expect to find a reasonable amount of software for those releases and we would try harder to support them. That doesn't mean we couldn't have additional releases of Clang and an associated toolchain (these are usually versioned 2023.11, etc.), just that only 2 per year would become a "supported" toolchain. How you can work with this depends on your use case. If you are not bumping your dependency versions (outside of an EB toolchain release) then --try-toolchain is your friend. Of course the benefit of working together is we can continuously improve the relevant easyblock to ensure we are always current with our support and making good choices. This won't cover daily builds, but at least the easyblock will work out of the box.
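As a rough illustration of the --try-toolchain workflow, the sketch below assembles an eb command line that retargets an existing easyconfig to another toolchain. The helper name is hypothetical, and the easyconfig file name and the LLVM-compilers toolchain name in the usage note are placeholders taken from this discussion, not existing files or toolchains.

```python
def eb_try_toolchain_cmd(easyconfig, tc_name, tc_version, robot=True):
    """Build an `eb` command that retries an easyconfig with another toolchain."""
    if not tc_name or not tc_version:
        raise ValueError("toolchain name and version are both required")
    # --try-toolchain takes NAME,VERSION as a single comma-separated value.
    cmd = ["eb", easyconfig, f"--try-toolchain={tc_name},{tc_version}"]
    if robot:
        # --robot lets EasyBuild resolve and build missing dependencies too.
        cmd.append("--robot")
    return cmd
```

For instance, `eb_try_toolchain_cmd("HDF5-1.14.0-gompi-2023a.eb", "LLVM-compilers", "2023.11")` returns a list that can be handed to `subprocess.run` on a system with EasyBuild installed.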
[1] actually, I'd like to have ~3 toolchains, but that's maybe getting a bit far ahead of ourselves: upstream flang-new, upstream flang-classic, and vendored (e.g.) amd-flang
I would guess that flang-classic will fade out once flang-new is ready. Building amdflang is more complicated though, as you would probably also want to have their entire LLVM toolchain. At that point, you're basically building AOMP (or the equivalent ROCm components) from source. That's possible, but as you said, we should focus on LLVM/Clang first.
I don't think we would be considering flang-classic, I'd prefer to look forward given the offloading support. AOMP is going to happen, there is a WIP PR and we need this anyway for ROCm.
For this, I need at least Compiler, MPI, HDF5, LAPACK. I think everything else I could deal with later.
I've been working on this (still making sure all components of llvm-project work properly also with a bootstrapped build to remove GCC dependencies).
With a simple build of flang-new I did try to compile QuantumESPRESSO, but it failed on the FFTXlib due to a lack of support for polymorphism.
Will have to try with HDF5 and LAPACK.
So, there is a subtle issue here. EasyBuild itself updates toolchains twice a year (e.g., 2023a, 2023b). That means that you can expect to find a reasonable amount of software for those releases and we would try harder to support them. That doesn't mean we couldn't have additional releases of Clang and an associated toolchain (these are usually versioned 2023.11, etc.), just that only 2 per year would become a "supported" toolchain. How you can work with this depends on your use case. If you are not bumping your dependency versions (outside of a EB toolchain release) then --try-toolchain is your friend. Of course the benefit of working together is we can continuously improve the relevant easyblock to ensure we are always current with our support and making good choices. This won't cover daily builds, but at least the easyblock will work out of the box.
Looking at the current EasyConfigs, there seems to be one version per toolchain update (with Clang 16 being the exception), normally using the last release in the LLVM/Clang update cycle. From my perspective, this is sufficient for EasyBuild and its users.
Like you've said, additional versions can easily be added and if a major version is supported already (e.g. 18.1.0), the chance is high that another version (e.g. 18.1.7) also works fine when passed via --try-toolchain.
I've been working on this (still on making sure all components of llvm-project works properly also with a bootstrapped build to remove GCC dependencies).
That sounds great! There are certainly some quirks that can come up. Just as an example: We're developing an LLVM IR plug-in for our application that will be used as an additional pass when a user compiles his application. There, we want to use llvm::demangle. However, the Clang installation on our HPC system fails to link llvm::demangle even though LLVMDemangle.a is linked. One needs to link libclang.so, which is not provided by llvm-config.
To have a point of reference, I've just opened 2 PRs for EB and the related EC files.
There are still some things that need fixing/improving, but in the meantime suggestions/comments are welcome.
We should see LLVM 19 in September: https://discourse.llvm.org/t/llvm-19-release-schedule-and-planning/79828
Disclaimer: I'm by no means an LLVM or Clang expert. The information below is just a collection of bits and pieces found in various places as well as my personal thoughts on how EasyBuild support could be improved.
Target
A working LLVM-based toolchain -- at least for C/C++ -- with minimal redundancy. Here, "toolchain" is not meant in the EasyBuild sense (i.e., including an MPI, math libs, etc.), but merely refers to a compiler environment that can be used by end users to build their codes. (This doesn't rule out having an MPI w/o Fortran support using Clang, though.)
With proper Fortran support being on the horizon, however, it might become a full toolchain in the EasyBuild sense in the future. This should be taken into account in the design.
Status quo
LLVM / Clang / flang
LLVM provides a framework for code optimization and generation for many different target CPUs. The most prominent language frontend is Clang, which focuses on C-like languages (C, C++, Objective-C, OpenCL). Basically all commercial compiler vendors (Intel, PGI, Cray, IBM, Fujitsu, ARM) have switched in the meanwhile to Clang as the basis for their C/C++ compilers.
Fortran support was started based on the PGI Fortran compiler frontend, see the flang project on GitHub, now called "old/legacy/classic flang". However, it requires patched versions of LLVM and Clang, and seems stuck at LLVM 9. However, this mailing list post suggests that there might be an update for LLVM 11 ("LLVM11 with classic flang is on various vendor's roadmap for this autumn, so one of us will do it I'm sure.")
Besides, there is a "new flang" frontend (formerly called f18) written from scratch, now developed as an official LLVM project. However, it isn't fully functional yet and still depends on another compiler to do the actual work, see this mailing list post.
EasyBuild
EasyBuild currently includes various LLVM packages which are used as dependencies by, for example, Mesa, numba, and Rust. Recent versions are built on top of GCCcore, and only include the core LLVM libraries and tools.
In addition, there are various Clang easyconfigs. Again, recent versions are (usually) built on top of GCCcore. These can be used as a stand-alone compiler, but are also used as dependencies by various packages, such as pocl, TRIQS, and Longshot, and could be used by additional packages such as Score-P and Doxygen. This is due to also providing libraries for source-code parsing and processing. The Clang packages build their own copy of LLVM, and include other LLVM projects such as an OpenMP runtime library, the lld linker, the libc++ C++ Standard Library, and the polly polyhedral optimizer, though not all of those components are used by default with the current configuration.
There has been some work on packaging "legacy flang" (see https://github.com/easybuilders/easybuild-easyconfigs/pull/8335 and https://github.com/easybuilders/easybuild-easyblocks/pull/1729), however, the question is whether it is worth putting more effort into this since things might change considerably with the "new flang".
Possible ways to organize things in EasyBuild
Build full Clang (including lld, libraries, etc.) using an existing LLVM built with GCCcore as dependency
Cons:
[...] libgomp and libiomp5, i.e., the OpenMP runtimes of the GCC and Intel compilers, as it implements both APIs. Thus, the order in which modules are loaded determines which runtime is found by ld.so and affects the runtime behavior of codes using OpenMP.
Creating these symlinks can be disabled via a CMake configuration option, but doing so may lead to simultaneously using two different OpenMP runtimes if some OpenMP code compiled with Clang is linked to a library built with GCCcore also using OpenMP.
[...] libc++ by default for Clang is likely to make code incompatible with C++ libraries compiled with GCCcore using libstdc++.
[...] LLVM rather than Clang.
Introduce a new package named, e.g., LLVM-Clang built with GCCcore providing a full Clang (including lld, libraries, etc.) and use it as a dependency for all packages that currently depend on either LLVM or Clang. A Clang compiler package would then be a bundle of GCCcore, LLVM-Clang, and binutils.
[...] libc++ issues outlined above
[...] LLVM-Clang vs. Clang packaging would probably cause questions similar to the GCCcore vs. GCC separation.
Build minimal Clang (excluding lld, libraries, OpenMP runtime) on top of GCCcore -- either using an existing LLVM or as part of a LLVM-Clang package as outlined above -- to provide the Clang libraries to packages that need it as a dependency. In addition, build a full LLVM/Clang (including everything) on the SYSTEM level as a separate toolchain.
[...] GCCcore, as it is a completely separate toolchain.
[...] Clang on the SYSTEM level. (How does one properly do this? Use GCC as a builddep rather than toolchain???)
[...] gfortran could (temporarily) serve as a Fortran compiler in a full LLVM toolchain using, e.g., compiler-rt instead of libgcc_s. It's very likely that this won't work.
[...] Clang module under GCCcore serves a very limited purpose and should thus be avoided by end-users, unless they really know what they are doing. Not sure how to best prevent/document this. It is also unclear whether such a stripped down Clang would be sufficient for all packages that currently depend on the existing Clang packages.
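The risk of ending up with two different OpenMP runtimes in one process, mentioned for the first option, can at least be detected after the fact. The following is only a sketch under the assumption that the output of `ldd` for a binary is available as text. The function names are hypothetical, and the check only compares library names, so an alias symlink that resolves to the same file as libomp would still be reported as a second runtime.

```python
import re

# Library base names of the GCC, LLVM and Intel OpenMP runtimes.
OPENMP_RUNTIMES = ("libgomp", "libomp", "libiomp5")


def openmp_runtimes_in_ldd(ldd_output):
    """Return the set of OpenMP runtime names that appear in `ldd` output."""
    found = set()
    for line in ldd_output.splitlines():
        # Each dependency line starts with the library name, e.g.
        # "libgomp.so.1 => /path/to/libgomp.so.1 (0x...)".
        match = re.match(r"\s*(lib\w+)\.so", line)
        if match and match.group(1) in OPENMP_RUNTIMES:
            found.add(match.group(1))
    return found


def mixes_openmp_runtimes(ldd_output):
    """True if more than one distinct OpenMP runtime name is linked."""
    return len(openmp_runtimes_in_ldd(ldd_output)) > 1
```

Feeding it the output of `ldd ./my_app` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) would flag binaries that pull in both libgomp and libomp.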