chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Unable to build Chapel 1.25.0 under Gentoo Linux with CHPL_LLVM=system #18589

Closed preney closed 3 years ago

preney commented 3 years ago

Summary of Problem

When using these steps to compile Chapel v1.25.0 everything compiles correctly:

    wget https://github.com/chapel-lang/chapel/releases/download/1.25.0/chapel-1.25.0.tar.gz
    tar xvf chapel-1.25.0.tar.gz
    cd chapel-1.25.0
    export CHPL_LLVM=bundled
    source util/setchplenv.bash
    make

however when using these steps to use the system-installed LLVM/Clang:

    wget https://github.com/chapel-lang/chapel/releases/download/1.25.0/chapel-1.25.0.tar.gz
    tar xvf chapel-1.25.0.tar.gz
    cd chapel-1.25.0
    export CHPL_LLVM=system
    export CHPL_LLVM_CONFIG=/usr/lib/llvm/11/bin/llvm-config
    export CHPL_HOST_COMPILER=llvm
    export CHPL_TARGET_COMPILER=llvm
    source util/setchplenv.bash
    make

compilation fails with these errors:

    ld.lld: error: unable to find library -lclangFrontend
    ld.lld: error: unable to find library -lclangSerialization
    ld.lld: error: unable to find library -lclangDriver
    ld.lld: error: unable to find library -lclangCodeGen
    ld.lld: error: unable to find library -lclangParse
    ld.lld: error: unable to find library -lclangSema
    ld.lld: error: unable to find library -lclangAnalysis
    ld.lld: error: unable to find library -lclangEdit
    ld.lld: error: unable to find library -lclangASTMatchers
    ld.lld: error: unable to find library -lclangAST
    ld.lld: error: unable to find library -lclangLex
    ld.lld: error: unable to find library -lclangBasic
    clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

Since these libclang libraries are not on my system I submitted a bug report to Gentoo Linux here:

and the response was to use -lclang-cpp. (Per https://archives.gentoo.org/gentoo-dev/message/6d3cf88d858fcf3fbb11818ce5d6ea42 Gentoo as of Clang v10 was no longer using BUILD_SHARED_LIBS=ON when building Clang and instead uses the 'dylib' model).

I grep'd the chapel source code for these libraries and they are mentioned in third-party/llvm/Makefile.include-system so I changed the file so that:

    LLVM_CLANG_LIBS=-lclang-cpp

was set and I still get linker errors. After experimentation, starting with:

    export CHPL_LLVM=system
    export CHPL_LLVM_CONFIG=/usr/lib/llvm/11/bin/llvm-config
    export CHPL_HOST_COMPILER=llvm
    export CHPL_TARGET_COMPILER=llvm
    export CHPL_LLVM_DYNAMIC=1
    export CHPL_LIB_PIC=pic

and then editing third-party/llvm/Makefile.include-system so the following are set:

    LLVM_CLANG_LIBS=-lclang-cpp
    # Add -fPIC...
    LLVM_CXXFLAGS=-fPIC $(LLVM_CONFIG_CXXFLAGS) $(LLVM_MY_CXXFLAGS) -DHAVE_LLVM
    LLVM_CFLAGS=-fPIC $(LLVM_CONFIG_CFLAGS) -DHAVE_LLVM
    # Add -dynamic...
    LLVM_LIBS=-dynamic -shared -L$(LLVM_CONFIG_LIB_DIR) $(LLVM_CLANG_LIBS) $(LLVM_LLVM_LIBS)

and then running source util/setchplenv.bash followed by make yields a successful compile --but when chpl is run (including with no arguments) it immediately segfaults, e.g.,

    $ chpl
    Segmentation fault (core dumped)

What behavior did you observe when encountering this issue?

What behavior did you expect to observe?

Is this a blocking issue with no known work-arounds?

There are no instructions given in Chapel concerning this, but my system is running LLVM/Clang v11.1.0 --not v11.0.1 as stated in third-party/llvm/README. The Chapel docs say LLVM/Clang v11 --which I am running.

I suppose it is possible that when system LLVM is being used, the Chapel build might be incorrectly using source code from the bundled code --not the system (e.g., via header files). If so then this could be causing issues. Since the organization of code in third-party/llvm doesn't appear to match pristine LLVM/Clang code, I did not replace any LLVM/Clang v11.0.1 code with the same from v11.1.0. (Additionally, there is no documentation concerning how to graft such in or how to remove all of the bundled LLVM/Clang 11.0.1 code (since it shouldn't be used at all) from third-party/llvm --so I left such alone.)

In summary, the problem appears to be that the build code for Chapel assumes the ability to statically link to LLVM/Clang libraries when using a system-installed LLVM/Clang. On its own CHPL_LLVM_DYNAMIC doesn't fix any issues: it fails to link (even if the -lclang-cpp edits are made as detailed above). One has to use -fPIC and -dynamic -shared as discussed above to get things to compile and link (but it segfaults when run). How might this be fixed so Chapel can be built using the system's LLVM/Clang v11? Thanks.

(ASIDE: All Chapel prerequisites are installed in the system.)

mppf commented 3 years ago

hi @preney - thanks very much for this report. It is not too surprising to see packaging/make level issues with LLVM here and it is an area we are working on improving. The first thing I'd like to do is to reproduce your environment in order to try to fix it. But, I am completely unfamiliar with Gentoo. Could you help me out by sharing the commands you ran to install the dependencies? Thanks.

preney commented 3 years ago

Hi @mppf! My apologies for the delay in responding... but as you will infer from this post, I've been busy creating a straightforward solution so you could have an installation of Gentoo Linux to work with ... and the same could be used to have other Linux distributions (including multiple versions) installed for Chapel testing purposes as well + such can all be scripted. :-)

I apologize as this post is long (and this isn't really a normal issue/bug post at all). I have created a solution involving a Bash shell script and a Singularity (i.e., https://singularity.hpcng.org/ https://github.com/hpcng/singularity or commercially https://sylabs.io/singularity/ ) definition file (which is a "script" that creates a Singularity container). Singularity is (Linux) container technology that is ideal for high-performance computing (HPC) environments since it is also secure and using a container can completely occur in user space without any daemons, etc. (unlike Docker). (ASIDE: My day job involves supporting Canadian academic research via SHARCNET/Compute Canada (e.g., https://www.sharcnet.ca and https://www.computecanada.ca/ ).)

While some readers of this might know about and use Singularity others might not, so in this post I provide hopefully sufficient information to help with the installation of Singularity. Before discussing such, however, it is important to answer this question, "Why should one even consider using Singularity?" Answer: Reproducibility. Every system is an amalgam of all kinds of software, compiled with all kinds of configuration settings, etc. and while we hope some low probability random thing isn't a contributing factor --it might be. LLVM/Clang is a complicated piece of software with all kinds of options that takes approximately 30 minutes to compile on a 16-core AMD Ryzen Threadripper PRO 3955WX system. While we hope various settings that LLVM/Clang can be compiled with aren't an issue, it is unrealistic to say those will never be significant problems. Having reproducible containers will allow us to produce exactly the same containers + discuss/explore/resolve those issues in a sane manner.

ASIDE: Gentoo Linux builds everything from source. That said, an initial system starts off with a stage3 (binary) tarball --which the Singularity .def file mentioned below downloads and puts into the container. (One can tell Gentoo's package manager to rebuild such if one wants --I did not do this in the files mentioned below.) Because Gentoo compiles things from source and is all about configurability, reproducibility is important to have in this instance with the end goal of hopefully being able to make Chapel's build system more robust to underlying system configurations. The latter is important to take advantage of things done on systems. For example, HPC compute cluster systems will have all kinds of tools custom built to take advantage of installed hardware, fabrics, etc. to improve performance. Software tools that can make use of such typically get those advantages "for free" if they can use those already installed tools.

To enable reproducibility and discussions (e.g., to deal with this Github issue), I created two files:

The demo.sh file is a Bash script that, when run, completely automates the following:

The .def file is a script that Singularity can use to build a container of Gentoo Linux (plus Chapel prerequisite software installed).

I wrote such to:

Assuming one has an installed Linux system with typical shell commands installed along with sudo and Singularity installed, then running ./demo.sh will do the following:

After ./demo.sh finishes running, one can then explore the images. For example, to explore case02.dir, one would first run the following:

To give you an idea of resources needed: running demo.sh on a 16-core AMD Ryzen Threadripper PRO 3955WX takes approximately 30 minutes or so for each case after building the base Gentoo Linux container used by all cases. (For some reason Gentoo also pulls in Clang v12 which is also built when installing Clang v11.) Each case's log file is about 20MB and each case directory is about 3.3GB. (gentoo.dir is about 1.9GB and gentoo.log about 6.8MB.) The binary architecture targets 64-bit Linux (I didn't set a specific architecture.) The script does set the environment variable MAKEOPTS to take advantage of parallel building which is useful since building LLVM/clang is not fast --but you can edit/delete such if you want.

You can obtain a copy of the files needed to do this from the Gentoo issue here (let me know where you might want such posted here as it doesn't seem like I can attach files to a post in Github):

But before you can run such, you need to have sudo, the "standard" Linux command line tools, tar, wget, and Singularity all installed. Many Linux distributions don't have Singularity available as a package (some might appear to but often it is a completely different program) so one may have to build Singularity from source --which is not hard to do.

If Singularity is not available through your package manager, you will need to install it from source, which is detailed here: https://singularity.hpcng.org/user-docs/3.8/quick_start.html . Ensure the prerequisite tools mentioned are installed, then install Go, and then it is very easy to install Singularity. (You might want to also run sudo apt install build-essential (Debian/Ubuntu) or sudo yum groupinstall 'Development Tools' (CentOS, Fedora, other RedHat distributions) to install common code building compilers and utilities.)

After the software prerequisites and Singularity have been installed, you should be able to successfully run ./demo.sh. (The code in both files is straightforward. If you've any questions, please ask.) After running such, the results are:

Setting aside how Chapel was built (all of these cases ran exactly the same commands + there was one (1) edit to one file in the downloaded Chapel v1.25.0 tarball (see later in this post)), the only difference between these cases is how the LLVM/Clang compiler was compiled, involving these three Gentoo "USE" flags:

(Of these, case06 is not a Chapel issue: it failed because of a package manager "block" refusing to even install Clang, so it can be ignored here.)

From this (without further explorations, etc.) it appears that Chapel will build and run using these settings:

    export CHPL_LLVM=system
    export CHPL_LLVM_CONFIG=/usr/lib/llvm/11/bin/llvm-config
    export CHPL_HOST_COMPILER=llvm
    export CHPL_HOST_CC=clang-11
    export CHPL_HOST_CXX=clang++-11
    export CHPL_TARGET_COMPILER=llvm
    export CHPL_TARGET_CC=clang-11
    export CHPL_TARGET_CXX=clang++-11
    sed -i -e 's/LLVM_CLANG_LIBS=.*/LLVM_CLANG_LIBS=-lclang-cpp/' third-party/llvm/Makefile.include-system
    source util/setchplenv.bash
    make -j$(nproc)

Notice the change to only have -lclang-cpp in the LLVM Makefile as previously mentioned in this issue. I also explicitly set the compilers since Clang v12 was also installed. (This might be overkill for Chapel --but for now it removes a variable.) The results of these cases appear to demonstrate that Chapel builds successfully only when the Gentoo default-compiler-rt and default-libcxx USE flags are not set. Assuming this isn't a random/weird issue arising from Gentoo Linux, this implies there may be issues with Chapel when it is not using libgcc and libstdc++ under Linux where GCC is the main compiler. As I have been busy creating a solution that can be easily adopted and used by you and the Chapel team, I've not explored this further.

Should you have issues installing Singularity or don't have sudo access on a machine with Singularity, kindly let me know. As this post is long enough, I will end it here for now --but please do let me know of any issues with the above, etc. :-)

Paul

P.S. I am open to using audio/video conferencing should you want to audio/video conference to discuss any of the above, etc. Let me know if you would like to do such. (My time zone is Eastern Time (North America).)

mppf commented 3 years ago

@preney - thanks for your suggestions and scripts to help reproduce. I've started to run these builds but I will have to come back to it later to investigate further.

mppf commented 3 years ago

@preney - I see the issues described in the 6 cases. While I appreciate that it would be nice if chpl can work with all of these variations on the LLVM build, it sounded from the original post on this issue that you were running into more severe problems.

but when chpl is run (including with no arguments) it immediately segfaults, e.g.,

Are you still seeing this? Did you still need to use -fPIC in your other experiments? (I think no?)

preney commented 3 years ago

-fPIC is not needed. :-) The thing needed is for Chapel to be able to handle using Clang's extra tools published as dynamically-linked (i.e., "dylib") libraries (i.e., -lclang-cpp as demonstrated with my examples). Specifically, if the system Clang has only dylib files installed then the effect of this edit is required in order to build Chapel successfully:

    sed -i -e 's/LLVM_CLANG_LIBS=.*/LLVM_CLANG_LIBS=-lclang-cpp/' third-party/llvm/Makefile.include-system
    source util/setchplenv.bash

or it will fail to produce the Chapel compiler due to a linking failure. (The static libraries need to be removed in this case as well of course.)
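Since the sed edit above succeeds silently even when nothing matches, it may be worth confirming that the assignment was actually rewritten before starting a long build. The following is only a sketch: the function name patch_clang_libs is hypothetical (not part of Chapel), and unlike the command above it anchors the pattern at the start of the line, assuming the assignment is written there without leading whitespace.

```shell
# Hypothetical wrapper around the sed edit; not part of Chapel's build system.
# Rewrites the LLVM_CLANG_LIBS assignment in the given makefile and returns
# non-zero if the expected line is not present afterwards.
patch_clang_libs() {
    makefile="$1"
    # Assumption: the assignment starts at column 1 and uses no spaces around '='.
    sed -i -e 's/^LLVM_CLANG_LIBS=.*/LLVM_CLANG_LIBS=-lclang-cpp/' "$makefile" || return 1
    # Confirm the substitution took effect; grep -q prints nothing.
    grep -q '^LLVM_CLANG_LIBS=-lclang-cpp$' "$makefile"
}
```

It could be used as `patch_clang_libs third-party/llvm/Makefile.include-system || echo "LLVM_CLANG_LIBS was not patched" >&2` from the top of the Chapel source tree.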

Obviously various systems will have static libraries and/or the dynamic ones --so Chapel should be able to handle both. Such wouldn't be hard to add to Chapel and this could even be automatically detected since the LLVM_CONFIG path could be scanned to see which libraries are available. :-)


Unfortunately, this might not be the only fix to Chapel concerning LLVM/Clang support. Ideally cases 2, 3, and 5 also work --but they failed (with unresolved symbol linker errors). I've not had the time to investigate these yet so I cannot say much more than that at this time.

Of cases 2, 3, and 5, what concerns me the most is the case where clang was built to use libc++ as the default C++ Standard Library implementation instead of libstdc++ (i.e., GCC's). This should have 100% worked --but it didn't and so IMHO needs to be looked into. :-)


For all cases in demo.sh, Chapel is built in exactly the same way, i.e.,

    export CHPL_LLVM=system
    export CHPL_LLVM_CONFIG=/usr/lib/llvm/11/bin/llvm-config
    export CHPL_HOST_COMPILER=llvm
    export CHPL_HOST_CC=clang-11
    export CHPL_HOST_CXX=clang++-11
    export CHPL_TARGET_COMPILER=llvm
    export CHPL_TARGET_CC=clang-11
    export CHPL_TARGET_CXX=clang++-11
    # patch third-party/llvm/Makefile.include-system
    source util/setchplenv.bash
    make

I did this because Gentoo Linux pulls in Clang v12 when one tells it to install Clang v11. (I don't know why it does this.) To help ensure Chapel only used clang-11 and clang++-11 and never clang or clang++ (which would incorrectly invoke version 12), I used the HOST and TARGET environment variables seen above as a "paranoid" measure (e.g., one less thing to debug). I hope it is possible to not specify anything other than CHPL_LLVM=system and CHPL_LLVM_CONFIG=path --but I've not had time to test this yet. :-)
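One way to make this "paranoid" measure explicit would be to compare the major version reported by the pinned compiler with the one reported by llvm-config before building. This is a minimal sketch, assuming the compiler's --version output contains a line of the form "clang version X.Y.Z" and that llvm-config --version prints "X.Y.Z"; the helper names major_of and same_major are hypothetical and not part of Chapel.

```shell
# Hypothetical helpers; not part of Chapel's build system.

# Print the major component of the first X.Y... version number found in the text.
# Lines whose first run of digits is not followed by a dot are skipped.
major_of() {
    printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.[0-9].*/\1/p' |
        head -n 1
}

# Succeed only if both texts contain a version and the major versions agree.
same_major() {
    a=$(major_of "$1")
    b=$(major_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_major "$(clang-11 --version)" "$("$CHPL_LLVM_CONFIG" --version)" || echo "clang and llvm-config disagree" >&2` would flag the situation where the compiler found on the path is v12 while llvm-config points at v11.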

mppf commented 3 years ago

Thanks for the clarifications

Obviously various systems will have static libraries and/or the dynamic ones --so Chapel should be able to handle both. Such wouldn't be hard to add to Chapel and this could even be automatically detected since the LLVM_CONFIG path could be scanned to see which libraries are available. :-)

I was looking at just using -lclang-cpp. Do you know of any systems where a package with that is not available?

preney commented 3 years ago

Realistically that should work and seems reasonable. That said, I've checked other distributions to see what they provide in their clang implementations. It seems to me, worst case, a simple script could quickly check the LLVM config path for the clang-cpp library or the static ones and emit an error if such cannot be found.
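The check described here could look roughly like the following. This is a sketch under stated assumptions, not what Chapel's build does: the function name detect_clang_libs is hypothetical, only libclang-cpp.so (optionally with a version suffix) and libclangBasic.a are probed as stand-ins for the dylib and static variants, and the directory to inspect is assumed to be the one printed by llvm-config --libdir.

```shell
# Hypothetical helper; not part of Chapel's build system.
# Given an LLVM library directory, print "-lclang-cpp" if the Clang dylib is
# present, "static" if the static component archives are present, and
# otherwise emit an error and return non-zero.
detect_clang_libs() {
    libdir="$1"
    # Matches libclang-cpp.so as well as versioned names like libclang-cpp.so.11.
    for f in "$libdir"/libclang-cpp.so*; do
        if [ -e "$f" ]; then
            echo "-lclang-cpp"
            return 0
        fi
    done
    # Assumption: libclangBasic.a is representative of the static libraries.
    if [ -e "$libdir/libclangBasic.a" ]; then
        echo "static"
        return 0
    fi
    echo "error: no Clang libraries found in $libdir" >&2
    return 1
}
```

A call such as `detect_clang_libs "$("$CHPL_LLVM_CONFIG" --libdir)"` would then report which variant the system provides. The dylib is preferred when both are present, which matches the suggestion of just using -lclang-cpp.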

mppf commented 3 years ago

I've just merged #18606 to address the -lclang-cpp part. I'm going to close this issue and open a new one to track the remaining work about default-compiler-rt and default-libcxx configurations. The new issue is #18643.