dotnet / jitutils

Update coredistools #370

Closed BruceForstall closed 8 months ago

BruceForstall commented 1 year ago
  1. Update from LLVM 13.0.1 to 17.0.6
  2. Change Linux/Mac build to build llvm-tblgen from source instead of downloading a pre-built version. LLVM doesn't always publish all architecture versions of this tool.
  3. Change Linux to build with CBL-Mariner container.

TO DO:

Fixes #372

BruceForstall commented 1 year ago

It looks like the llvm-tblgen built on ubuntu-20.04 doesn't run correctly on ubuntu-20.04 for the linux-x64 coredistools build (crashes? hangs?).

It also fails to run at all, due to library dependencies, on the Linux arm/arm64 cross-build Docker containers.

It would probably be best to abandon trying to use Ubuntu at all, and try to get Mariner to work. Maybe after Mariner is updated to LLVM 16.

BruceForstall commented 1 year ago

I'm currently getting this when building locally (under Mariner linux-x64 container):

I have no name!@3b1d8f6db879:~/gh/jitutils$ ./build-tblgen.sh linux-x64 /crossrootfs/x64
~/gh/jitutils/obj ~/gh/jitutils
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- The ASM compiler identification is Clang
-- Found assembler: /usr/local/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /usr/bin/python3.9 (found suitable version "3.9.14", minimum required is "3.6") found components: Interpreter
-- Performing Test LLVM_LIBSTDCXX_MIN
-- Performing Test LLVM_LIBSTDCXX_MIN - Failed
CMake Error at cmake/modules/CheckCompilerVersion.cmake:88 (message):
  libstdc++ version must be at least 7.1.
Call Stack (most recent call first):
  cmake/config-ix.cmake:15 (include)
  CMakeLists.txt:848 (include)

BruceForstall commented 12 months ago

Looks like we still fail to build LLVM tblgen for LLVM 16.0.6 due to the same libstdc++ version issue when building under CBL-Mariner (mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-amd64):

./build-tblgen.sh linux-x64 /crossrootfs/x64
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /__w/_temp/d033f5ce-f10f-4de7-a156-dad2298a6c2b.sh
/__w/1/s/obj /__w/1/s
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- The ASM compiler identification is Clang
-- Found assembler: /usr/local/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /usr/bin/python3.9 (found suitable version "3.9.14", minimum required is "3.6") found components: Interpreter 
-- Performing Test LLVM_LIBSTDCXX_MIN
-- Performing Test LLVM_LIBSTDCXX_MIN - Failed
-- Configuring incomplete, errors occurred!
See also "/__w/1/s/obj/CMakeFiles/CMakeOutput.log".
See also "/__w/1/s/obj/CMakeFiles/CMakeError.log".
CMake Error at cmake/modules/CheckCompilerVersion.cmake:88 (message):
  libstdc++ version must be at least 7.1.
Call Stack (most recent call first):
  cmake/config-ix.cmake:15 (include)
  CMakeLists.txt:848 (include)

Looks like /crossrootfs/x64 has libstdc++.so.6?

I have no name! [ /opt/code ]$ ls -l -aF /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc*
lrwxrwxrwx 1 root root      19 Oct  4  2019 /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc++.so.6 -> libstdc++.so.6.0.21
-rw-r--r-- 1 root root 1566440 Oct  4  2019 /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
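
The listing above hints at why the check fails. As far as the libstdc++ ABI history goes, GCC 7.1 shipped libstdc++.so.6.0.23, while libstdc++.so.6.0.21 dates from the GCC 5 series. The sketch below is a rough heuristic based only on the file name; the helper names are hypothetical and not part of jitutils, and LLVM's real check compiles a test program against the headers instead of inspecting the file name.

```shell
#!/bin/sh
# Hypothetical helpers (not part of jitutils or LLVM).

# Print the last numeric component of a libstdc++.so.6.0.N file name.
libstdcxx_patch() {
    name=${1##*/}          # strip the directory part
    echo "${name##*.}"     # keep what follows the last dot
}

# Succeed when the file name suggests GCC 7.1 or newer (6.0.23 and up).
libstdcxx_is_at_least_7_1() {
    [ "$(libstdcxx_patch "$1")" -ge 23 ]
}

# Path copied from the listing above; the file does not need to exist,
# because only the name is inspected.
lib=/crossrootfs/x64/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
if libstdcxx_is_at_least_7_1 "$lib"; then
    echo "libstdc++ looks new enough for the LLVM check"
else
    echo "libstdc++ predates GCC 7.1"
fi
```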

@sbomer Does this make sense? Are the Mariner images going to get an updated libstdc++ version 7 sometime? Any other suggestion?

cc @jakobbotsch

sbomer commented 12 months ago

Those images have an old Ubuntu rootfs that we target in our official builds for broad compatibility with a large range of glibc versions, and they don't have libstdc++ 7 installed or available in the repos.

@directhex ran into this too, and created images specifically to solve this problem, which have a newer rootfs (Ubuntu 18.04 instead of 16.04): https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/857. I would try using those images instead (for example, mcr.microsoft.com/dotnet-buildtools/prereqs:cbl-mariner-2.0-cross-ubuntu-18.04-amd64).

cc @directhex in case you have any advice.

directhex commented 12 months ago

For .NET 9, you can use a newer crossroot as @sbomer suggests. We no longer target Ubuntu 16.04 for .NET 9, but AFAIK our build images haven't been updated to reflect it yet.

For .NET 8, you need to bring your own better c++ library. LLVM comes with one, you can either build it yourself or consume it via an LLVM nuget, then bundle it & fix up the rpath value in your libs/executables so they still work. See _LibCxxBootstrap target on https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L147 for building libc++, and https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L127 and https://github.com/dotnet/llvm-project/blob/dotnet/main-16.x/llvm.proj#L54 for consuming it.

BruceForstall commented 12 months ago

I don't think I need to worry about .NET 8, so I'll try the new images. Thanks!

BruceForstall commented 12 months ago

Well, the new containers allowed the build to succeed, but now there is some missing or mis-versioned dependency:

./bin/llvm-tblgen: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
I have no name! [ /opt/code ]$ find /usr -iname libtinfo.so\*
/usr/lib/libtinfo.so.6
/usr/lib/libtinfo.so.6.4

sbomer commented 12 months ago

Ah, it looks like the 18.04 rootfs had libtinfo.so.5, but the mariner host on which tblgen is running has libtinfo.so.6. It looks like you can get libtinfo.so.5 with tdnf install -y ncurses-compat.
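
A mismatch like this can usually be spotted before running the tool by listing its unresolved shared libraries. The sketch below filters `ldd`-style output for entries reported as "not found"; the function name is hypothetical, and it assumes the usual glibc `ldd` output format (`name => not found`).

```shell
#!/bin/sh
# Hypothetical helper: read ldd-style output on stdin and print the names
# of the libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage inside the build container (binary path taken from the thread):
#   ldd ./bin/llvm-tblgen | missing_libs
```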

BruceForstall commented 12 months ago

> It looks like you can get libtinfo.so.5 with tdnf install -y ncurses-compat.

Maybe that worked? Now I get:

./bin/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by ./bin/llvm-tblgen)

but it does seem to run.

BruceForstall commented 12 months ago

I tried to add a step to install libtinfo.so.5 to the docker container before llvm-tblgen is run, but got a permissions problem:

tdnf install -y ncurses-compat
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /__w/_temp/a6a50520-a2ed-48fc-8558-958cad13608d.sh
Error(1601) : Operation not permitted. You have to be root.

Maybe need to add it to the docker container construction for these? Or maybe there's some other permissions magic that will make it work?

sbomer commented 12 months ago

Try sudo? I think Azure Pipelines grants the user that runs the job passwordless sudo permissions:

Try to create a user with UID '1001' inside the container.
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d bash -c "getent passwd 1001 | cut -d: -f1 "
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d groupadd -g 127 docker_azpcontainer
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d useradd -m -g 127 -u 1001 vsts_azpcontainer
Grant user 'vsts_azpcontainer' SUDO privilege and allow it run any command without authentication.
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d groupadd azure_pipelines_sudo
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d usermod -a -G azure_pipelines_sudo vsts_azpcontainer
/usr/bin/docker exec  fca87559f3deda754d3d4ef29a7ef33817a0ae046a5b05d79e8c591004dc1c2d su -c "echo '%azure_pipelines_sudo ALL=(ALL:ALL) NOPASSWD:ALL' >> /etc/sudoers"

BruceForstall commented 12 months ago

> Try sudo? I think Azure Pipelines grants the user that runs the job passwordless sudo permissions:

Will try.

Otherwise, looks like adding ncurses-compat to the tdnf install -y line in

dotnet/dotnet-buildtools-prereqs-docker : src\cbl-mariner\2.0\crossdeps-builder\Dockerfile

is the place to add it?

sbomer commented 12 months ago

https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/fc0853cbd2ab042fcaa762b90759092336a2b9f3/src/cbl-mariner/2.0/crossdeps/Dockerfile would be the place to add it. I was a little hesitant to recommend this since we try to keep the build images pretty minimal, but I see we already added pip3 and zlib - so it probably doesn't hurt.

directhex commented 12 months ago

Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

BruceForstall commented 12 months ago

> Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

Maybe? I don't understand the Linux versioning rules. Note that https://github.com/dotnet/runtime/pull/86194 moved all (or, at least most) CI VMs to 22.04. So maybe update to 22.04?

BruceForstall commented 12 months ago

Installing ncurses-compat using sudo worked.
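
A minimal sketch of such an install step, assuming the job user has passwordless sudo as described earlier in the thread. The helper name is hypothetical, and the script only echoes the command so that it is safe to run outside the container.

```shell
#!/bin/sh
# Hypothetical helper: print the prefix needed to run a privileged command
# for the given numeric user ID (empty for root, "sudo" otherwise).
root_prefix() {
    if [ "$1" -eq 0 ]; then
        echo ""
    else
        echo "sudo"
    fi
}

# Only prints the command; a real pipeline step would execute it instead.
echo "would run: $(root_prefix "$(id -u)") tdnf install -y ncurses-compat"
```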

However, now the Linux x64 and arm64 builds (build-coredistools.sh step) are failing with AzDO failures:

##[error]The hosted runner encountered an error while running your job. (Error Type: Disconnect).
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

at almost exactly 29 minutes. But the Linux arm one succeeds after just 6 minutes.

Don't know what to do about it: there's no further info.

sbomer commented 12 months ago

Not sure about the latest failure - let's retry.

> Let's just get rid of the 18.04 images? The baseline for net9 is 20.04 isn't it? Bump the docker images to do that instead of 18.04, then you get the 6 SONAME for tinfo

Good point, I just checked with @richlander and that's right. edit: see https://github.com/dotnet/runtime/issues/91826 for details.

> Note that https://github.com/dotnet/runtime/pull/86194 moved all (or, at least most) CI VMs to 22.04. So maybe update to 22.04?

We build against a lower version (for broad glibc compat) than the version we run on in CI. So we would update the Mariner build images to have a 20.04 rootfs, which will have libtinfo.so.6.
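
One way to see which glibc a built binary actually requires is to look at the highest `GLIBC_x.y` symbol version it references. The sketch below extracts that from text such as the output of `objdump -T`; the function name is hypothetical, and it assumes a `sort` that supports the `-V` version-sort extension (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: read symbol-version text on stdin and print the
# highest GLIBC_x.y[.z] version mentioned in it.
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Possible usage (binary path taken from the thread):
#   objdump -T ./bin/llvm-tblgen | max_glibc_version
```

A binary built against an older rootfs should report a lower version here than one built directly on the CI host.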

BruceForstall commented 12 months ago

Same result: builds cancelled at 27m 32s.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=400328&view=results

sbomer commented 12 months ago

Looks like there were some similar failures on OSX in the past: https://github.com/dotnet/runtime/issues/34647. Could the build be filling up the disk?

BruceForstall commented 12 months ago

Maybe? These aren't OSX machines, though. I could add a "df -H" job to see the "before" state, but that wouldn't actually help if the cmake/llvm build itself is going crazy. I'll try again locally (on WSL2), but it's worked for me recently.
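
To watch disk usage during the build and not only before it, a background loop could log free space while the build command runs. This is only a sketch: the function name is hypothetical, and it uses POSIX `df -P` on the current directory in place of `df -H` to keep the output to one line per sample.

```shell
#!/bin/sh
# Hypothetical helper: run a command while logging disk usage of the
# current directory to stderr every INTERVAL seconds.
# Usage: with_disk_log INTERVAL COMMAND [ARGS...]
with_disk_log() {
    interval=$1
    shift
    # Background monitor: one df line per interval, written to stderr.
    ( while :; do df -P . | tail -n 1 >&2; sleep "$interval"; done ) &
    monitor=$!
    # Run the real command and remember its exit status.
    "$@" && status=0 || status=$?
    # Stop the monitor and pass the command's status through.
    kill "$monitor" 2>/dev/null || true
    wait "$monitor" 2>/dev/null || true
    return "$status"
}

# Possible usage (script name taken from the thread, arguments omitted):
#   with_disk_log 30 ./build-coredistools.sh
```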

BruceForstall commented 8 months ago

When coredistools is updated, https://github.com/dotnet/runtime/pull/91668 should be reverted.

BruceForstall commented 8 months ago

Current status:

##[error]The hosted runner encountered an error while running your job. (Error Type: Disconnect).
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

When this happens, there is no log file output; the page says "Nothing to show. Final logs are missing. This can happen when the job is cancelled or times out." The job specifies a 60-minute timeout.
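
Since the logs are lost when the runner is deprovisioned, one option is to stop the build from inside the job before that point, so the step fails normally and its output is kept. The sketch below assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached; the function name and the 25-minute figure are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit in seconds and
# report on stderr when the limit was hit.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    seconds=$1
    shift
    timeout "$seconds" "$@" && status=0 || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$status"
}

# Possible usage, stopping a few minutes before the observed ~29-minute failure:
#   run_with_deadline 1500 ./build-coredistools.sh
```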

If you watch the job, you can capture the in-progress build. linux-x64 seems to hang here:

...
[ 10%] Building X86GenFoldTables.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterBank.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenPreLegalizeGICombiner.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSystemOperands.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Built target LLVMSupportBlake3

and linux-arm64 hangs here:

...
[  5%] Building X86GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[  5%] Building X86GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenRegisterBank.inc...
[ 10%] Building AArch64GenRegisterInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building X86GenFoldTables.inc...
[ 10%] Building AArch64GenSubtargetInfo.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
[ 10%] Building AArch64GenSystemOperands.inc...
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)
/__w/1/tblgen-linux/llvm-tblgen: /lib/libtinfo.so.5: no version information available (required by /__w/1/tblgen-linux/llvm-tblgen)

BruceForstall commented 8 months ago

@dotnet/jit-contrib This is ready to be reviewed and merged. cc @sbomer

The CI system is still failing to build the linux-arm64 and linux-x64 versions. It seems to hang at some point. I can build them fine, using the exact same CBL-Mariner docker containers and scripts, in my WSL2 Ubuntu 22.04.3 LTS OS. So I'm mystified as to the reason. If anyone has experience debugging the LLVM build process and AzDO builds, feel free to investigate.

I built a new coredistools nuget package using all the successfully built components in the CI, plus linux-arm64 and linux-x64 built on my machine.

BruceForstall commented 8 months ago

@jakobbotsch @dotnet/jit-contrib PTAL

BruceForstall commented 8 months ago

Almost 9 months to get this in :-)

Unfortunately, linux-x64 and linux-arm64 builds in the CI still hang (or otherwise fail). However, they succeed when built using the same Docker build containers on my Ubuntu 22.04 WSL2 host, so that's what I used to build the 1.4.0 coredistools package.

It would still be nice to fix the CI if we could figure out how to debug the problem.