adoptium / temurin-build

Eclipse Temurin™ build scripts - common across all releases/versions
Apache License 2.0

Trial devkit-based openjdk build on UBI #3847

Open sxa opened 3 weeks ago

sxa commented 3 weeks ago

With our switch to building with devkits on most Linux platforms, it should not matter which base image we use for the builds, as long as the correct devkit can be installed and used in it.

This issue will cover attempting to use our existing devkit to build JDK21 on a later OS (likely UBI9) on the platforms for which we create devkits, to see whether it (a) works and (b) can produce binary-identical builds.

judovana commented 3 weeks ago

I'm looking forward to this! In my dream world, the build scripts would magically adjust the environment so that it produces the desired build. So yeah, rather a container.

sxa commented 3 weeks ago

The following Dockerfile is adequate to build on UBI9 with the devkit (This should have a checksum applied to the devkit download before any production use is considered). Also note that this download of the devkit is tied to aarch64 for now, purely because that's the fastest machine I have while trying this.

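FROM redhat/ubi9
ARG user=jenkins

# Build user with a fixed UID/GID of 1000
RUN groupadd -g 1000 ${user}
RUN useradd -c "Jenkins user" -d /home/${user} -u 1000 -g 1000 -m ${user}

# xz and procps-ng are needed by the Temurin scripts; the other packages are believed to be OpenJDK build prerequisites that the devkit does not provide
RUN dnf -y install git xz procps-ng bzip2 autoconf file make diffutils unzip zip cpio

# Unpack the aarch64 devkit (a checksum check should be added before any production use)
RUN mkdir -p /usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02
RUN curl -L https://github.com/adoptium/devkit-binaries/releases/download/gcc-11.3.0-Centos7.6.1810-b02/devkit-gcc-11.3.0-Centos7.6.1810-b02-aarch64-linux-gnu.tar.xz | tar xJpf  - -C /usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02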
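Because the Dockerfile above pipes the download straight into tar, nothing is left to check afterwards. Here is a minimal sketch of the missing checksum step. It assumes the tarball is first saved to a file and that a trusted SHA-256 value is available from somewhere; none is given in this thread. `verify_sha256`, `DEVKIT_URL` and `EXPECTED_SHA256` are hypothetical names:

```bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 on a match, 1 (with a message on stderr) on a mismatch.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest field
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "checksum mismatch for $file: expected $expected, got $actual" >&2
  return 1
}

# Example use in place of the curl | tar pipeline (DEVKIT_URL and
# EXPECTED_SHA256 are placeholders, not values from this thread):
#   curl -fL -o /tmp/devkit.tar.xz "$DEVKIT_URL"
#   verify_sha256 /tmp/devkit.tar.xz "$EXPECTED_SHA256" &&
#     tar xJpf /tmp/devkit.tar.xz -C /usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02
```

Saving to a file first costs a little disk space, but it means a failed check stops the build before anything is unpacked.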

Of the packages in the dnf install command, xz and procps-ng are required by the Temurin processes. I believe all of the rest are OpenJDK prerequisites that are not included in the devkit (tagging @andrew-m-leonard here in case he's not aware of these).
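One way to confirm that an image provides these prerequisites before a long build starts is to check for the commands they supply. This is a sketch. The command names are an assumed mapping from the package list (for example, procps-ng provides `ps` and diffutils provides `diff`), and `missing_tools` is a hypothetical helper:

```bash
# Print each named command that is not found on PATH.
# Returns 0 if all are present, 1 if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using commands assumed to come from the dnf packages above:
#   missing_tools git xz ps bzip2 autoconf file make diff unzip zip cpio
```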

I'm using the following environment. Note that I have NOT yet done any checks on binary reproducibility in this environment compared to the default Temurin one.

export VARIANT=temurin
export RELEASE=true
export BRANCH=jdk-21.0.3+9_adopt
export TAG=jdk-21.0.3+9_adopt
export BUILD_ARGS="--use-adoptium-devkit gcc-11.3.0-Centos7.6.1810-b02"

So TL;DR part (a) "It works" is satisfied :-)

sxa commented 3 weeks ago

--enable-dtrace (as used in the temurin build scripts) does not work as it can't find the headers from systemtap-sdt-devel. They are presumably coming from the host system in the Temurin builds and not from within the devkit (suggesting a devkit update to include the systemtap-sdt-devel package may be in order). Until that is done it's unlikely we'll be able to have an identical build.

Other than --enable-dtrace and --enable-sbom-strace, I had to add the devkit binutils bin directory to the PATH, otherwise objcopy was not found. (EDIT: in theory, export OBJCOPY=/usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02/bin/objcopy is a less intrusive alternative, but it fails with jdk.tools.jlink.plugin.PluginException: java.io.IOException: Cannot run program "objcopy": error=2, No such file or directory.) With the PATH change, the build works OK with the parameters from the SBoM on the UBI9 system (Boot JDK 20.0.2+9 was added later in /usr/lib/jvm/jdk-20 for consistency and to allow reproducibility). SBoM generation does not work because ant is not installed by the Dockerfile.
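The PATH workaround can be made idempotent so that re-running a setup script does not keep growing PATH. Here is a minimal sketch (`prepend_path` is a hypothetical helper). The jlink error suggests that `objcopy` is run by name, which would explain why setting `OBJCOPY` alone was not enough. That is an inference from the message, not something verified here:

```bash
# Prepend a directory to PATH unless it is already present, so tools such as
# objcopy are found by any process that looks them up by name.
prepend_path() {
  dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;             # already on PATH: leave it unchanged
    *) PATH="$dir:$PATH" ;;
  esac
  export PATH
}

# Example, with the devkit location from the Dockerfile above:
#   prepend_path /usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02/bin
```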

./makejdk-any-platform.sh --clean-git-repo --jdk-boot-dir /usr/lib/jvm/jdk-20 \
  --configure-args "--disable-warnings-as-errors --with-jobs=40" \
  --target-file-name OpenJDK21U-jdk_aarch64_linux_hotspot_21.0.3_9.tar.gz \
  --release --clean-libs --tag jdk-21.0.3+9_adopt \
  --create-jre-image --create-sbom \
  --use-adoptium-devkit gcc-11.3.0-Centos7.6.1810-b02 \
  --user-openjdk-build-root-directory /home/jenkins/workspace/build-scripts/jobs/release/jobs/jdk21u/jdk21u-release-linux-aarch64-temurin/workspace/build/openjdkbuild \
  --freetype-dir bundled --use-jep319-certs --create-debug-image \
  --build-variant temurin jdk21u

judovana commented 3 weeks ago

Hi! Seeing --use-jep319-certs: should that be used for all JDKs that are trying to be identical? Do you know if JDK 8 honours that option?

judovana commented 3 weeks ago

I can see

--use-jep319-certs
Use certs defined in JEP319 in Java 8/9. Deprecated, has no effect.

That's why I was confused.

sxa commented 3 weeks ago

We should probably discuss anything related to the JEP319 parameter somewhere else as it's not directly relevant to building on a different OS :-)

judovana commented 3 weeks ago

Sure, as you command. Thank you! I would not have brought it up here if I hadn't seen it in the command line above; it surprised me. Sorry!

sxa commented 3 weeks ago

The linux_repro_build_compare.sh script is not currently working with the latest release build, so it will need a bit of work before I can reliably do a comparison. I'm testing on linux-aarch64, whose build uses --with-jobs=40 and --with-sbom-strace; I have removed those to allow the SBoM to be parsed correctly.

For the purposes of verifying reproducibility I've also added the systemtap-sdt-devel package to the machine in order to pick up the dtrace header files which are not included in the devkit. There is, of course, a risk that the systemtap headers from UBI will be different from the ones used in the original CentOS7 build and therefore introduce differences. We should consider including those in the devkit so they are used identically to the CentOS7 ones.

Unfortunately, there are a lot of differences when building on UBI9. I'll try replicating the same in a similarly "clean" CentOS7 container with the devkit.

[Edit: The same happened with CentOS7, although there the script is pulling down the /usr/local/gcc11 compiler and setting CC to it instead of using the one from the devkit (even though the devkit parameter was specified)]:

Tools summary:
* Boot JDK:       openjdk version "20.0.2" 2023-07-18 OpenJDK Runtime Environment Temurin-20.0.2+9 (build 20.0.2+9) OpenJDK 64-Bit Server VM Temurin-20.0.2+9 (build 20.0.2+9, mixed mode, sharing) (at /usr/lib/jvm/jdk-20.0.2+9)
* Toolchain:      gcc (GNU Compiler Collection)
* Devkit:         gcc-11.3.0 - Centos7.6.1810 (/usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02)
* C Compiler:     Version 11.3.0 (at /usr/local/gcc11/bin/gcc-11.3)
* C++ Compiler:   Version 11.3.0 (at /usr/local/gcc11/bin/g++-11.3)
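At its core, this comparison is a recursive diff of the two unpacked JDK images, like the `diff -r` output in the next comment. Here is a minimal sketch of just that step, assuming both builds are already unpacked side by side. It is not the actual linux_repro_build_compare.sh, and `compare_trees` is a hypothetical name:

```bash
# Recursively diff two unpacked JDK trees, writing any differences to a log.
# diff exits 0 if identical, 1 if different, >1 on error; that code is returned.
compare_trees() {
  a="$1"
  b="$2"
  log="${3:-reprotest.diff}"
  rc=0
  diff -r "$a" "$b" > "$log" || rc=$?
  case "$rc" in
    0) echo "No differences found" ;;
    1) echo "Differences found..., logged in: $log" ;;
    *) echo "diff failed (exit $rc)" >&2 ;;
  esac
  return "$rc"
}
```

Returning diff's own exit status lets a caller tell "different" (1) apart from "could not compare" (2 or higher).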
sxa commented 3 weeks ago

OK, fixed that (removed the gcc download and commented out the call to setEnvironment in the script), and the result is identical on UBI9 + devkit apart from the BUILD_SOURCE_REPO URL in the release file shown below :-) Luckily, the systemtap-sdt-devel package is good enough that it doesn't break reproducibility.

Comparing ...
diff -r jdk-21.0.3+9/release compare.135611/jdk-21.0.3+9/release
12c12
< BUILD_SOURCE_REPO="https://github.com/adoptium/temurin-build.git"
---
> BUILD_SOURCE_REPO="https://github.com/adoptium/temurin-build"
Differences found..., logged in: reprotest.diff
[root@56ec34193460 reproducible]# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="9.0 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.0 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/9/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"
Red Hat Enterprise Linux release 9.0 (Plow)
Red Hat Enterprise Linux release 9.0 (Plow)
[root@56ec34193460 reproducible]#
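The one remaining difference is only in how the repository URL is spelled: with or without a trailing `.git`. One way to confirm that nothing else differs is to normalise that line before comparing the two `release` files. This is a sketch assuming `sed -E` support; `normalize_release` and `compare_release` are hypothetical helpers. It deliberately hides a real difference, so it is for analysis only and is not a fix:

```bash
# Print a JDK "release" file with any trailing ".git" removed from the
# BUILD_SOURCE_REPO value; all other lines pass through unchanged.
normalize_release() {
  sed -E 's#^(BUILD_SOURCE_REPO="[^"]*)\.git"$#\1"#' "$1"
}

# Diff two release files after normalisation; returns diff's exit status.
compare_release() {
  tmp=$(mktemp -d)
  rc=0
  normalize_release "$1" > "$tmp/a"
  normalize_release "$2" > "$tmp/b"
  diff "$tmp/a" "$tmp/b" || rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```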
sxa commented 3 weeks ago

[this comment intentionally left blank - will include summary of reproducing the above]

sxa commented 3 weeks ago

Next step: investigate the devkit build on a later OS. @andrew-m-leonard has hit issues building the CentOS7-compatible devkit on a Fedora 38 system, but it would be good to confirm whether we can build, on a later distribution, a suitable devkit that targets CentOS7's glibc (2.17).

I did a few scenarios in https://github.com/adoptium/temurin-build/issues/3700 but I'll try some more on aarch64 to see if I can replicate any problems (and ideally see if I can make it work!)

Memo to self (for RHEL7; others may need tweaking)

Attempting on various distributions:

| Distribution | Notes |
|---|---|
| centos:7 | With make 4.1 it builds OK |
| centos:8 | texinfo install requires `dnf -y install dnf-plugins-core` then `dnf config-manager --set-enabled powertools` |
| quay.io/centos/centos:9stream | Builds OK |
| fedora:38 | cpu needs to be set with `uname -m` in make/devkit/Makefile. May need --without-isl removing from the gcc Makefile. Expect a GLIBCXX_3.4.32 error from build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/bin/ld at some point |
| fedora:34 | Builds OK (no need for ISL to be disabled) |
| fedora:40 | Build failure: error: passing argument 1 of ‘set_32’ from incompatible pointer type [-Wincompatible-pointer-types] [*]. With ISL disabled we get GLIBCXX_3.4.32 not found (required by /.../ld) |

[*] - Failure was as follows:

make[5]: Leaving directory '/root/ci-jenkins-pipelines/pipelines/build/devkit/jdk21u/build/devkit/aarch64-linux-gnu/aarch64-linux-gnu/gcc-11.3.0/intl'
/root/ci-jenkins-pipelines/pipelines/build/devkit/jdk21u/build/devkit/src/gcc-11.3.0/libiberty/simple-object-mach-o.c: In function ‘simple_object_mach_o_write_segment’:
/root/ci-jenkins-pipelines/pipelines/build/devkit/jdk21u/build/devkit/src/gcc-11.3.0/libiberty/simple-object-mach-o.c:1231:17: 

sxa commented 2 weeks ago

The version of libstdc++.so built in the JDK21 devkit has symbols up to GLIBCXX_3.4.29 regardless of which distribution it is built on. The problem occurs when the distribution has a system libstdc++ which has a later version than the one built in the devkit. For reference:

| Distribution | libstdc++ version |
|---|---|
| CentOS7 | GLIBCXX_3.4.19 |
| CentOS8 | GLIBCXX_3.4.25 |
| CentOS9 Stream | GLIBCXX_3.4.29 |
| Fedora 34 | GLIBCXX_3.4.29 |
| Fedora 38 | GLIBCXX_3.4.32 |
| Fedora 40 | GLIBCXX_3.4.33 |

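Here is a sketch of how to extract the highest GLIBCXX symbol version in a given libstdc++, so the numbers in the table above can be reproduced on another system. It assumes GNU `sort -V` for version ordering. `max_glibcxx` is a hypothetical helper, and the library path in the comment varies by distribution:

```bash
# Print the highest GLIBCXX_x.y.z version named in the input text.
# Non-numeric entries such as GLIBCXX_DEBUG_MESSAGE_LENGTH are ignored.
max_glibcxx() {
  grep -o 'GLIBCXX_[0-9][0-9.]*' |  # keep only versioned symbol names
    sed 's/^GLIBCXX_//' |           # strip the prefix, leaving e.g. 3.4.29
    sort -uV |                      # version-aware sort: 3.4.9 < 3.4.19 < 3.4.29
    tail -n1
}

# Example (the library path is an assumption and differs between systems):
#   strings /usr/lib64/libstdc++.so.6 | max_glibcxx
```

`sort -V` is needed here because a plain lexical sort would place 3.4.9 after 3.4.29.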
andrew-m-leonard commented 2 weeks ago

@sxa thanks for working this out. I'm a little confused, though: it doesn't seem quite right that a CentOS7 "sysroot" libstdc++ has GLIBCXX_3.4.29 when built for CentOS7. Is it saying that the devkit "itself" contains references to GLIBCXX_3.4.29, but the code it generates would be CentOS7-based (i.e. up to GLIBCXX_3.4.19)? @fitzsim ?

sxa commented 2 weeks ago

The libstdc++ library that contains those symbols is built along with GCC. I haven't delved into why yet (too many other things are vying for my attention today). My working assumption is that the JDK statically links in the parts it needs from there, so it has no dynamic dependency on the version supplied with the OS. That is unlike ld from binutils, which does load it dynamically, so the JDK doesn't hit an issue at runtime on older systems.
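The working assumption above can be checked by looking for a `NEEDED` entry for libstdc++ in an ELF file's dynamic section. Under that assumption, the JDK links libstdc++ statically while binutils' `ld` loads it dynamically. Here is a sketch that reads `readelf -d` output on stdin. `has_libstdcxx_needed` is a hypothetical helper, and the paths in the comments are placeholders:

```bash
# Succeeds if `readelf -d` output on stdin lists libstdc++ as a NEEDED shared
# library, i.e. the binary loads libstdc++ dynamically at runtime.
has_libstdcxx_needed() {
  grep -q '(NEEDED).*\[libstdc++\.so'
}

# Example (paths are placeholders):
#   readelf -d /path/to/jdk/lib/server/libjvm.so | has_libstdcxx_needed || echo "no dynamic libstdc++"
#   readelf -d /path/to/devkit/bin/ld | has_libstdcxx_needed && echo "dynamic libstdc++"
```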

fitzsim commented 2 weeks ago

I am pretty sure this is an issue with the GNU gold build, within the binutils build. GNU gold is special among the binutils utilities in that it is the only one that has C++ code. Based on the error output posted to app.slack.com by @andrew-m-leonard:

Error is actually in configure log file: ~/jdk21u/build/devkit/aarch64-linux-gnu/aarch64-linux-gnu/gcc-11.3.0/fixincludes/config.log

configure:3652: checking whether the C compiler works
configure:3674: ~/jdk21u/build/devkit/aarch64-linux-gnu/aarch64-linux-gnu/gcc-11.3.0/./gcc/xgcc -B~/jdk21u/build/devkit/aarch64-linux-gnu/aarch64-linux-gnu/gcc-11.3.0/./gcc/ -B~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/bin/ -B~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/lib/ -isystem ~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/include -isystem ~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/sys-include  -fdebug-prefix-map=~/jdk21u/build/devkit=devkit  -g -O2  -static-libstdc++ -static-libgcc  conftest.c  >&5
[...]
~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/bin/ld: ~/jdk21u/build/devkit/aarch64-linux-gnu/aarch64-linux-gnu/gcc-11.3.0/aarch64-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by ~/jdk21u/build/devkit/result/aarch64-linux-gnu-to-aarch64-linux-gnu/aarch64-linux-gnu/bin/ld)
collect2: error: ld returned 1 exit status

I suspect what is happening is: devkit's GNU gold (here, ld) is being built against the build system's libstdc++.so [this is the issue]. When devkit GCC is configured against devkit binutils, LD_LIBRARY_PATH is [correctly] set to devkit's libstdc++.so install path.

When the build system's libstdc++.so's version is equal to or older than devkit's libstdc++.so, this is fine. But when the build system's libstdc++.so's version is newer than devkit's libstdc++.so, then the GNU gold ld binary will be expecting the newer GLIBCXX versioned symbol, and not finding it, exit with an error.
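The failure condition described here reduces to a version comparison. A binary whose highest required GLIBCXX version is newer than the highest one provided by the loaded libstdc++ will not run. Here is a minimal sketch of that check using GNU `sort -V -C`. `glibcxx_satisfies` is a hypothetical helper; the required version could be read from the versioned symbols that `objdump -T` lists for the gold `ld` binary:

```bash
# Succeed if a libstdc++ providing symbol versions up to $2 can satisfy a
# binary that requires symbol versions up to $1 (i.e. required <= provided).
# `sort -V -C` exits 0 only when its input is already in version order.
glibcxx_satisfies() {
  required="$1"
  provided="$2"
  printf '%s\n%s\n' "$required" "$provided" | sort -V -C
}
```

With the numbers from this thread, an `ld` built on Fedora 38 that needs 3.4.32 fails the check against the devkit's 3.4.29. Anything requiring 3.4.29 or older passes.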

The issue may arise from a bootstrapping loop where devkit GNU gold is needed to build devkit libstdc++.so and vice versa.

To rule this out, it may be worth attempting to build and install devkit libstdc++.so and binutils twice: first against the build system tools, then a second time against the just-installed devkit libstdc++.so and binutils. The idea is the same as what @andrew-m-leonard did to bootstrap devkit GCC.

I think this familiar approach is worth a try, before deep-diving into binutils and GNU gold configury.

(I looked at https://github.com/openjdk/jdk/blob/master/make/devkit/Tools.gmk and the double-build devkit GCC logic does not jump out at me; is it there and I am missing it, or has it not been submitted upstream yet?)

andrew-m-leonard commented 2 weeks ago

Thanks @sxa @fitzsim. That's a good experiment, Thomas; I'll try bootstrapping binutils and see what happens.

sxa commented 2 weeks ago

> I am pretty sure this is an issue with the GNU gold build, within the binutils build. GNU gold is special among the binutils utilities in that it is the only one that has C++ code

Makes sense if that's the only part of binutils that's C++. Thanks.

> To rule this out, it may be worth attempting to build and install devkit libstdc++.so and binutils twice; first against the build system tools, then a second time, against the just-installed devkit libstdc++.so binutils. The idea is the same as what @andrew-m-leonard did to bootstrap devkit GCC.

Agreed. If we can feasibly do that between the bootstrap and final devkit builds, that might be good.

> (I looked at https://github.com/openjdk/jdk/blob/master/make/devkit/Tools.gmk and the double-build devkit GCC logic does not jump out at me; is it there and I am missing it, or has it not been submitted upstream yet?)

It's not upstream yet - like the CentOS7 support it's currently part of our make_devkit.sh script in https://github.com/adoptium/ci-jenkins-pipelines/commit/b5f07cd621f9d950dfa1dd1b74af8831b25a9fe2 - I would not recommend upstreaming this until at least July when the C7 URLs will change to the vault mirror.

(I do also wonder whether we only need to bootstrap ld to resolve some of the issues that triggered that bootstrap, as well as the issue we're seeing here. I guess this works, even if it adds a significant amount of extra time to the bootstrap due to double-building gcc, or more, since gcc typically bootstraps itself.)

sxa commented 2 weeks ago

Note that, as of https://github.com/adoptium/ci-jenkins-pipelines/pull/1043, the devkit build process sets the variables for all of the binutils tools, in addition to using the devkit path for the bootstrapping.
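For reference, here is a sketch of pointing the binutils variables at a devkit in a shell environment. The exact variable set used in the linked PR is not reproduced here. The list below is an assumption based on the usual Make/Autoconf names, and `use_devkit_binutils` is a hypothetical helper. It also assumes that the devkit's `bin` directory holds unprefixed tool names such as `objcopy`, as the OBJCOPY path earlier in this thread does:

```bash
# Export AR, AS, LD, NM, OBJCOPY, OBJDUMP, RANLIB, READELF and STRIP, each
# pointing at the tool of the same name in the given devkit bin directory.
use_devkit_binutils() {
  bindir="$1"
  for tool in ar as ld nm objcopy objdump ranlib readelf strip; do
    var=$(printf '%s' "$tool" | tr '[:lower:]' '[:upper:]')  # e.g. objcopy -> OBJCOPY
    export "$var=$bindir/$tool"
  done
}

# Example, with the devkit location from the Dockerfile earlier in this thread:
#   use_devkit_binutils /usr/local/devkit/gcc-11.3.0-Centos7.6.1810-b02/bin
```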