adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Provision AIX 7.1 TL5 SP5 build machines for OpenJ9 JDK13+ builds #1006

Closed pshipton closed 4 years ago

pshipton commented 4 years ago

See https://github.com/eclipse/openj9/issues/7786

In order to build OpenJ9 for Java 13 and later, we need a new version of the AIX dump utility, which is included in 7.1 TL5 SP5 (SP5 being the new part). All the AIX machines used for build need to be upgraded.

sxa commented 4 years ago

@pshipton I presume this is unrelated to my current issue where I'm getting this?

#error "Please include ibmdemangle.h for xlclang++ to demangle symbol names."

sxa commented 4 years ago

We're currently on TL04 SP1, so this isn't just an SP upgrade. I've got a couple of new machines running 7.2 TL03 SP3 that haven't been set up yet. Would they have the later dump? (Not that we necessarily want to build on a later level, of course.)

pshipton commented 4 years ago

I presume this is unrelated to my current issue where I'm getting #error "Please include ibmdemangle.h for xlclang++ to demangle symbol names."?

Correct, this is unrelated.

I've got a couple of new machines running 7.2 TL03SP3 that haven't been set up yet - would they have the later dump

No, only TL5 SP5 contains the new version of dump
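
For anyone checking whether a given machine is at the required level, a minimal sketch follows. It assumes the level string comes from `oslevel -s` in the usual `RRRR-TL-SP-YYWW` form (such as `7100-05-05-1939`). The function name `aix71_level_ok` is made up for illustration, and accepting later 7.1 service packs is an assumption, since the thread only confirms TL5 SP5 itself.

```shell
#!/bin/sh
# Return success if LEVEL (as printed by `oslevel -s`) is AIX 7.1 at
# TL5 SP5, or (assumption) a later 7.1 TL/SP. Other releases are rejected
# rather than guessed at: the thread says 7.2 TL03 SP3 lacks the new dump.
aix71_level_ok() {
    level=$1
    rel=$(printf '%s\n' "$level" | cut -d- -f1)
    tl=$(printf '%s\n' "$level" | cut -d- -f2)
    sp=$(printf '%s\n' "$level" | cut -d- -f3)

    [ "$rel" = 7100 ] || return 1

    # Drop one leading zero so values such as "08" are not parsed as octal,
    # and fall back to 0 if a field is missing.
    tl=${tl#0}; tl=${tl:-0}
    sp=${sp#0}; sp=${sp:-0}

    [ $(( tl * 100 + sp )) -ge 505 ]
}

# Only runs where the AIX `oslevel` command exists.
if command -v oslevel >/dev/null 2>&1; then
    if aix71_level_ok "$(oslevel -s)"; then
        echo "OS level is 7.1 TL5 SP5 or later"
    else
        echo "OS level is not 7.1 TL5 SP5 or later"
    fi
fi
```

The comparison is deliberately limited to the 7.1 release so that a 7.2 machine is never reported as suitable just because its release number is higher.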

sxa commented 4 years ago

OK I think I've got one OSUOSL machine kicking around that I haven't got set up yet. I would propose trying to use that one for this, albeit with the risk that we don't have console access. I'll assign this one to @sej-jackson for now so she can progress when possible.

sxa commented 4 years ago

I have a meeting on Monday about trying to get a couple of the new machines that I have been allocated reinstalled with this level (They're currently installed with 7.2)

sxa commented 4 years ago

Had a call this evening and should be getting two new 7.1 TL5 SP5 machines within a couple of days :-) (They will need setting up after we receive them too, of course, but that hopefully won't be too hard.)

sxa commented 4 years ago

@pshipton I'm going to rename this issue to avoid misleading anyone into believing we're going to upgrade all of the existing build machines to that level.

Also, your initial description says "all the AIX machines used for build need to be upgraded." I assume from the description that you don't mind having the previous JDK levels (8 and 11) with OpenJ9 still built on the earlier OS? My preference is to leave the build machines for existing LTS releases unmodified, to avoid potentially breaking compatibility for existing LTS users of AdoptOpenJDK. I would therefore currently expect the new setups to be used only for JDK13 (and later!) builds.

pshipton commented 4 years ago

I assume from the description that you don't mind having the previous JDK levels (8 and 11) with OpenJ9 still built on the earlier OS?

There is talk about updating Java 11 to use xlc 16 (but not clang). If that happens, we might need the newer machine for Java 11 builds, although I'm not sure. We should keep 8 and 11 as-is, and wait and see.

pshipton commented 4 years ago

While TL5 SP5 contains a dump utility which works correctly, we were able to work around the problems with the broken utility and get the AIX builds for Java 13+ working even on older AIX levels. The AIX changes were delivered today and should be included in the next nightly build.

Some machines at OpenJ9 have completed the upgrade to TL5 SP5. I think it's a good idea to complete the upgrade at Adopt in case we find it's needed in the future.

sxa commented 4 years ago

New machine set up by @sej-jackson at https://ci.adoptopenjdk.net/computer/build-ibm-ppc64-aix-71-1

It seems to build twice as fast, based on #122 of https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk13u/job/jdk13u-aix-ppc64-hotspot/

The OpenJ9 version is running now to verify that it runs through: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk13u/job/jdk13u-aix-ppc64-openj9

We've used the latest versions of various tools from the AIX toolbox, and some of them have a new dependency on libiconv.so.2, which isn't in the default AIX libiconv.a. When we run many of those tools through Jenkins (I believe the Java agent causes /usr/lib to be at the start of the LIBPATH), they fail because they don't find the library they are looking for.

In order to mitigate that, at least on a temporary basis, I have extracted libiconv.so.2 (both 32 and 64 bit versions) from /opt/freeware/lib/libiconv.a and added them into /usr/lib/libiconv.a. This is a horrendous solution, but at least for now it works and I don't think it has any obvious side effects, at least until we do any yum update operation that bumps the level of the RPM package up (currently libiconv-1.14-2.ppc).
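
The exact commands used for this are not recorded in the thread. The following is only a sketch of how such a merge could be done, assuming the AIX `ar` command with its `-X32`/`-X64` object-mode flag; the helper names, the backup step and the scratch directories are hypothetical. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: copy libiconv.so.2 (32- and 64-bit members) from the AIX toolbox
# archive into the system archive. Dry run unless RUN=1 is set.
SRC=${SRC:-/opt/freeware/lib/libiconv.a}
DST=${DST:-/usr/lib/libiconv.a}
MEMBER=libiconv.so.2

# Run the command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

merge_libiconv_member() {
    # Keep a copy of the original system archive (hypothetical safety step).
    run cp -p "$DST" "$DST.orig"
    for bits in 32 64; do
        workdir=${TMPDIR:-/tmp}/libiconv.$bits   # hypothetical scratch dir
        run mkdir -p "$workdir"
        # ar extracts into the current directory, so change into the
        # scratch directory inside a subshell first.
        ( run cd "$workdir" && run ar -X"$bits" -x "$SRC" "$MEMBER" )
        # Add (or replace) the member of the matching object mode.
        run ar -X"$bits" -r "$DST" "$workdir/$MEMBER"
    done
}

merge_libiconv_member
```

Keeping the two object modes in separate directories matters because both members share the name libiconv.so.2. As the comment above notes, any later update of the libiconv RPM would leave the copied members stale.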

NOTE: There is so far only one of these machines set up so we have no redundancy for the January release on here. If needed we could set up another one the same way but I want to consider whether what I've done with libiconv.a is sensible (I'm certain it isn't...), whether we should downgrade some of the yum installed packages to those which don't have a libiconv.so.2 dependency, or ... something else.

sej-jackson commented 4 years ago

Before I forget, here's what I did to get the aix.yml to run on AIX 7.1 TL05 SP5....

I was running as a non-privileged account, with my own hosts file containing the following variables:

ansible_user=root
ansible_python_interpreter=/usr/bin/python
Domain=centers.ihost.com

  1. My initial plan was to only skip the ramdisk setup, as the system had a 140GB disk, 8GB RAM and 4GB paging space. However, an early attempt with no other modifications failed when the AIX_filesystem_config.sh script ran out of disk space, so I resized the filesystems manually based on its sizings for a 125GB system:

    # df -g
    Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
    /dev/hd4          10.00      9.67    4%    20383     1% /
    /dev/hd2          20.00     13.50   33%    80629     3% /usr
    /dev/hd9var       10.00      9.33    7%     4866     1% /var
    /dev/hd3          30.00     29.74    1%       90     1% /tmp
    /dev/hd1          40.00     39.80    1%      154     1% /home
    /dev/hd11admin      0.25      0.25    1%        7     1% /admin
    /proc                 -         -    -        -      - /proc
    /dev/hd10opt      20.00     18.67    7%    27121     1% /opt
    /dev/livedump      0.25      0.25    1%        4     1% /var/adm/ras/livedump
    /aha                  -         -    -       33     1% /aha

    Note that I did not put all the remaining disk space into /home as the script tries to do, so there is around 3.5GB left for further installation, updates etc...

  2. I then hit a problem trying to download the RPMs that were supposed to be obtained from bullfreeware and oss4aix, so instead I added the four packages (libiconv, libunistring, perl, and cmake) to the end of the toolbox list, and added rpm_install to my list of skiptags. This meant that the perl symlink didn't happen, but it turned out that the freeware and the original AIX versions of perl were both v5.28.1 anyway.
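
As a rough cross-check of the sizes chosen in step 1, the allocated space can be totalled from the `df -g` listing. This is a small sketch only: the helper name `sum_df_gb` is made up, and the sample input is the table from step 1.

```shell
#!/bin/sh
# Sum the "GB blocks" column of `df -g` output read from stdin. The header
# line and pseudo filesystems that report "-" are skipped because their
# second field is not a number.
sum_df_gb() {
    awk '$2 ~ /^[0-9.]+$/ { total += $2 } END { printf "%.2f\n", total }'
}

# Sample input copied from the listing in step 1.
sum_df_gb <<'EOF'
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.67    4%    20383     1% /
/dev/hd2          20.00     13.50   33%    80629     3% /usr
/dev/hd9var       10.00      9.33    7%     4866     1% /var
/dev/hd3          30.00     29.74    1%       90     1% /tmp
/dev/hd1          40.00     39.80    1%      154     1% /home
/dev/hd11admin      0.25      0.25    1%        7     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt      20.00     18.67    7%    27121     1% /opt
/dev/livedump      0.25      0.25    1%        4     1% /var/adm/ras/livedump
/aha                  -         -    -       33     1% /aha
EOF
# prints 130.50
```

On the machine itself the same function can be fed directly with `df -g | sum_df_gb`. The total covers only the filesystems listed, so paging space and other logical volumes on the 140GB disk are not included.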

This was my only change to aix.yml, and I didn't change any other scripts, although I did need to override some variables and add more skiptags.... and a symlink.

  1. TASK [TestIBM XL C] failed because it couldn't load module /opt/freeware/lib/libintl.a(libintl.so.8) due to dependent module /usr/lib/libiconv.a not containing libiconv.so.2. @sxa555 directed me to a comment to the effect:

    libiconv needs to be fixed, for reasons that are absolutely not clear to me, I removed /usr/lib/libiconv.a and symlinked it to /opt/freeware/lib/libiconv.a.

  2. After this, I had a couple of issues caused by running the playbook as a non-root user. My account wasn't able to access the SSH keys in /Vendor_Files/keys, so I took copies of them and exported my own variables for Jenkins_User_SSHKey, Nagios_User_SSHKey, and Zeus_User_SSHKey to get around that problem. It turned out that I didn't have the necessary files for nagios though, so I ended up skipping the setup for both that and zeus.

My skiptags ended up being: rpm_install,ramdisk,filesystem,nagios,superuser
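
For reference, a run with those skiptags could be assembled as in the sketch below. The helper name `playbook_cmd` is made up, the inventory name `hosts` and the playbook path `aix.yml` are taken loosely from the description above and may differ in practice, and the SSH key variable overrides are not shown. The helper only prints the command line; `-i` and `--skip-tags` are standard `ansible-playbook` options.

```shell
#!/bin/sh
# Print an ansible-playbook command line that skips the given tags.
# Usage: playbook_cmd INVENTORY PLAYBOOK TAG...
playbook_cmd() {
    inventory=$1
    playbook=$2
    shift 2
    # Join the remaining arguments with commas for --skip-tags.
    tags=$(IFS=,; printf '%s' "$*")
    printf 'ansible-playbook -i %s %s --skip-tags %s\n' \
        "$inventory" "$playbook" "$tags"
}

playbook_cmd hosts aix.yml rpm_install ramdisk filesystem nagios superuser
# prints: ansible-playbook -i hosts aix.yml --skip-tags rpm_install,ramdisk,filesystem,nagios,superuser
```

Printing the command rather than running it makes it easy to review the tag list before touching a machine.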

sxa commented 4 years ago

Thanks @sej-jackson ... I think there may be some glitches we need to iron out with respect to being able to run tests (potentially related to the alternate source of perl...) but we can resolve them separately as this resolves the original request for a build machine at this level, therefore closing.

sxa commented 4 years ago

libiconv issue raised via http://forums.rootvg.net/aixtools/coreutils-rpm-packages-versus-'aixtools'-installp-based