adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

AIX machine required for non-JCK testing #146

Closed smlambert closed 4 years ago

smlambert commented 6 years ago

Similar to #133 , there is currently no machine available for running openjdk regression, system, external, perf, and functional tests against the AIX builds. (for reference on what tests are enabled and what tests are not due to not having available machines, please see https://docs.google.com/spreadsheets/d/1X4CCfvMoCgEavRbvejHrTvPnqj37MB-_C6LB6b8Akkc/edit?usp=sharing).

sxa commented 5 years ago

Currently sharing with one of the build boxes, but leaving this open since we could really do with more ...

sxa commented 5 years ago

Two new machines allocated. Various playbook modifications needed to stabilise them. Currently failing on a `git init` operation:

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-aix-ppc64-openj9/381/console


Running on test-osuosl-ppc64-aix-71-1 in /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9
[Pipeline] {
[Pipeline] stage
[Pipeline] { (build)
[Pipeline] checkout
No credentials specified
Cloning the remote Git repository
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:882)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:662)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:154)
    at hudson.remoting.UserRequest.perform(UserRequest.java:212)
    at hudson.remoting.UserRequest.perform(UserRequest.java:54)
    at hudson.remoting.Request$2.run(Request.java:369)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:819)
    Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to test-osuosl-ppc64-aix-71-1
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
        at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
        at hudson.remoting.Channel.call(Channel.java:957)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
        at sun.reflect.GeneratedMethodAccessor372.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
        at com.sun.proxy.$Proxy98.execute(Unknown Source)
        at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1135)
        at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1175)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:124)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:93)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:80)
        at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: hudson.plugins.git.GitException: Command "git init /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9" returned status code 255:
stdout: 
stderr: exec(): 0509-036 Cannot load program git because of the following errors:
    0509-150   Dependent module /usr/lib/libiconv.a(libiconv.so.2) could not be loaded.
    0509-152   Member libiconv.so.2 is not found in archive 

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2318)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2248)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2244)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1777)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:880)
    ... 11 more
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] echo
Execution error: Error cloning remote repo 'origin'
[Pipeline] End of Pipeline
Finished: FAILURE
sxa commented 5 years ago

The previous error message only happens when git is executed from java (in this case the jenkins agent process), because java adds /usr/lib to the LIBPATH, which stops git from picking up the desired version of the library (the one in /opt/freeware/lib).
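One way to check this diagnosis from a shell is to run the failing command with /opt/freeware/lib placed ahead of whatever LIBPATH the parent process supplied. This is a minimal sketch only: `with_libpath` is a hypothetical helper name, not something taken from the playbooks or the Jenkins configuration.

```shell
#!/bin/sh
# Hypothetical helper (not from the AdoptOpenJDK playbooks): run one external
# command with a directory placed at the front of LIBPATH. The assignment is
# a command prefix, so it only applies to the command being launched.
with_libpath() {
  dir=$1
  shift
  # ${LIBPATH:+:$LIBPATH} appends the old value only when it is non-empty,
  # which avoids producing a trailing colon.
  LIBPATH="$dir${LIBPATH:+:$LIBPATH}" "$@"
}

# Example on the affected machine (illustrative):
#   with_libpath /opt/freeware/lib git --version
```

If git loads correctly through the helper but fails when launched by the agent, that would be consistent with the /usr/lib entry taking precedence.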

sxa commented 5 years ago

The git installed on the new machine is 2.20. The version on the older machines is 2.8.1. I have copied the git rpm from another machine (from /var/cache/yum/AIX_Toolbox/packages) and replaced the one on the new machine with the old version (rpm -e git; rpm -ivh /opt/.../git-2.8.1-1.aix6.1.ppc.rpm), and I believe that will rectify the problem.

sxa commented 5 years ago

Jenkins agent is having consistent issues when using AdoptOpenJDK 8u222 OpenJ9 build:


Running on test-osuosl-ppc64-aix-71-1 in /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-aix-ppc64-openj9
[Pipeline] {
[Pipeline] stage
[Pipeline] { (build)
[Pipeline] checkout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] echo
Execution error: java.io.IOException: Unexpected termination of the channel
[Pipeline] End of Pipeline
Finished: FAILURE

I have switched it to use the 64-bit IBM java8 build from https://developer.ibm.com/javasdk/support/aix-download-service/ for now and it appears to be progressing ok. The above log was from https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-aix-ppc64-openj9/301/console - 302 is being run with this version:

# ./java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 8.0.5.37 - pap6480sr5fp37-20190618_01(SR5 FP37))
IBM J9 VM (build 2.9, JRE 1.8.0 AIX ppc64-64-Bit Compressed References 20190617_419755 (JIT enabled, AOT enabled)
OpenJ9   - 354b31d
OMR      - 0437c69
IBM      - 4972efe)
JCL - 20190606_01 based on Oracle jdk8u211-b25
# 
sxa commented 5 years ago

wget is hitting the same problem that git had. Will add /opt/freeware/lib to the start of the LIBPATH in the aix.sh build environment script in order to compensate.
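The contents of aix.sh are not shown in this thread, so the following is only a sketch of what such a change could look like. `prepend_libpath` is a hypothetical function name; it puts a directory first in LIBPATH and does nothing if the directory is already first.

```shell
#!/bin/sh
# Hypothetical helper for a build environment script: make sure a directory
# is the first entry in LIBPATH. Assumes colon-separated entries.
prepend_libpath() {
  case "${LIBPATH:-}" in
    "$1"|"$1":*) ;;                 # already the first entry: leave as-is
    "")          LIBPATH=$1 ;;      # unset or empty: just set it
    *)           LIBPATH=$1:$LIBPATH ;;
  esac
  export LIBPATH
}

prepend_libpath /opt/freeware/lib
```

Checking for an existing first entry keeps the value from growing if the environment script is sourced more than once in the same session.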

sxa commented 5 years ago

Building at https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-aix-ppc64-openj9/303/console - will let it progress overnight

sxa commented 5 years ago

The new machines are noticeably faster when running builds - about 30% even without running from a ramdisk. Once they are both fully set up and verified to work I may use the new ones for building and assign the two older ones to test

sxa commented 5 years ago

It's throwing an OutOfMemory error running the agent using the adoptopenjdk builds by default. Adjusting Advanced options in the machine definition's "Launch Method" section for the agent startup to have:

JavaPath: /usr/jdk8u222-b04/bin/java
JVM Options: -Xmx1024m

Also temporarily tried increasing rss for the jenkins user to 32Gb (67108864) in /etc/security/limits but it didn't help. Current ulimit values are as follows:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 131072
file size               (blocks, -f) unlimited
max memory size         (kbytes, -m) 32768
open files                      (-n) unlimited
pipe size            (512 bytes, -p) 64
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
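For reference, the 67108864 figure quoted for the rss limit matches 32 GiB if the value in /etc/security/limits is counted in 512-byte blocks. That unit is an assumption here, as the thread does not state it. A quick arithmetic check:

```shell
#!/bin/sh
# Assumption: the rss value in /etc/security/limits is in 512-byte blocks.
gib=32
blocks=$(( gib * 1024 * 1024 * 1024 / 512 ))
echo "$blocks"   # 67108864
```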

I have switched back to using the IBM J9 VM for now (which is what we use on the two existing AIX systems). My understanding is that the OpenJ9 project is using an AdoptOpenJDK VM on some of their AIX machines (I've tried 8u181 and 8u222 with the same failures). @AdamBrousseau @jdekonin any idea what might be different on your systems that you have at OpenJ9 which are running the agent with a non-IBM JRE?

sxa commented 5 years ago

Hitting separate issues beyond that with the filesystem building jdk8u - need to verify if this is specific to the machine or not since jdk11u built ok on this machine.

## Starting jdk
find: 0652-019 The status on /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-hotspot/workspace/build/src/build/aix-ppc64-normal-server-release/hotspot/dist/lib is not valid.
gmake[2]: *** No rule to make target '/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-hotspot/workspace/build/src/build/aix-ppc64-normal-server-release/corba/dist/lib/classes.jar', needed by '/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-hotspot/workspace/build/src/build/aix-ppc64-normal-server-release/jdk/classes/_the.CORBA.classes.imported'.  Stop.
gmake[1]: *** [BuildJdk.gmk:51: import-only] Error 2
gmake: *** [/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-hotspot/workspace/build/src//make/Main.gmk:117: jdk-only] Error 2
sxa commented 5 years ago

OpenJ9 build job on the new machine is throwing this failure:

gmake[4]: *** No rule to make target '/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9/workspace/build/src/build/aix-ppc64-normal-server-release/vm/compiler/../omr/compiler/p/codegen/OMRInstOpCode.cpp', needed by '/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9/workspace/build/src/build/aix-ppc64-normal-server-release/vm/compiler/../objs/omr/compiler/p/codegen/OMRInstOpCode.o'.  Stop.
gmake[4]: *** Waiting for unfinished jobs....
gmake[3]: *** [makefile:69: default] Error 2
gmake[2]: *** [makefile:1078: j9jitlauncher] Error 2
gmake[1]: *** [/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9/workspace/build/src/closed/OpenJ9.gmk:439: build-j9] Error 2

Doesn't occur on the existing build machines. Now trying to reproduce, but pinging @pshipton in case this has been seen elsewhere and may be a known transient issue (couldn't find a reference to it in any issue elsewhere)

sxa commented 5 years ago

Text::CSV wasn't found by the test suite:

17:05:58  perl configure.pl
17:05:58  cd /home/jenkins/workspace/Test_openjdk8_j9_special.functional_ppc64_aix/openjdk-tests/TestConfig/scripts/testKitGen; \
17:05:58  perl testKitGen.pl --graphSpecs=aix_ppc-64_cmprssptrs --jdkVersion=8 --impl=openj9 --buildList=functional --iterations=1 --testFlag= ; \
17:05:58  cd /home/jenkins/workspace/Test_openjdk8_j9_special.functional_ppc64_aix/openjdk-tests/TestConfig;
17:06:00  Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module) (@INC contains: ./makeGenTool /opt/freemarker/lib/perl5 /opt/freeware/lib/perl5/site_perl/5.28.1/ppc-aix-thread-multi /opt/freeware/lib/perl5/site_perl/5.28.1 /opt/freeware/lib/perl5/5.28.1/ppc-aix-thread-multi /opt/freeware/lib/perl5/5.28.1 /opt/freeware/lib/perl5/site_perl) at makeGenTool/parseFiles.pl line 27.
17:06:00  BEGIN failed--compilation aborted at makeGenTool/parseFiles.pl line 27.
17:06:00  Compilation failed in require at makeGenTool/mkgen.pl line 93.
17:06:00  Using projectRootDir: /home/jenkins/workspace/Test_openjdk8_j9_special.functional_ppc64_aix/openjdk-tests/TestConfig/scripts/testKitGen/../../..
17:06:00  Getting modes data from modes.xml and ottawa.csv...
17:06:00  gmake[1]: Leaving directory '/home/jenkins/workspace/Test_openjdk8_j9_special.functional_ppc64_aix/openjdk-tests/TestConfig'
17:06:00  makefile:39: count.mk: A file or directory in the path name does not exist.
17:06:00  gmake: *** No rule to make target 'count.mk'.  Stop.

The module is under /opt/freeware/lib/perl51 but not /opt/freeware/lib/perl5/5.28.1 - for now I will symlink it under the Text directory in the place it's currently looking, and rerun. Failing run: https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_special.functional_ppc64_aix/10/console Re-run: https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_special.functional_ppc64_aix/11/console
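Before and after a change like this, it can help to confirm which perl is first in the PATH and whether it can actually load the module. A minimal sketch follows; `perl_has_module` is a hypothetical helper name, while `-M` and `-e` are standard perl command-line switches.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the perl found first in PATH can load the
# named module, fail otherwise. Output is discarded; only the status matters.
perl_has_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

# Illustrative usage on the test machine:
#   command -v perl                       # which perl is picked up
#   perl -e 'print join("\n", @INC), "\n"' # where it searches for modules
#   perl_has_module Text::CSV && echo "Text::CSV found"
```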

sxa commented 5 years ago

Another odd random failure showing up next time round on the hotspot build (324)

Running ddrgen to generate j9ddr.dat and superset.dat
Blob written to file: ../j9ddr.dat
Superset written to file: ../superset.dat
## Starting corba
Compiling 6 files for BUILD_LOGUTIL
/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9/workspace/build/src/corba/src/share/classes/com/sun/tools/corba/se/logutil/IndentingPrintWriter.java:35: error: cannot access Object
public class IndentingPrintWriter extends PrintWriter {
       ^
  class file for java.lang.Object not found
/home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-aix-ppc64-openj9/workspace/build/src/corba/src/share/classes/com/sun/tools/corba/se/logutil/IndentingPrintWriter.java:38: error: cannot find symbol
    private String indentString = "" ;
            ^
  symbol:   class String
  location: class IndentingPrintWriter
sxa commented 5 years ago

XML::Parser CPAN module has also failed to install so I'll have to remove the test tags from the machine for now. The original machines are using a version of perl installed under /usr from the AIX perl.rte package (version 5.10.1.250) as opposed to the 5.28.1 installed via an RPM. The new machine also has /opt/freeware/bin at the start of the PATH which makes it pick up that version first ... Now trying to remove all that and install Text::CSV and XML::Parser into the system perl EDIT: Multiple linkage failures when I try that

sxa commented 5 years ago

Have removed the test tag from the new machine, but also removed build from build-osuosl-ppc64-aix-71-1 for now in order to leave the latter dedicated to test since we have two mostly working build machines.

sxa commented 5 years ago

This machine is still giving various build failures which I haven't yet been able to fully understand and diagnose.

sxa commented 4 years ago

I now have another two AIX boxes available from another source. They aren't yet set up for our needs, but I will be looking at getting them installed with a level suitable for the OpenJ9 folks as per #1006

sxa commented 4 years ago

Related: https://github.com/AdoptOpenJDK/openjdk-tests/issues/1538

sxa commented 4 years ago

AIX 7.1TL5SP5 at IBM PCC: b9s010a@p159a02.centers.ihost.com
AIX 7.2 at OSUOSL: 140.211.9.36

There are some issues with the first AIX 7.2 system as per @smlambert's comments on slack which I will repeat here:

shelley.lambert 12:56 AM
I've removed the test tag from test-osuosl-ppc64-aix-71-1, as there seems to be a couple of issues remaining https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1649/console:
• building openjdk tests fail, looks like executing the tar command at https://github.com/AdoptOpenJDK/openjdk-tests/blob/master/openjdk/build.xml#L38
• then later when archiving results, tar fails to run (the version on that machine does not seem to recognize the z flag)

Ref: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1578531390044000?thread_ts=1578516768.037100
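Where a tar does not recognize the z flag, one common workaround is to let gzip handle the compression and pipe the stream through tar. The sketch below is not what the openjdk-tests build or the Jenkins archiving step actually does; `extract_tgz` and `create_tgz` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers that avoid tar's z flag by piping through gzip.

# Extract a gzip-compressed tarball into a destination directory.
extract_tgz() {
  archive=$1
  dest=${2:-.}
  mkdir -p "$dest" || return 1
  # Make the archive path absolute before changing directory.
  case $archive in
    /*) ;;
    *)  archive=$PWD/$archive ;;
  esac
  gzip -dc "$archive" | (cd "$dest" && tar -xf -)
}

# Create a gzip-compressed tarball from the given paths.
create_tgz() {
  archive=$1
  shift
  tar -cf - "$@" | gzip -c > "$archive"
}

# Illustrative usage:
#   create_tgz results.tar.gz output_dir
#   extract_tgz results.tar.gz /tmp/unpacked
```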

sej-jackson commented 4 years ago

OK, p159a02.centers.ihost.com has been run through the AIX playbook (with a little hand-holding), and should hopefully be ready to try.
I've added the installp for aixtools.git, and a wrapper for wget (/opt/freeware/bin/wget_64_fix), which should deal with the known issues with libiconv.a, and I've deliberately renamed the /opt/freeware/bin/basename symlink (to basename.freeware) because it upsets xlc.

Let me know if you run into any problems, and I'll take a look on Monday.

The 2nd machine (140.211.9.36) needs python and yum before I can even get started, so I'll get those sorted out on Monday too.

sxa commented 4 years ago

The default git in the path on 129.33.196.210 (Second AIX71TL5SP5 system) was causing problems (/usr/bin/git was symlinked to /opt/freeware/bin/git). I have removed the rpm-installed git (rpm -e git) and set the symlink to point to the installp one (ln -s /opt/bin/git /usr/bin/git) to resolve, although it needed a further update as we have an outdated cacerts so git can't validate github.com's certificate

Re-testing a jdk_math at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1943/ And extended.system at https://ci.adoptopenjdk.net/view/Test_system/job/Test_openjdk11_hs_extended.system_ppc64_aix/8

https://ci.adoptopenjdk.net/view/Test_system/job/Test_openjdk11_hs_extended.system_ppc64_aix/6/

sxa commented 4 years ago

Also on the second AIX71TL5SP5 system I've had to add libiconv.so.2 to /usr/lib/libiconv.a as follows. Without it jenkins was failing to run shell scripts properly e.g.

[Pipeline] sh
19:17:18  process apparently never started in /home/jenkins/workspace/Grinder@tmp/durable-3e045e9b
19:17:18  (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
[Pipeline] }

Steps as follows:

mkdir /tmp/mylibiconv
cd /tmp/mylibiconv
cp /usr/lib/libiconv.a .
ar -X32 x /opt/freeware/lib/libiconv.a libiconv.so.2
ar -X32 r libiconv.a libiconv.so.2
rm libiconv.so.2
ar -X64 x /opt/freeware/lib/libiconv.a libiconv.so.2
ar -X64 r libiconv.a libiconv.so.2
mv /usr/lib/libiconv.a /usr/lib/libiconv.a-DIST.$$ && mv libiconv.a /usr/lib/libiconv.a
sej-jackson commented 4 years ago

140.211.9.36 is now done, but it was a bit of a struggle getting started because the python installed by yum.sh (using a March 2019 yum_bundle.tar - apparently the latest) couldn't load libintl.a, so yum itself wouldn't work.

Got around it by temporarily replacing both /usr/lib/libintl.a and /opt/freeware/lib/libintl.a with symlinks to /usr/opt/rpm/lib/libintl.a to get yum working in order to run yum update (manually - it wouldn't run from the playbook).

The updates refreshed /opt/freeware/lib/libintl.a with a newer version and redirected /usr/lib/libintl.a to link to it, but this broke yum again, so had to reset /usr/lib/libintl.a back to link to /usr/opt/rpm/lib/libintl.a.

As before, I've installed git from aixtools, added a wrapper for wget, and hidden the /opt/freeware/bin/basename symlink to stop xlc getting upset.

Crossing fingers and hoping it'll be ok.

sxa commented 4 years ago

140.211.9.36 (test-osuosl-ppc64-aix-72-2, https://ci.adoptopenjdk.net/computer/test-osuosl-ppc64-aix-72-2) was throwing java.lang.OutOfMemoryError: native memory exhausted

I've resolved it by setting the advanced options on the jenkins machine definition to have this as the Prefix Start Agent Command value:

export LDR_CNTRL=MAXDATA=0x80000000 &&
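To put the MAXDATA value in perspective, 0x80000000 bytes is 2 GiB. If the agent JVM is a 32-bit process (an assumption, as the thread does not say), that corresponds to eight 256 MiB segments for the data heap. The arithmetic, as a sketch:

```shell
#!/bin/sh
# 0x80000000 is the MAXDATA value from the prefix command above.
bytes=$(( 0x80000000 ))
mib=$(( bytes / 1024 / 1024 ))
# Assumption: 256 MiB segments, as used by the 32-bit AIX process model.
segments=$(( mib / 256 ))
echo "bytes=$bytes mib=$mib segments=$segments"
```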
sxa commented 4 years ago

With the installation of xlc16 on build-osuosl-ppc64-aix-71-2 by me today we will move the JDK13+ builds onto there from the build-ibm- systems which can therefore be reallocated 100% for testing.

sxa commented 4 years ago

So to be clear ... these are the machines we now have (in theory, subject to final verification) for testing AIX:

| name | former name | IP | OS level |
| --- | --- | --- | --- |
| build-osuosl-aix71-ppc64-1 | build-osuosl-ppc64-aix-71-1 | 140.211.9.10 | 7100-04 |
| test-ibm-ppc64-aix-71-1 | test-ibm-ppc64-aix-71-1 | 129.33.196.209 | 7100-05 |
| test-ibm-ppc64-aix-71-2 | build-ibm-ppc64-aix-71-1 | 129.33.196.210 | 7100-05 |
| test-osuosl-aix72-ppc64-1 | test-osuosl-ppc64-aix-72-1 | 140.211.9.28 | 7200-02 |
| test-osuosl-aix72-ppc64-2 | test-osuosl-ppc64-aix-72-2 | 140.211.9.36 | 7200-02 |

The above name changes (made in jenkins) bring the machines in line with the entries in inventory.yml after this is merged