Closed sophia-guo closed 4 years ago
Same issue for tests on the s390x machine test-marist-ubuntu1604-s390x-2-XJ.
The s390x box should be ok now - let me know if there are any further issues
Have now also installed Text::CSV, XML::Parser and JSON onto the mac system
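For anyone wanting to confirm that the Perl modules above are visible on a given machine, here is a minimal sketch. It relies on the standard `perl -MModule -e 1` idiom, which exits non-zero when a module cannot be loaded. The function names `perl_module_loads` and `missing_perl_modules` are hypothetical helpers for illustration, not part of any test framework.

```python
import subprocess
from typing import Callable, Iterable, List


def perl_module_loads(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` exits cleanly on this machine."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # perl itself is not on the PATH, so no module can be loaded.
        return False
    return result.returncode == 0


def missing_perl_modules(
    modules: Iterable[str],
    loads: Callable[[str], bool] = perl_module_loads,
) -> List[str]:
    """Return the modules from `modules` that fail to load, in input order."""
    return [m for m in modules if not loads(m)]
```

Calling `missing_perl_modules(["Text::CSV", "XML::Parser", "JSON"])` on the machine in question should return an empty list once all three are installed. The `loads` parameter only exists so the check can be exercised without a Perl installation.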
antlib is also missing on the mac system: https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_hs_openjdktest_x86-64_macos/225/console
Disabled node: test-macincloud-macos1010-3-XJ
Executed brew install ant-contrib and linked /usr/local/Cellar/ant-contrib/1.0b3/share/ant/ant-contrib-1.0b3.jar to /usr/local/Cellar/ant/1.10.1/lib/ant-contrib.jar
https://ci.adoptopenjdk.net/job/openjdk8_hs_openjdktest_x86-64_macos/373/ has passed the problematic section so the above appears to have worked.
@sxa555 I see that https://ci.adoptopenjdk.net/computer/test-macincloud-macos1010-3-XJ/ is still offline and jobs are waiting for available machines. Could you re-enable it? Thanks!
@sxa555 both the jdk and system test jobs on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ are taking unexpectedly long and producing extra errors.
System tests: running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-1/ passed in 2 hours; running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ failed after 4 hours. https://ci.adoptopenjdk.net/view/Test_system/job/openjdk11_j9_systemtest_s390x_linux/
jdk tests: running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-1/ failed after around 1.5 hours; running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ failed after 9 hours and timed out. https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_j9_openjdktest_s390x_linux/
Similar issues occur for other versions and implementations.
Is there any configuration difference between those two machines?
If a problem still persists on a closed issue, please reopen it, otherwise any comments will likely not be actioned.
I've taken the macos box back online.
For the s390x box are all the errors network timeouts? (I'm basing that on run 260 of the job you mentioned)
Unfortunately I don't have permission to reopen issues in this repo :-(
Yes, most of the failures are timeouts, which make the job take much longer than on the other machine and cause the build to time out. I wondered whether there is some hidden configuration issue?
My question was whether they were all network timeouts specifically - are they?
I'm not sure what "configuration hidden issue" is suggesting - if there's an issue we need to debug and identify it as I can't tell what's wrong at the moment :-)
We need to know what operations in particular are getting stuck to be able to debug this further
The following tests are failing on test-macincloud-macos1010-3-XJ but pass on test-macincloud-macos1010-1:
java/util/prefs/AddNodeChangeListener.java.AddNodeChangeListener
java/util/prefs/CheckUserPrefsStorage.sh.CheckUserPrefsStorage
java/util/prefs/RemoveReadOnlyNode.java.RemoveReadOnlyNode
java/util/prefs/RemoveUnregedListener.java.RemoveUnregedListener
The prefs tests are the same ones that were failing here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8079418. The underlying issue there was user permissions - although that bug is now marked 'resolved'.
NOTE: the mac machine has been renamed from test-macincloud-macos1010-3-XJ to test-macstadium-macos1010-1-XJ as the hosting provider was incorrect
@sxa555 can I close this as the machine has been deleted (https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/849)?
@gdams No this should be kept open as this covers issues with more than just the macos machine (Thanks for not leaving this closed @karianna)
@sophia-guo @lumpfish As per earlier question are the failures on the s390x box all network timeouts? We need to get this understood and resolved as it seems to be the cause of a lot of zLinux slowness at the moment. Can someone who understands the test suite determine what specific operations are hanging on the machine?
I'm going to abort #13 on https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.openjdk_s390x_linux/13/ for now so I can quiesce test-marist-ubuntu1604-s390x-2-XJ and see if there are any processes left around.
Answer: lots Ref: jenkins.maristXJ.log.gz
Here is a snippet of the ps listing with the June 29th stuck processes - 19 of them, of which 16 were from a base openjdktest run:
sxa@x220t:~$ gzip -cd jenkins.maristXJ.log.gz | grep Jun29 | cut -c-200 | grep openjdktest_s
jenkins 48465 0.0 0.2 2089944 23408 ? SLl Jun29 0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 48514 0.0 0.2 2089944 23292 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 50493 0.0 0.2 2089944 22380 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 51539 0.0 0.2 2089944 23072 ? SLl Jun29 0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 52199 0.0 0.2 2089944 23136 ? SLl Jun29 0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 53244 0.0 0.2 2089944 23664 ? SLl Jun29 0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 54032 0.0 0.2 2089944 23556 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 55486 0.0 0.1 2090200 14856 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 56161 0.0 0.2 2089944 22812 ? SLl Jun29 0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 57244 0.0 0.2 2089944 22720 ? SLl Jun29 0:37 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 57923 0.0 0.2 2089944 22852 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 59612 0.0 0.1 2089944 14888 ? SLl Jun29 0:38 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 61204 0.0 0.1 2089944 14600 ? SLl Jun29 0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 62194 0.0 0.1 2089944 14836 ? SLl Jun29 0:39 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 63804 0.0 0.1 2090200 14640 ? SLl Jun29 0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins 64710 0.0 0.1 2089944 14556 ? SLl Jun29 0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
(For the record, these process listings are also created regularly and are visible at https://ci.adoptopenjdk.net/job/SXA-processCheck/label=test-marist-ubuntu1604-s390x-2-XJ/)
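The grep pipeline above can also be expressed as a small script that groups leftover processes by the Jenkins job workspace they came from. This is only a sketch: it assumes the `ps aux` column layout shown in the listing (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and the `/home/jenkins/workspace/<job>/` path convention seen there. The function name `count_stale_processes` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Job name is the path component directly under the Jenkins workspace root.
WORKSPACE_RE = re.compile(r"/home/jenkins/workspace/([^/\s]+)/")


def count_stale_processes(ps_lines: Iterable[str], start_label: str) -> Counter:
    """Count processes whose START column equals `start_label`, per job."""
    counts: Counter = Counter()
    for line in ps_lines:
        # Split into the 10 fixed columns plus the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[8] != start_label:
            continue
        match = WORKSPACE_RE.search(fields[10])
        job = match.group(1) if match else "(other)"
        counts[job] += 1
    return counts
```

Feeding it the decompressed log lines with `start_label="Jun29"` would give a per-job count of the processes started that day, which makes it easier to see which job left the most processes behind.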
I have cleared out the processes (close to 100 of them), re-enabled the executor and https://ci.adoptopenjdk.net/job/Test_openjdk13_j9_sanity.system_s390x_linux/7/ is the first job to get scheduled on it
FYI @smlambert
Hmm, processes from back on Jun29.
Possibly related: https://github.com/AdoptOpenJDK/openjdk-tests/issues/1071 https://github.com/AdoptOpenJDK/openjdk-tests/issues/1051
Jun29 was just me attempting to show a sample snapshot from a random day :-) Thanks for those two links - I figured you might have some other issues on this somewhere so great to have them all linked now. Not all of the hung processes were from Hotspot runs but they could have been the trigger for others failing.
@sxa555 For the JDK tests yes, almost all of the failing tests (around 110) are in the rmi, nio, and net groups. The error message is either 'timeout' or 'Cannot assign requested address' (that is, a failure to assign a network address). https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk8_hs_sanity.openjdk_s390x_linux/17/#showFailuresLink
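To answer the "are they all network timeouts?" question with numbers rather than by eye, the failure messages could be bucketed along the two patterns mentioned above. This is a rough sketch only: the two substrings come from this thread, the extra "timed out" match is an assumption, and the names `classify_failure` and `summarize_failures` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def classify_failure(message: str) -> str:
    """Bucket one failure message as 'address', 'timeout', or 'other'."""
    text = message.lower()
    if "cannot assign requested address" in text:
        return "address"
    # "timed out" is an assumed variant; only 'timeout' is quoted in the thread.
    if "timeout" in text or "timed out" in text:
        return "timeout"
    return "other"


def summarize_failures(messages: Iterable[str]) -> Counter:
    """Count how many failure messages fall into each bucket."""
    return Counter(classify_failure(m) for m in messages)
```

A large "other" bucket would suggest that something besides networking is involved on the machine.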
Since the original Marist machines have now been decommissioned, both -XJ machines that this issue refers to are no longer in the test machine set, therefore closing
Test running on https://ci.adoptopenjdk.net/computer/test-macincloud-macos1010-3-XJ/ failed with message:
https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk8_j9_openjdktest_x86-64_macos/217/consoleFull
Text/CSV.pm is required for running Testkitgen