eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

mauve system tests hanging on osx #7050

Open pshipton opened 5 years ago

pshipton commented 5 years ago

Perhaps a dup of https://github.com/eclipse/openj9/issues/4091 update: #4091 is closed as a dup of this one. Seems all Mauve sanity.system tests are excluded on osx.

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/51 osx1014-x86-2 All mauve testing failed System dumps are created, but there is no archive attached to the job. A grinder of a single sub-test might include diagnostics

00:35:49  LT  00:35:46.352 - Starting thread. Suite=0 thread=0
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=1
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=2
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=3
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=4
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=5
00:35:49  LT  00:35:46.353 - Starting thread. Suite=0 thread=6
00:35:49  LT  00:35:46.355 - Starting thread. Suite=0 thread=7
00:35:49  LT  00:35:46.356 - Starting thread. Suite=0 thread=8
00:35:49  LT  00:35:46.356 - Starting thread. Suite=0 thread=9
00:35:49  LT  00:35:46.356 - Starting thread. Suite=0 thread=10
00:35:49  LT  stderr _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
00:35:49  LT  stderr 2019-09-11 00:35:46.622 java[39481:24524635] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted
00:36:09  LT  00:36:06.405 - Completed 0.1%. Number of tests started=2444
00:36:28  LT  00:36:26.357 - Completed 0.1%. Number of tests started=2444 (+0)
00:36:48  LT  00:36:46.449 - Completed 0.1%. Number of tests started=2444 (+0)
00:37:08  LT  00:37:06.451 - Completed 0.1%. Number of tests started=2444 (+0)
00:37:27  LT  00:37:26.436 - Completed 0.1%. Number of tests started=2444 (+0)
00:37:47  LT  00:37:46.392 - Completed 0.1%. Number of tests started=2444 (+0)
00:38:07  LT  00:38:06.379 - Completed 0.1%. Number of tests started=2444 (+0)
00:38:29  LT  00:38:26.375 - Completed 0.1%. Number of tests started=2444 (+0)
00:38:47  LT  00:38:46.366 - Completed 0.1%. Number of tests started=2444 (+0)
00:39:07  LT  00:39:06.419 - Completed 0.1%. Number of tests started=2444 (+0)
00:39:26  LT  00:39:26.372 - Completed 0.1%. Number of tests started=2444 (+0)
00:39:49  LT  00:39:46.417 - Completed 0.1%. Number of tests started=2444 (+0)
00:40:09  LT  00:40:06.384 - Completed 0.1%. Number of tests started=2444 (+0)
00:40:26  LT  00:40:26.452 - Completed 0.1%. Number of tests started=2444 (+0)
00:40:46  STF 00:40:45.361 - Heartbeat: Process LT  is still running
00:40:47  LT  00:40:46.447 - Completed 0.1%. Number of tests started=2444 (+0)
00:41:07  LT  00:41:06.377 - Completed 0.1%. Number of tests started=2444 (+0)
00:41:27  LT  00:41:26.354 - Completed 0.1%. Number of tests started=2444 (+0)
00:41:46  LT  00:41:46.368 - Completed 0.1%. Number of tests started=2444 (+0)
00:42:10  LT  00:42:06.396 - Completed 0.1%. Number of tests started=2444 (+0)
00:42:27  LT  00:42:26.356 - Completed 0.1%. Number of tests started=2444 (+0)
00:42:50  LT  00:42:46.384 - Completed 0.1%. Number of tests started=2444 (+0)
00:43:09  LT  00:43:06.413 - Completed 0.1%. Number of tests started=2444 (+0)
00:43:29  LT  00:43:26.395 - Completed 0.1%. Number of tests started=2444 (+0)
00:43:48  LT  00:43:46.431 - Completed 0.1%. Number of tests started=2444 (+0)
00:44:07  LT  00:44:06.362 - Completed 0.1%. Number of tests started=2444 (+0)
00:44:27  LT  00:44:26.454 - Completed 0.1%. Number of tests started=2444 (+0)
00:44:46  LT  00:44:46.358 - Completed 0.1%. Number of tests started=2444 (+0)
00:45:09  LT  00:45:06.384 - Completed 0.1%. Number of tests started=2444 (+0)
00:45:28  LT  00:45:26.436 - Completed 0.1%. Number of tests started=2444 (+0)
00:45:45  STF 00:45:45.408 - Heartbeat: Process LT  is still running
00:45:47  LT  00:45:46.358 - Completed 0.1%. Number of tests started=2444 (+0)
00:46:09  LT  00:46:06.357 - Completed 0.1%. Number of tests started=2444 (+0)
00:46:29  LT  00:46:26.375 - Completed 0.1%. Number of tests started=2444 (+0)
00:46:49  LT  00:46:46.351 - Completed 0.1%. Number of tests started=2444 (+0)
00:47:09  LT  00:47:06.386 - Completed 0.1%. Number of tests started=2444 (+0)
00:47:28  LT  00:47:26.416 - Completed 0.1%. Number of tests started=2444 (+0)
00:47:48  LT  00:47:46.355 - Completed 0.1%. Number of tests started=2444 (+0)
00:48:07  LT  00:48:06.365 - Completed 0.1%. Number of tests started=2444 (+0)
00:48:28  LT  00:48:26.369 - Completed 0.1%. Number of tests started=2444 (+0)
00:48:47  LT  00:48:46.394 - Completed 0.1%. Number of tests started=2444 (+0)
00:49:08  LT  00:49:06.377 - Completed 0.1%. Number of tests started=2444 (+0)
00:49:27  LT  00:49:26.376 - Completed 0.1%. Number of tests started=2444 (+0)
00:49:47  LT  00:49:46.358 - Completed 0.1%. Number of tests started=2444 (+0)
00:50:09  LT  00:50:06.357 - Completed 0.1%. Number of tests started=2444 (+0)
00:50:29  LT  00:50:26.418 - Completed 0.1%. Number of tests started=2444 (+0)
00:50:46  STF 00:50:45.274 - Heartbeat: Process LT  is still running
00:50:47  LT  00:50:46.360 - Completed 0.1%. Number of tests started=2444 (+0)
00:50:48  LT  00:50:47.350 - **POSSIBLE HANG DETECTED**
00:50:48  STF 00:50:47.821 - **FAILED** Process LT  has hung
07:45:58  To rebuild the failed test in a jenkins job, copy the following link and fill out the <Jenkins URL> and <FAILED test target>:
07:45:58  <Jenkins URL>/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=<FAILED test target>
07:45:58  
07:45:58  For example, to rebuild the failed tests in <Jenkins URL>=https://ci.adoptopenjdk.net/job/Grinder, use the following links:
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_15
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_16
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_17
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_18
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_19
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_20
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_21
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_22
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_23
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_24
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_25
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_28
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveMultiThreadLoadTest_special_29
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_15
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_16
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_17
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_18
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_19
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_20
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_21
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_22
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_23
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_24
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_25
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_28
07:45:58  https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system/mauveLoadTest&JenkinsFile=openjdk_x86-64_mac&TARGET=MauveSingleInvocationLoadTest_special_29
pshipton commented 5 years ago

Trying a grinder on MauveMultiThreadLoadTest_special_15 https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/498/ osx1013-x86-3 The cores seem to be in the grinder job, so we can get them from there if needed, or easily recreate the problem.

pshipton commented 5 years ago

@Mesbah-Alam @smlambert @DanHeidinga

DanHeidinga commented 5 years ago

00:35:49 LT stderr _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL. 00:35:49 LT stderr 2019-09-11 00:35:46.622 java[39481:24524635] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted

Looks like this stackoverflow question: https://stackoverflow.com/a/15202453/5029186

pshipton commented 5 years ago

@AdamBrousseau

AdamBrousseau commented 5 years ago
07:46:25  + tar -zcf system_mauveLoadTest_test_output.tar.gz ./openjdk-tests/TestConfig/test_output_15681755186622
[Pipeline] getArtifactoryServer
[Pipeline] newBuildInfo
[Pipeline] artifactoryUpload (hide)
expected to call org.jfrog.hudson.pipeline.common.types.ArtifactoryServer.upload but wound up catching artifactoryUpload; see: https://jenkins.io/redirect/pipeline-cps-method-mismatches/
08:12:12  [consumer_0] Deploying artifact: https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/51/system_mauveLoadTest_test_output.tar.gz
Cancelling nested steps due to timeout
[Pipeline] }
[Pipeline] // timestamps
10:12:48  Error occurred for request PUT /artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/51/system_mauveLoadTest_test_output.tar.gz;build.parentNumber=46;build.parentName=Test_openjdk8_j9_special.system_x86-64_mac_Nightly;build.timestamp=1568170859606;build.name=Test_openjdk8_j9_special.system_x86-64_mac_Nightly;build.number=51 HTTP/1.1: Connection closed by remote host.
10:12:48  Attempting retry #1
10:12:48  [consumer_0] An exception occurred during execution:
10:12:48  java.lang.IllegalStateException: Connection pool shut down

Any idea how big the test tarball is? Looking at the grinder you linked the one there is 400M and takes 1 minute to zip. This one appears to take 1/2 hour just to zip. Then fails to upload after 2 hours.

pshipton commented 5 years ago

@AdamBrousseau I don't care about the artifact, more concerned about the root problem https://github.com/eclipse/openj9/issues/7050#issuecomment-530508632

AdamBrousseau commented 5 years ago

Not really understanding this. I tried running that StackOverflow suggestion but there's nothing in /Library/LaunchDaemons. I think we need more help from Test in order to understand what is going on here.

pshipton commented 5 years ago

The mauve tests passed when running on osx1014-x86-3. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/78/

Mesbah-Alam commented 5 years ago

Similar failures continue to show up:

LT  stderr _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
LT  stderr 2019-09-23 06:30:25.447 java[61166:26134165] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/143/tapResults/

pshipton commented 5 years ago

@Mesbah-Alam note that as per https://github.com/eclipse/openj9/issues/7050#issuecomment-531560606 it passed when running on osx1014-x86-3. Not sure what is special about this machine. @AdamBrousseau @jdekonin

jdekonin commented 5 years ago

There is nothing special about this machine. It should identical to osx1014-x86-1 and osx1014-x86-2.

pshipton commented 5 years ago

I wonder if it passes on osx1014 machines. I'll keep of list of where it fails. osx1011-x86-1 osx1011-x86-2 osx1013-x86-3 osx1013-x86-4 osx1014-x86-1 osx1014-x86-2

and where it passes osx1014-x86-3

jdekonin commented 5 years ago

The 1013 and 1011 systems should all be identical as well.

pshipton commented 5 years ago

I tried grinders on osx1014-x86-1 and 2, and the tests do not pass on these machines. There is something special about osx1014-x86-3.

pshipton commented 5 years ago

This seems like a machine issue or test issue. It won't block the release so it doesn't need to be in the 0.17 milestone plan, moving it out. Although I would like it to be resolved sooner rather than later since the mauve tests fail on Mac almost every night.

vsebe commented 4 years ago

The mauveLoadTest tests always fail (on internal, external openj9 and adopt machines):

pshipton commented 4 years ago

@vsebe note they (used to) pass on a single machine, osx1014-x86-3

vsebe commented 4 years ago

Running a Grinder on osx1014-x86-3

12:56:29  LT  stderr _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
12:56:29  LT  stderr 2019-11-27 12:56:27.792 java[46488:1391486] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted
12:56:48  LT  12:56:47.503 - Completed 0.2%. Number of tests started=4426
...
13:11:28  LT  13:11:27.431 - Completed 0.2%. Number of tests started=4426 (+0)
13:11:29  LT  13:11:28.383 - **POSSIBLE HANG DETECTED**
13:11:29  STF 13:11:28.773 - **FAILED** Process LT  has hung
13:11:29  STF 13:11:28.773 - Collecting dumps for: LT 
13:11:29  STF 13:11:28.773 - Sending SIG 3 to the java process to generate a javacore
13:11:29  STF 13:11:28.773 - Running command: kill -3 46488
...
13:13:29  **FAILED** at step 1 (Run Mauve load test). Expected return value=0 Actual=1 at /Users/jenkins/workspace/Grinder/openjdk-tests/TestConfig/../TestConfig/test_output_15748773846700/MauveMultiThreadLoadTest_special_13/20191127-125625-MauveMultiThreadLoadTest/execute.pl line 95.
13:13:29  STF 13:13:29.299 - **FAILED** execute script failed. Expected return value=0 Actual=1
...
pshipton commented 4 years ago

We managed to cause the same problem RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL. in a sanity.functional test by loading the awt library.

pshipton commented 4 years ago

@Mesbah-Alam if the awt library is loaded from the main thread first, then this test may be able to run on Mac Java 8.

Mesbah-Alam commented 4 years ago

Checking the inventory files for this load test, there is no awt test that gets run in the multi-threaded version of the mauve test (They area all commented out - as "swing and awt are not thread safe").

Awt tests are run only for the single-threaded version of mauve tests -- Does this not mean that the "awt libraries" should always be loaded from the main thread first?

@pshipton

pshipton commented 4 years ago

@Mesbah-Alam just trying to figure out why these tests are hanging. We might need to experiment. I know loading the awt library causes the hang. I've heard that awt needs to be loaded in the main thread in order for it to work. We could try pre-loading it, or we can try to track down why it is loaded and exclude that sub-set. Setting an -Xtrace option on System.loadLibrary() can help find the callers.

This will dump a stack trace to stderr. "-Xtrace:none,iprint=mt,trigger=method{java/lang/System.load*,jstacktrace}"

For reference, there is the latest run https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_mac_Nightly/545/ All the enabled mauve tests fail

06:16:55    MauveMultiThreadLoadTest_special_13
06:16:55    MauveMultiThreadLoadTest_special_14
06:16:55    MauveMultiThreadLoadTest_special_15
06:16:55    MauveMultiThreadLoadTest_special_16
06:16:55    MauveMultiThreadLoadTest_special_17
06:16:55    MauveMultiThreadLoadTest_special_18
06:16:55    MauveMultiThreadLoadTest_special_19
06:16:55    MauveMultiThreadLoadTest_special_20
06:16:55    MauveMultiThreadLoadTest_special_21
06:16:55    MauveMultiThreadLoadTest_special_22
06:16:55    MauveMultiThreadLoadTest_special_25
06:16:55    MauveMultiThreadLoadTest_special_26
06:16:55    MauveSingleInvocationLoadTest_special_13
06:16:55    MauveSingleInvocationLoadTest_special_14
06:16:55    MauveSingleInvocationLoadTest_special_15
06:16:55    MauveSingleInvocationLoadTest_special_16
06:16:55    MauveSingleInvocationLoadTest_special_17
06:16:55    MauveSingleInvocationLoadTest_special_18
06:16:55    MauveSingleInvocationLoadTest_special_19
06:16:55    MauveSingleInvocationLoadTest_special_20
06:16:55    MauveSingleInvocationLoadTest_special_21
06:16:55    MauveSingleInvocationLoadTest_special_22
06:16:55    MauveSingleInvocationLoadTest_special_25
06:16:55    MauveSingleInvocationLoadTest_special_26
pshipton commented 4 years ago

Mesbah captured a stack trace for a test loading awt.

14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [1] java.lang.System.loadLibrary (System.java:558)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [2] java.awt.Toolkit$3.run (Toolkit.java:1636)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [3] java.awt.Toolkit$3.run (Toolkit.java:1634)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [4] java.security.AccessController.doPrivileged (AccessController.java:678) (Compiled Code)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [5] java.awt.Toolkit.loadLibraries (Toolkit.java:1633)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [6] java.awt.Toolkit.<clinit> (Toolkit.java:1670)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [7] java.awt.Color.<clinit> (Color.java:275)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [8] gnu.testlet.java.text.AttributedCharacterIterator.getRunLimit.test1 (getRunLimit.java:51)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [9] gnu.testlet.java.text.AttributedCharacterIterator.getRunLimit.test (getRunLimit.java:41)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [10] sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [11] sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [12] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) (Compiled Code)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [13] java.lang.reflect.Method.invoke (Method.java:498) (Compiled Code)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [14] net.adoptopenjdk.loadTest.adaptors.MauveAdaptor.executeTest (MauveAdaptor.java:74) (Compiled Code)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [15] net.adoptopenjdk.loadTest.LoadTestRunner$2.run (LoadTestRunner.java:182)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [16] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [17] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
14:04:18  LT  stderr 19:04:17.104 0x7b05100       j9trc_aux.1          - [18] java.lang.Thread.run (Thread.java:821)

The test causing the problem is gnu.testlet.java.text.AttributedCharacterIterator.getRunLimit.test1

Mesbah-Alam commented 4 years ago

Excluded getRunLimit and hit a second one that loads awt gnu.testlet.java.text.AttributedCharacterIterator.getRunStart :

14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [2] java.awt.Toolkit$3.run (Toolkit.java:1636)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [3] java.awt.Toolkit$3.run (Toolkit.java:1634)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [4] java.security.AccessController.doPrivileged (AccessController.java:678) (Compiled Code)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [5] java.awt.Toolkit.loadLibraries (Toolkit.java:1633)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [6] java.awt.Toolkit.<clinit> (Toolkit.java:1670)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [7] java.awt.Color.<clinit> (Color.java:275)
14:34:03  LT  stderr 19:34:02.842 0xa279e00       j9trc_aux.1          - [8] gnu.testlet.java.text.AttributedCharacterIterator.getRunStart.test1 
Mesbah-Alam commented 4 years ago

As I exclude more tests, more new tests are being exposed as culprits. I'm going through the process of compiling the total list of excludes. Will update this issue when I reach the bottom of this list and get a green run with no hang.

pshipton commented 4 years ago

We should explore pre-loading awt in the main thread rather than excluding too many tests.

Mesbah-Alam commented 4 years ago

@pshipton , for pre-loading awt in the main thread, would this suffice?

https://github.com/AdoptOpenJDK/openjdk-systemtest/compare/master...Mesbah-Alam:preload-awt-lib-in-main-thread-for-mauve-test?expand=1

I tried the above change, and was able to reproduce the error even before any thread / mauve test started:

11:11:36  GEN stderr _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
11:11:36  GEN stderr 2019-12-18 11:11:36.150 java[95094:73409151] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted

Link: Test_grinder/job/Grinder/4971/console

pshipton commented 4 years ago

I'm not sure how to fix it. It was @keithc-ca who suggested that awt needs to be loaded in the main thread. You should record here the experiments you tried.

Mesbah-Alam commented 4 years ago
public class A1 {
    static {
        try {
            Class.forName("java.awt.Toolkit");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();            
                } 
    }
    public static void main (String [] args) {
        System.out.print("Hello world");
    }
}
public class A2 {
    public static void main (String [] args) {
        Thread t = new T();
        t.start();
    }
}

public class T extends Thread {
    static {
        try {
            Class.forName("java.awt.Toolkit");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } 
    }

    public void run () {
        System.out.println("Mr. T");
    }
}

A1 and A2 were executed using OpenJ9 and Hotspot JDK8. They both work fine - i.e. the issue is not reproduced.

pshipton commented 4 years ago

@Mesbah-Alam for A2 please try moving the Class.forName into the run() method. Doing it in the static block means it's run in the main thread.

Mesbah-Alam commented 4 years ago

Done. Same result -- it works fine:

public class T extends Thread{
    public void run () {
        try {
            Class.forName("java.awt.Toolkit");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } 

        System.out.println("Mr. T");
    }
}
Mesbah-Alam commented 4 years ago

Found the full set of files to be disabled for osx openj9 jdk8:

    <mauve class="gnu.testlet.java.text.AttributedCharacterIterator.getRunLimit"/>
    <mauve class="gnu.testlet.java.text.AttributedCharacterIterator.getRunStart"/>
    <mauve class="gnu.testlet.javax.swing.ViewportLayout.minimumLayoutSize"/>
    <mauve class="gnu.testlet.java.net.DatagramPacket.DatagramPacketTest2"/>
    <mauve class="gnu.testlet.java.text.AttributedString.getIterator"/>
    <mauve class="gnu.testlet.java.lang.reflect.Proxy.check13"/>
    <mauve class="gnu.testlet.java.text.AttributedString.constructors"/>
    <mauve class="gnu.testlet.java.security.KeyPairGenerator.getInstance14"/>
    <mauve class="gnu.testlet.java.security.AlgorithmParameterGenerator.getInstance14"/>
    <mauve class="gnu.testlet.java.lang.Class.serialization"/>
    <mauve class="gnu.testlet.java.lang.reflect.Proxy.check13"/>
    <mauve class="gnu.testlet.java.security.AlgorithmParameters.getInstance14"/>
    <mauve class="gnu.testlet.java.text.AttributedString.addAttributes"/>
    <mauve class="gnu.testlet.java.security.KeyFactory.getInstance14"/>
    <mauve class="gnu.testlet.java.security.SecureRandom.TestOfPR23899"/>
    <mauve class="gnu.testlet.java.util.UUID.TestAll"/>

I'll make the change in the tests so that we use a special inventory and exclude file for osx openj9 jdk8 - and continue to use the regular inventory and exclude file for the other os - jdk variations.

pshipton commented 4 years ago

These tests pass on Hotspot, we need to look into why they don't work on OpenJ9. https://github.com/AdoptOpenJDK/openjdk-systemtest/pull/316/files#r360007264

keithc-ca commented 4 years ago

Perhaps the test needs to use the -XstartOnFirstThread option; see [1].

[1] https://github.com/eclipse/openj9/blob/master/test/functional/DDR_Test/tck_ddrext.xml#L95

JasonFengJ9 commented 1 year ago

JDK 8 0.36 build(mac11x64rt04)

14:43:49  openjdk version "1.8.0_362"
14:43:49  IBM Semeru Runtime Open Edition (build 1.8.0_362-b07)
14:43:49  Eclipse OpenJ9 VM (build v0.36.0-release-c5da55ccc, JRE 1.8.0 Mac OS X amd64-64-Bit Compressed References 20230113_599 (JIT enabled, AOT enabled)
14:43:49  OpenJ9   - c5da55ccc
14:43:49  OMR      - 2bb04132e
14:43:49  JCL      - 1a94e3037e based on jdk8u362-b07)

[2023-01-13T20:21:17.214Z] 2023-01-13 15:21:17.137 java[38577:299094598] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted
[2023-01-14T15:43:28.290Z] Cancelling nested steps due to timeout
JasonFengJ9 commented 7 months ago

JDK8 x86-64_mac(macx64rt4)

08:47:25  Running test JCL_FVTest_SE80_Internal_0 ...
08:47:25  ===============================================
08:47:25  JCL_FVTest_SE80_Internal_0 Start Time: Mon Mar 25 08:47:24 2024 Epoch Time (ms): 1711370844899
08:47:25  variation: NoOptions
08:47:25  JVM_OPTIONS: 

08:47:25  TESTING:
08:47:25  ...
08:47:25  ... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
08:47:25  ...
08:47:25  
08:47:26  
08:47:26  ** util.Date Test Harness **
08:47:26  
08:47:26  2024-03-25 08:47:26.630 java[68712:219993851] XType: Using static font registry.
08:47:27  2024-03-25 08:47:26.686 java[68712:219993851] +[NSXPCSharedListener endpointForReply:withListenerName:replyErrorCode:]: an error occurred while attempting to obtain endpoint for listener 'ClientCallsAuxiliary': Connection interrupted
09:51:52  Cannot contact macx64rt4.rtp.raleigh.ibm.com: java.lang.InterruptedException