eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le SharedClasses.SCM23.MultiThread crash #3006

Closed pshipton closed 6 years ago

pshipton commented 6 years ago

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/26/

There is a core file in systemtest_test_output.tar.gz .\openjdk-tests\TestConfig\test_output_15378579504933\SharedClasses.SCM23.MultiThread_0\20180925-071823-SharedClasses\results\

===============================================
Running test SharedClasses.SCM23.MultiThread_0 ...
===============================================
SharedClasses.SCM23.MultiThread_0 Start Time: Tue Sep 25 07:18:22 2018 Epoch Time (ms): 1537859902937
test with NoOptions
STF 07:18:23.128 - =========================   S T F   =========================
systemtest-prereqs has been processed, and set to: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs
Retrieving amount of free space on drive containing /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0
There is 60701 Mb free
STF 07:18:23.136 - ==================   G E N E R A T I O N   ==================
STF 07:18:23.137 - Checking JVM: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../
STF 07:18:23.137 - Starting process to generate scripts: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/..//bin/java  -Dlog4j.skipJansi=true -Djava.system.class.loader=net.adoptopenjdk.stf.runner.StfClassLoader -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/systemtest/systemtest_prereqs/log4j-2.3/log4j-api-2.3.jar:/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/systemtest/systemtest_prereqs/log4j-2.3/log4j-core-2.3.jar:/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/stf/stf.core/scripts/../bin net.adoptopenjdk.stf.runner.StfRunner -properties "/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/stf_parameters.properties, , /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/stf/stf.core/config/stf.properties" -testDir "/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses"
GEN 07:18:24.424 - Found test. Project: 'openj9.test.sharedClasses' class: 'SharedClasses.class' Dir: '/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin'
GEN 07:18:24.430 - Found test. Project: 'openj9.test.sharedClasses' class: 'net.openj9.stf.SharedClasses'
GEN Classpath directories used by project 'openj9.test.sharedClasses': 
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/stf/stf.core/bin
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.stf.extensions/bin
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs/log4j-2.3/log4j-api-2.3.jar
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs/log4j-2.3/log4j-core-2.3.jar
GEN   /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs/junit-4.12/junit-4.12.jar
GEN 07:18:24.724 - Using Mode NoOptions. Values = ''
GEN /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs/sharedClassesTestData/v1 exists
GEN 07:18:24.807 - 
GEN 07:18:24.807 - Test command summary:
GEN 07:18:24.808 -   Step  Stage   Command           Description
GEN 07:18:24.809 -  -----+--------+-----------------+------------
GEN 07:18:24.809 -     1  setUp    cp                Copy sharedClasses jar
GEN 07:18:24.809 -     2  setUp    java              Destroy Persistent Shared Classes Caches
GEN 07:18:24.809 -     3  setUp    java              Destroy Non-Persistent Shared Classes Caches
GEN 07:18:24.809 -     4  execute  java              Reset Shared Classes Cache
GEN 07:18:24.809 -     5  execute  Run java*5        Start java processes using LoaderSlaveMultiThread
GEN 07:18:24.809 -     6  execute  java              Print Shared Classes Cache Stats
GEN 07:18:24.809 -     7  tearDown java              Destroy Persistent cache created by the test
STF 07:18:24.836 - 
STF 07:18:24.836 - Script generation completed
STF 07:18:24.836 - 
STF 07:18:24.836 - 
STF 07:18:24.836 - =======================   S E T U P   =======================
STF 07:18:24.837 - Running setup: perl /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/setUp.pl
STF 07:18:24.896 - 
STF 07:18:24.896 - +------ Step 1 - Copy sharedClasses jar
STF 07:18:24.896 - | Copy a file to another directory
STF 07:18:24.896 - |   Source file: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/systemtest_prereqs/sharedClassesTestData/v1/classes.jar
STF 07:18:24.896 - |   Dest dir:    /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp
STF 07:18:24.896 - |
STF 07:18:24.913 - 
STF 07:18:24.913 - +------ Step 2 - Destroy Persistent Shared Classes Caches
STF 07:18:24.913 - | Destroy all persistent caches
STF 07:18:24.913 - |
STF 07:18:24.914 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:destroyAll
STF 07:18:24.914 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/2.SCC.stderr
STF 07:18:24.914 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/2.SCC.stdout
STF 07:18:24.923 - Monitoring processes: SCC
SCC stderr JVMSHRC005I No shared class caches available
STF 07:18:24.941 - Monitoring Report Summary:
STF 07:18:24.941 -   o Process SCC ended with the expected exit code (1)
STF 07:18:24.941 - 
STF 07:18:24.941 - +------ Step 3 - Destroy Non-Persistent Shared Classes Caches
STF 07:18:24.941 - | Destroy all non-persistent caches
STF 07:18:24.941 - |
STF 07:18:24.942 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:destroyAll,nonpersistent
STF 07:18:24.942 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/3.SCC.stderr
STF 07:18:24.942 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/3.SCC.stdout
STF 07:18:24.943 - Monitoring processes: SCC
SCC stderr JVMSHRC005I No shared class caches available
STF 07:18:24.960 - Monitoring Report Summary:
STF 07:18:24.960 -   o Process SCC ended with the expected exit code (1)
STF 07:18:24.960 - SETUP stage completed
STF 07:18:24.963 - 
STF 07:18:24.963 - ====================   E X E C U T E -   ====================
STF 07:18:24.963 - Running execute: perl /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/execute.pl
STF 07:18:25.025 - 
STF 07:18:25.025 - Java version
STF 07:18:25.025 - Running: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -version
openjdk version "11-internal" 2018-09-25
OpenJDK Runtime Environment (build 11-internal+0-adhoc.jenkins.Build-JDK11-linuxppc-64cmprssptrsle)
Eclipse OpenJ9 VM (build master-c96d25f, JRE 11 Linux ppc64le-64-Bit Compressed References 20180925_28 (JIT enabled, AOT enabled)
OpenJ9   - c96d25f
OMR      - 71c0c91
JCL      - cd85901 based on jdk-11+28)
STF 07:18:25.178 - 
STF 07:18:25.178 - +------ Step 4 - Reset Shared Classes Cache
STF 07:18:25.178 - | Reset shared classes cache
STF 07:18:25.178 - |   Options:        -Xshareclasses:name=${cacheName},cacheDir=${cacheDir}${cacheOperation} -Xaot:forceAoT,count=1
STF 07:18:25.178 - |   CacheName:      sc_java6
STF 07:18:25.178 - |   CacheDir:       /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches
STF 07:18:25.178 - |   CacheOperation: ,reset
STF 07:18:25.178 - |
STF 07:18:25.179 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches,reset -Xaot:forceAoT,count=1 -version -Xcompressedrefs
STF 07:18:25.179 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/4.SCC.stderr
STF 07:18:25.179 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/4.SCC.stdout
STF 07:18:25.189 - Monitoring processes: SCC
SCC stderr JVMSHRC023E Cache does not exist
SCC stderr openjdk version "11-internal" 2018-09-25
SCC stderr OpenJDK Runtime Environment (build 11-internal+0-adhoc.jenkins.Build-JDK11-linuxppc-64cmprssptrsle)
SCC stderr Eclipse OpenJ9 VM (build master-c96d25f, JRE 11 Linux ppc64le-64-Bit Compressed References 20180925_28 (JIT enabled, AOT enabled)
SCC stderr OpenJ9   - c96d25f
SCC stderr OMR      - 71c0c91
SCC stderr JCL      - cd85901 based on jdk-11+28)
STF 07:18:25.539 - Monitoring Report Summary:
STF 07:18:25.539 -   o Process SCC ended with the expected exit code (0)
STF 07:18:25.540 - 
STF 07:18:25.540 - +------ Step 5 - Start java processes using LoaderSlaveMultiThread
STF 07:18:25.540 - | Run multiple concurrent foreground processes
STF 07:18:25.540 - |   Program:     /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java
STF 07:18:25.540 - |   Mnemonic:    MT
STF 07:18:25.540 - |   Instances:   5
STF 07:18:25.540 - |   Echo:        ECHO_ON
STF 07:18:25.540 - |   Expectation: CLEAN_RUN within 1h
STF 07:18:25.540 - |
STF 07:18:25.540 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches -Xaot:forceAoT,count=1 -Xcompressedrefs -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin net.openj9.test.sc.LoaderSlaveMultiThread /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp/classes.jar 300
STF 07:18:25.540 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT1.stderr
STF 07:18:25.540 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT1.stdout
STF 07:18:25.542 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches -Xaot:forceAoT,count=1 -Xcompressedrefs -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin net.openj9.test.sc.LoaderSlaveMultiThread /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp/classes.jar 300
STF 07:18:25.542 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT2.stderr
STF 07:18:25.542 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT2.stdout
STF 07:18:25.544 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches -Xaot:forceAoT,count=1 -Xcompressedrefs -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin net.openj9.test.sc.LoaderSlaveMultiThread /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp/classes.jar 300
STF 07:18:25.544 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT3.stderr
STF 07:18:25.544 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT3.stdout
STF 07:18:25.545 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches -Xaot:forceAoT,count=1 -Xcompressedrefs -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin net.openj9.test.sc.LoaderSlaveMultiThread /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp/classes.jar 300
STF 07:18:25.545 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT4.stderr
STF 07:18:25.545 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT4.stdout
STF 07:18:25.547 - Running command: /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/bin/../bin/java -Xshareclasses:name=sc_java6,cacheDir=/home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/caches -Xaot:forceAoT,count=1 -Xcompressedrefs -classpath /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/jvmtest/systemtest/openj9-systemtest/openj9.test.sharedClasses/bin net.openj9.test.sc.LoaderSlaveMultiThread /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/tmp/classes.jar 300
STF 07:18:25.547 - Redirecting stderr to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT5.stderr
STF 07:18:25.547 - Redirecting stdout to /home/jenkins/jenkins-agent/workspace/Test-extended.system-JDK11-linux_ppc-64_cmprssptrs_le/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15378579504933/SharedClasses.SCM23.MultiThread_0/20180925-071823-SharedClasses/results/5.MT5.stdout
STF 07:18:25.549 - Monitoring processes: MT1 MT2 MT3 MT4 MT5
MT1 stderr Unhandled exception
MT1 stderr Type=Segmentation error vmState=0x00000000
MT1 stderr J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
MT1 stderr Handler1=00003FFF81D8CAF0 Handler2=00003FFF81B1D2B0
MT1 stderr R0=00003FFF80CE290C R1=00003FFF362ABB30 R2=0000000080B8B668 R3=0000000080B0F210
MT1 stderr R4=0000000000000000 R5=000000000000006E R6=FFFFFFFFFFFFFFFF R7=00003FFF604D0038
MT1 stderr R8=00003FFF624EF2E8 R9=0000000000000000 R10=0000000000000000 R11=0000000000000000
MT1 stderr R12=0000000042004842 R13=00003FFF362B6900 R14=000000000063C2F0 R15=00000000002B9800
MT1 stderr R16=00003FFF604D0038 R17=FFFFFFFFFFFFFFFF R18=0000000000000000 R19=000000000000000A
MT1 stderr R20=0000000000000000 R21=0000000000014F00 R22=00003FFF36270000 R23=0000000000040000
MT1 stderr R24=00003FFF82804390 R25=000000000000006E R26=00000000FFF0A478 R27=0000000000000001
MT1 stderr R28=0000000000000000 R29=0000000000000001 R30=0000000080725530 R31=00000000FFF0A2A0
MT1 stderr NIP=00003FFF7C0CFDB4 MSR=800000011280F033 ORIG_GPR3=00003FFF80CE2968 CTR=00003FFF7C0CFDB4
MT1 stderr LINK=00003FFF624EF30C XER=0000000020000000 CCR=0000000082004842 SOFTE=0000000000000001
MT1 stderr TRAP=0000000000000400 DAR=00003FFF7C0CFDB4 dsisr=0000000010000000 RESULT=0000000000000000
MT1 stderr FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR1 405202f8a3d7a807 (f: 2748819456.000000, d: 7.204643e+01)
MT1 stderr FPR2 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
MT1 stderr FPR3 bc40000000000000 (f: 0.000000, d: -1.734723e-18)
MT1 stderr FPR4 402a56ef8ec92280 (f: 2395546112.000000, d: 1.316980e+01)
MT1 stderr FPR5 3feeecc8c0000000 (f: 3221225472.000000, d: 9.664043e-01)
MT1 stderr FPR6 000000000000006e (f: 110.000000, d: 5.434722e-322)
MT1 stderr FPR7 0000000000000001 (f: 1.000000, d: 4.940656e-324)
MT1 stderr FPR8 76616a4c3b657479 (f: 996504704.000000, d: 1.713702e+262)
MT1 stderr FPR9 29433b676e697274 (f: 1852404352.000000, d: 6.397600e-110)
MT1 stderr FPR10 3ff08e64437d1788 (f: 1132271488.000000, d: 1.034764e+00)
MT1 stderr FPR11 bed21bc039a4a852 (f: 967092288.000000, d: -4.317379e-06)
MT1 stderr FPR12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR13 402a45706fc1af40 (f: 1874964224.000000, d: 1.313562e+01)
MT1 stderr FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MT1 stderr Target=2_90_20180925_28 (Linux 4.4.0-128-generic)
MT1 stderr CPU=ppc64le (4 logical CPUs) (0x1fe3a0000 RAM)
MT1 stderr ----------- Stack Backtrace -----------
MT1 stderr (0x00003FFF7C0CFDB4 [<unknown>+0x0])
MT1 stderr (0x00003FFF813B80C4 [libj9jit29.so+0x9680c4])
MT1 stderr (0x00003FFF81D70C34 [libj9vm29.so+0x90c34])
MT1 stderr (0x00003FFF81DE1DFC [libj9vm29.so+0x101dfc])
MT1 stderr (0x00003FFF81B1E89C [libj9prt29.so+0x2e89c])
MT1 stderr (0x00003FFF81DDD230 [libj9vm29.so+0xfd230])
MT1 stderr (0x00003FFF81CB0B98 [libj9thr29.so+0x10b98])
MT1 stderr (0x00003FFF827D8070 [libpthread.so.0+0x8070])
MT1 stderr clone+0x98 (0x00003FFF829A3A70 [libc.so.6+0x123a70])
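
As a quick reading aid for the dump above, the following sketch checks three things that can be read straight off the register lines: the signal number, whether the faulting instruction address (NIP) equals CTR, and whether the trap vector is 0x400, the Power instruction storage interrupt. The dictionary layout and the `describe` helper are illustrative assumptions, not part of any OpenJ9 tooling; only the numeric values come from the log.

```python
import signal

# Values copied from the MT1 stderr register dump.
crash = {
    "Signal_Number": 0x0000000B,
    "NIP": 0x00003FFF7C0CFDB4,
    "CTR": 0x00003FFF7C0CFDB4,
    "TRAP": 0x0000000000000400,
}

INSTRUCTION_STORAGE_INTERRUPT = 0x400  # Power vector for a failed instruction fetch


def describe(dump):
    """Summarise what the register dump implies about the fault (hypothetical helper)."""
    name = signal.Signals(dump["Signal_Number"]).name
    branched_via_ctr = dump["NIP"] == dump["CTR"]
    instruction_fault = dump["TRAP"] == INSTRUCTION_STORAGE_INTERRUPT
    return name, branched_via_ctr, instruction_fault


print(describe(crash))
```

If all three hold, the dump is consistent with a branch through CTR to an address that cannot be executed, which also matches the `<unknown>+0x0` frame at the top of the backtrace.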

irinarada commented 6 years ago

@zl-wang - Power platform failure. Wondering if it has anything to do with @jdmpapin 's or @dsouzai 's recent work. @vijaysun-omr

vijaysun-omr commented 6 years ago

New AOT changes have not been merged yet; the PRs are currently being reviewed. So I doubt it's anything related to the work being done by @dsouzai.

irinarada commented 6 years ago

Thanks @vijaysun-omr . @zl-wang, @gita-omr is this a Power-specific issue? If so, who should own it? See the last two comments.

zl-wang commented 6 years ago

I doubt it is P-specific. @IBMJimmyk is investigating.

irinarada commented 6 years ago

FYI @IBMJimmyk is working on this. Thanks @gita-omr and @zl-wang . @IBMJimmyk - thanks for looking into this. As before, frequent updates are much appreciated; this is blocking RC1.

IBMJimmyk commented 6 years ago

From the Javacore file: $r26 = 0xFFF0A478

0x3fff624ef2d0 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +26 || 809a002c lwz r4,+44(r26)
0x3fff624ef2d4 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +27 || 80ba0028 lwz r5,+40(r26)
0x3fff624ef2d8 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +28 || 63430000 ori r3,r26,0x0
0x3fff624ef2dc {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +29 -1:15 ||| 480000bd bl 0x3fff624ef398 U>> +76 Snippet-> {java/lang/invoke/MethodHandle.undoCustomizationLogic}
0x3fff624ef2e0 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +30 ||| 63430000 ori r3,r26,0x0
0x3fff624ef2e4 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +31 -1:25 |||| 48000099 bl 0x3fff624ef37c U>> +69 Snippet-> {java/lang/invoke/MethodHandle.doCustomizationLogic}
0x3fff624ef2e8 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +32 |||| 83fa0028 lwz r31,+40(r26) //$r31 = 0xFFF0A2A0
0x3fff624ef2ec {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +33 |||| 807a002c lwz r3,+44(r26) //$r3 = 0x80B0F210
0x3fff624ef2f0 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +34 -1:40 |||| 08830000 tdi TO_EQ,r3,+0
0x3fff624ef2f4 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +35 |||| 80430010 lwz r2,+16(r3) //$r2 = 0x80B8B668
0x3fff624ef2f8 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +36 |||| 63840000 ori r4,r28,0x0 //$r4 = 0x0
0x3fff624ef2fc {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +37 |||| 63250000 ori r5,r25,0x0 //$r5 = 0x6e
0x3fff624ef300 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +38 |||| e8020008 ld r0,+8(r2) //$r0 = 0x3FFF80CE290C
0x3fff624ef304 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +39 |||| 7c0903a6 mtspr CTR,r0
0x3fff624ef308 {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +40 -1:50 |||| 4e800421 bctrl // Branch to 0x3FFF80CE290C

0x00003fff80ce290c: addi r14,r14,-48
0x00003fff80ce2910: std r3,0(r14)
0x00003fff80ce2914: std r15,8(r14)
0x00003fff80ce2918: std r5,40(r14)
0x00003fff80ce291c: std r4,32(r14)
0x00003fff80ce2920: std r3,24(r14)
0x00003fff80ce2924: mflr r4
0x00003fff80ce2928: std r4,16(r14)
0x00003fff80ce292c: ld r3,432(r15)
0x00003fff80ce2930: ld r4,-32248(r3)
0x00003fff80ce2934: ld r3,-32216(r3)
0x00003fff80ce2938: mtctr r4
0x00003fff80ce293c: mr r4,r14
0x00003fff80ce2940: mr r5,r14
0x00003fff80ce2944: bctrl
0x00003fff80ce2948: ld r4,0(r14) //$r4 = 0x00003fff7c0cfdb4
0x00003fff80ce294c: mtctr r4 //$CTR = 0x00003fff7c0cfdb4
0x00003fff80ce2950: ld r4,16(r14)
0x00003fff80ce2954: mtlr r4
0x00003fff80ce2958: ld r3,24(r14)
0x00003fff80ce295c: ld r4,32(r14)
0x00003fff80ce2960: ld r5,40(r14)
0x00003fff80ce2964: addi r14,r14,48
0x00003fff80ce2968: bctr //Branch to 0x00003fff7c0cfdb4 and crash since it doesn't contain a valid instruction

From looking at the core file and javacore file it looks like this is the path taken just before the crash.
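
The path can be cross-checked against the raw encodings and the crash dump. The sketch below decodes the `lwz` words from the thunk archetype listing using the standard Power D-form field layout, and checks that the LINK value in the MT1 register dump is the instruction after the `bctrl`. The function name `decode_d_form` is an assumption made for illustration; the words and addresses are copied from the listing and the dump.

```python
def decode_d_form(word):
    """Split a 32-bit Power D-form instruction into (opcode, RT, RA, D)."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:  # sign-extend the 16-bit displacement
        d -= 0x10000
    return opcode, rt, ra, d


LWZ = 32  # primary opcode of lwz

# Encodings copied from the thunk archetype listing.
for word in (0x809A002C, 0x80BA0028, 0x83FA0028, 0x807A002C, 0x80430010):
    opcode, rt, ra, d = decode_d_form(word)
    assert opcode == LWZ
    print(f"{word:08x}  lwz r{rt},{d:+d}(r{ra})")

BCTRL_ADDR = 0x3FFF624EF308     # address of the bctrl in the listing
LINK_AT_CRASH = 0x3FFF624EF30C  # LINK from the MT1 register dump

# bctrl sets LR to the next instruction; the glue saves and restores LR before
# its final bctr, so LINK at the crash should still be bctrl + 4.
print(LINK_AT_CRASH == BCTRL_ADDR + 4)
```

The decoded operands agree with the disassembly, and the LINK match supports the reading that the crash happened on the call made from this `bctrl`.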

zl-wang commented 6 years ago

invokeDynamic related, it seems. r3 was stored at 0(r14), and later r4 was loaded from 0(r14). Could you match the incoming r3 to the value 0x00003fff7c0cfdb4? Hopefully, the call at 0x00003fff80ce2944 didn't change the value at 0(r14).

IBMJimmyk commented 6 years ago

Unfortunately the call does change it.

//I can read from +44 from $r26 and see that $r3 = 0x80B0F210.
0x3fff624ef2ec {java/.../FoldNonvoidHandle.invokeExact_thunkArchetype_X} +33 |||| 807a002c lwz r3,+44(r26)

//$r3 should get stored to 0(r14)
0x00003fff80ce2910: std r3,0(r14)

//This call is in between the store and the load
0x00003fff80ce2944: bctrl

//There's a different value when you read it back out. $r4 = 0x00003fff7c0cfdb4
0x00003fff80ce2948: ld r4,0(r14)

The data at 0x80B0F210 isn't executable code either, so I think overwriting the value is probably expected. The problem is that the value is overwritten with a location that is still bad.

zl-wang commented 6 years ago

For the call at 0x00003fff80ce2944, that is a strange convention to say the least: you set up all the arguments in registers, but the return value comes back at the top of the stack. The called function is one from a table recorded in J9VMThread: 0x00003fff80ce2930: ld r4,-32248(r3), whose base comes from 0x00003fff80ce292c: ld r3,432(r15).

That strange convention is suspicious to me. Let's see what that table is, and clarify that strange convention.

IBMJimmyk commented 6 years ago

The section of code starting at 0x00003fff80ce290c looks to be _initialInvokeExactThunkGlue. The bctrl at 0x00003fff80ce2944 looks to call out to jitCallCFunction.

0x00003fff80ce2934: ld r3,-32216(r3) //$r3 = 0x3fff80cb54f4 = start of initialInvokeExactThunk_unwrapper
0x00003fff80ce2938: mtctr r4
0x00003fff80ce293c: mr r4,r14
0x00003fff80ce2940: mr r5,r14
0x00003fff80ce2944: bctrl //call to jitCallCFunction

r4 is a pointer to parameters and r5 is a pointer to where jitCallCFunction will write back return values. It looks like a bad return value is coming back for some reason.

Current plan is to try and reproduce the problem with -Xjit:verbose={j2iThunks} option set to gather more information about what might have gone wrong.

IBMJimmyk commented 6 years ago

From the corefile, this is the path leading up to the crash. I've annotated the important locations where a register or memory value changes.

$r3 starts at 0x80b0f210 and is an object of type java/lang/invoke/DirectHandle.
$r15 starts at 0x2b9800 and is the J9VMThread.
$r14 starts at 0x63c2f0 but immediately gets changed to 0x63c2c0.

0x00003fff80ce290c <startproc._initialInvokeExactThunkGlue+0>: addi r14,r14,-48 //$r14 = 0x63c2c0
0x00003fff80ce2910 <startproc._initialInvokeExactThunkGlue+4>: std r3,0(r14) //0(r14) = 0x80b0f210
0x00003fff80ce2914 <startproc._initialInvokeExactThunkGlue+8>: std r15,8(r14) //8(r14) = $r15 = 0x2b9800
0x00003fff80ce2918 <startproc._initialInvokeExactThunkGlue+12>: std r5,40(r14)
0x00003fff80ce291c <startproc._initialInvokeExactThunkGlue+16>: std r4,32(r14)
0x00003fff80ce2920 <startproc._initialInvokeExactThunkGlue+20>: std r3,24(r14)
0x00003fff80ce2924 <startproc._initialInvokeExactThunkGlue+24>: mflr r4
0x00003fff80ce2928 <startproc._initialInvokeExactThunkGlue+28>: std r4,16(r14)
0x00003fff80ce292c <startproc._initialInvokeExactThunkGlue+32>: ld r3,432(r15) //$r3 = 0x3fff815e1300
0x00003fff80ce2930 <startproc._initialInvokeExactThunkGlue+36>: ld r4,-32248(r3) //$r4 = 0x3fff813b7fe8 = start of jitCallCFunction
0x00003fff80ce2934 <startproc._initialInvokeExactThunkGlue+40>: ld r3,-32216(r3) //$r3 = 0x3fff80cb54f4 = start of initialInvokeExactThunk_unwrapper
0x00003fff80ce2938 <startproc._initialInvokeExactThunkGlue+44>: mtctr r4 //$ctr = 0x3fff813b7fe8
0x00003fff80ce293c <startproc._initialInvokeExactThunkGlue+48>: mr r4,r14 //$r4 = 0x63c2c0, address of input parameter to initialInvokeExactThunk_unwrapper
0x00003fff80ce2940 <startproc._initialInvokeExactThunkGlue+52>: mr r5,r14 //$r5 = 0x63c2c0, address of where to store return value from initialInvokeExactThunk_unwrapper
0x00003fff80ce2944 <startproc._initialInvokeExactThunkGlue+56>: bctrl //Jump to jitCallCFunction (0x3fff813b7fe8)
0x00003fff80ce2948 <startproc._initialInvokeExactThunkGlue+60>: ld r4,0(r14) //Read from 0x63c2c0. $r4 = 0x3fff7c0cfdb4, this is garbage and bad
0x00003fff80ce294c <startproc._initialInvokeExactThunkGlue+64>: mtctr r4 //$ctr = 0x3fff7c0cfdb4
0x00003fff80ce2950 <startproc._initialInvokeExactThunkGlue+68>: ld r4,16(r14)
0x00003fff80ce2954 <startproc._initialInvokeExactThunkGlue+72>: mtlr r4
0x00003fff80ce2958 <startproc._initialInvokeExactThunkGlue+76>: ld r3,24(r14)
0x00003fff80ce295c <startproc._initialInvokeExactThunkGlue+80>: ld r4,32(r14)
0x00003fff80ce2960 <startproc._initialInvokeExactThunkGlue+84>: ld r5,40(r14)
0x00003fff80ce2964 <startproc._initialInvokeExactThunkGlue+88>: addi r14,r14,48
0x00003fff80ce2968 <startproc._initialInvokeExactThunkGlue+92>: bctr //Branch to 0x3fff7c0cfdb4 and crash immediately

The jump to jitCallCFunction at 0x00003fff80ce2944 eventually reaches old_slow_jitCallCFunction inside runtime/codert_vm/cnathelp.cpp.

old_slow_jitCallCFunction calls the function initialInvokeExactThunk_unwrapper and passes in the addresses to its input parameters and where to store its results. In this case both addresses are the same: 0x63c2c0. When the ld at 0x00003fff80ce2948 eventually reads the result from 0x63c2c0, it gets a bad value.

initialInvokeExactThunk_unwrapper is inside runtime/compiler/runtime/JitRuntime.cpp. At the start it attempts to read its two input values using the passed in address. From the corefile, it looks like it should have read 0x2b9800 as the J9VMThread and 0x80b0f210 as the methodHandle. 0x2b9800 looks like it really was the J9VMThread and 0x80b0f210 was a Java object of type java/lang/invoke/DirectHandle. initialInvokeExactThunk_unwrapper then makes a call to initialInvokeExactThunk and stores the result to resPtr. This store looks like it was the last thing to touch the address 0x63c2c0 and seems to be responsible for the bad value that is read back out.
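To make the shared-slot convention concrete, here is a minimal C++ sketch of the calling pattern described above. It is not the OpenJ9 code: FakeVMThread, FakeObject, lookupThunk and demoAliasedSlots are hypothetical stand-ins, and the only thing it tries to show is that when the argument pointer and the result pointer are the same address, the stored result replaces the methodHandle slot that the glue later loads and branches to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real VM types (assumptions for illustration).
struct FakeVMThread { int id; };
using FakeObject = std::uintptr_t;

// Stand-in for initialInvokeExactThunk: returns the address the glue would branch to.
static void *lookupThunk(FakeObject methodHandle, FakeVMThread *vmThread) {
    (void)methodHandle;
    (void)vmThread;
    static char thunkBody[1];  // pretend this byte is the start of thunk code
    return thunkBody;
}

// Same shape as the unwrapper: read both inputs first, then store the result.
static void unwrapper(void **argsPtr, void **resPtr) {
    FakeVMThread *vmThread = static_cast<FakeVMThread *>(argsPtr[1]);
    FakeObject methodHandle = reinterpret_cast<FakeObject>(argsPtr[0]);
    *resPtr = lookupThunk(methodHandle, vmThread);
}

// Returns true if slot 0 holds the looked-up thunk after the call
// and slot 1 (the thread pointer) is left untouched.
bool demoAliasedSlots() {
    FakeVMThread thread{1};
    void *slots[2];
    slots[0] = reinterpret_cast<void *>(static_cast<FakeObject>(0x80b0f210));  // methodHandle
    slots[1] = &thread;                                                        // vmThread
    void *expected = lookupThunk(0x80b0f210, &thread);
    unwrapper(slots, slots);  // argsPtr and resPtr alias, like r4 == r5 in the glue
    return slots[0] == expected && slots[1] == &thread;
}
```

In this sketch whatever the lookup returns is exactly what ends up in slot 0, so a bad return value from the lookup is enough on its own to explain a bad branch target.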

The -Xjit:verbose={j2iThunks} option was expected to give more insight into what was happening inside initialInvokeExactThunk but I have not yet been able to reproduce the problem with this -Xjit option active.

pshipton commented 6 years ago

@DanHeidinga - @zl-wang mentioned to me - this "looks like a problem in VM code. JIT-ed code (Thunk_archetype for JSR292) calls jitCallCFunction with valid arguments (J9VMThread, and MethodHandle object). That C function is VM code. It returns bad address on stack top."

DanHeidinga commented 6 years ago

@gacholio can you take a look?

gacholio commented 6 years ago

I don't understand the comment from @zl-wang - there's nothing wrong with jitCallCFunction (it's so trivial, there pretty much couldn't be):

typedef void (*twoVoidFunc)(void*, void*);

void J9FASTCALL
old_slow_jitCallCFunction(J9VMThread *currentThread)
{
    OLD_SLOW_ONLY_JIT_HELPER_PROLOGUE(3);
    DECLARE_JIT_PARM(twoVoidFunc, functionPointer, 1);
    DECLARE_JIT_PARM(void*, argumentPointer, 2);
    DECLARE_JIT_PARM(void*, returnValuePointer, 3);
    functionPointer(argumentPointer, returnValuePointer);
    SLOW_JIT_HELPER_EPILOGUE();
}

So if there's some kind of mismatch in arguments, it's a mismatch between the generated code which calls the helper and the argument array used in the target function.

IBMJimmyk commented 6 years ago

I think the problem is with what old_slow_jitCallCFunction calls. old_slow_jitCallCFunction calls out to initialInvokeExactThunk_unwrapper.

void initialInvokeExactThunk_unwrapper(void **argsPtr, void **resPtr)
{
    J9VMThread *vmThread = (J9VMThread *)argsPtr[1];
    j9object_t methodHandle = (j9object_t)argsPtr[0];
    *resPtr = initialInvokeExactThunk(methodHandle, vmThread);
}

What seems to be happening is that initialInvokeExactThunk returns a bad value that is written to the resPtr location, which goes on to cause a problem when it is read later on. Right now it seems like good values for methodHandle and vmThread are being passed into initialInvokeExactThunk.

zl-wang commented 6 years ago

Right, by "that C function" I meant initialInvokeExactThunk rather than the jitCallCFunction helper, i.e. what is eventually called by jitCallCFunction.

DanHeidinga commented 6 years ago

initialInvokeExactThunk is JIT code: https://github.com/eclipse/openj9/blob/450399bbd37af823f4360f033ac0e48dbdcd1d6c/runtime/compiler/env/VMJ9.cpp#L9101-L9111

@zl-wang Is this the code you think is incorrect?

pshipton commented 6 years ago

For the record, as I understand it this failure occurs in roughly 1 in 20 to 1 in 30 runs.

gita-omr commented 6 years ago

And it fails in FoldHandle. Is it something that is rarely used e.g. not created by regular Java lambdas?

pshipton commented 6 years ago

Hard to say, it depends on the code being run, and if it uses MethodHandles. Nashorn uses it.

DanHeidinga commented 6 years ago

@gita-omr I don't think there's anything special about FoldHandle. We could hit this same issue in any of the MH subclasses

gita-omr commented 6 years ago

Jimmy will add an update soon. But yes, looks like a general (albeit very intermittent) MH problem.

IBMJimmyk commented 6 years ago

I ran the test 120 more times since yesterday. 60 runs under the original options and 60 runs with -Xjit:verbose={j2iThunks}. With the original options all 60 runs passed. With the verbose option, 59 out of 60 passed. A grinder yesterday showed 1 failure in 30 tries. There were previous failures even before yesterday but I'm not sure how many runs were done to produce them. Based on these results, I think the failure rate is actually less than 1/30.

I'm currently looking at the new logs I got and hopefully it will show something useful.
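As a rough sanity check on the run counts above, the chance of a clean streak can be estimated by treating each run as an independent trial. This is only a back-of-the-envelope sketch; probabilityOfNoFailures is a hypothetical helper, and independence between runs is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Probability of seeing zero failures in `runs` independent runs
// when each run fails with probability `failureRate`.
double probabilityOfNoFailures(double failureRate, int runs) {
    return std::pow(1.0 - failureRate, runs);
}
```

Under that assumption, probabilityOfNoFailures(1.0 / 30.0, 60) comes out at about 0.13, so 60 clean runs would still happen roughly one time in eight at a true rate of 1/30. The results are suggestive of a lower rate but do not prove it.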

gita-omr commented 6 years ago

A quick update on the nature of the problem: last night @IBMJimmyk and @jdmpapin did some very thorough debugging and were able to find the contents of the J2I thunk table in the core file. This is the table initialInvokeExactThunk searches through. It looks like the table contains a valid J2I thunk for the signature we are looking for. However, the latest failing verbose log shows that an incorrect value was returned. This could be due to a race condition (although reads and writes to the table seem to be properly synchronized). @IBMJimmyk is currently trying to confirm the whole theory based on the new verbose log and core file.

IBMJimmyk commented 6 years ago

From the end of findThunkFromTerseSignature inside runtime/compiler/env/J2IThunk.cpp:

else
    {
    OMR::CriticalSection critialSection(_monitor);
    match = root()->get(terseSignature, _nodes, false);
    }

return match? match->_thunk : NULL;

After the code exits the critical section the Node object pointed to by match can move. If that happens before we read match->_thunk and something else is written to that memory location, we can end up with a bad result.

This is a problem. With any luck it will turn out to be the problem. I will be running tests to check.
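A simplified sketch of the hazard and of one way to avoid it is below. This is not the J2IThunk.cpp code: ThunkTable, Node, add and findThunk are hypothetical, and the sketch assumes that nodes live in a growable array that can relocate its elements when another thread adds an entry. The point it illustrates is copying the thunk value out while the lock is still held, so that no Node pointer is dereferenced after the critical section ends.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified table entry.
struct Node {
    std::string signature;
    void *thunk;
};

class ThunkTable {
public:
    void add(const std::string &signature, void *thunk) {
        std::lock_guard<std::mutex> lock(_monitor);
        _nodes.push_back(Node{signature, thunk});  // growth may move every existing Node
    }

    // Safe shape: the thunk pointer is copied while the lock is held,
    // so a later relocation of the Node cannot affect the returned value.
    void *findThunk(const std::string &signature) {
        void *result = nullptr;
        {
            std::lock_guard<std::mutex> lock(_monitor);
            for (const Node &node : _nodes) {
                if (node.signature == signature) {
                    result = node.thunk;  // read inside the critical section
                    break;
                }
            }
        }
        return result;  // no Node pointer escapes the lock
    }

private:
    std::mutex _monitor;
    std::vector<Node> _nodes;
};

// Returns true if lookups return the stored thunks and a miss returns nullptr.
bool demoSafeLookup() {
    static char thunkA[1], thunkB[1];
    ThunkTable table;
    table.add("LI", thunkA);
    table.add("JL", thunkB);  // may relocate the first node
    return table.findThunk("LI") == thunkA &&
           table.findThunk("JL") == thunkB &&
           table.findThunk("ZZ") == nullptr;
}
```

The racy shape would instead return a Node pointer from inside the braces and read its thunk field afterwards, which is the window described above.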

IBMJimmyk commented 6 years ago

I created a special build where I added a 10us delay just after the critical section but before the code reads from match->_thunk. I got 5/6 failures and they failed in a similar way. Furthermore I added debugging code and checked that reading from match->_thunk just before leaving the critical section never gave the bad result. I am now fairly certain I have found the problem.

irinarada commented 6 years ago

Nice @IBMJimmyk ! This should also help speed up the proof that the test passes (no need to wait 60 times for a reproduction). Doing the same experiment above with the fix might do it.

IBMJimmyk commented 6 years ago

It seems like people are already happy with the fix but for the record I tried running this test 100 times overnight with my fix and they all passed.

gita-omr commented 6 years ago

The fix was such a right thing to do that I wanted to get it in ASAP :) Thanks a lot!