eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

testJITServerArguments_0 failure on ppc64le: crash during initial SSL connection #19325

Closed cjjdespres closed 3 months ago

cjjdespres commented 4 months ago

First reported here. The server appears to crash during initial SSL connection in the subtests "Test SSL condition" and "Test SSL Failure Case with mismatched certificate". Console log:

        testJitserverArguments_0 Start Time: Thu Apr 11 01:10:28 2024 Epoch Time (ms): 1712823028662
        variation: NoOptions
        JVM_OPTIONS: -XX:+UseJITServer 

        Testing: Test SSL success condition
        Test start time: 2024/04/11 01:11:29 Pacific Standard Time
        Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jitserver/jitserverScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jitserver /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/jdkbinary/j2sdk-image/bin "-Xjit" "-XX:+UseJITServer -Xjit:count=0,verbose={JITServer},verbose={JITServerConns},verbose={compilePerformance} -XX:-JITServerLocalSyncCompiles -XX:JITServerSSLRootCerts=cert.pem" false false
        Time spent starting: 4 milliseconds
        Time spent executing: 10803 milliseconds
        Test result: FAILED
        Output from test:
         [OUT] start running script
         [OUT] Generate SSL certificates
         [OUT] Creating SSL certificates
         [OUT] Certificates generated
         [OUT] Starting /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/jdkbinary/j2sdk-image/bin/jitserver -XX:JITServerPort=44729 -XX:JITServerHealthProbePort=38966  -Xjit -XX:JITServerSSLKey=key.pem -XX:JITServerSSLCert=cert.pem
         [OUT]  785942 ?        00:00:00 jitserver
         [OUT] JITSERVER EXISTS
         [OUT] JITSERVER NO LONGER EXISTS
         [OUT] Terminating /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/jdkbinary/j2sdk-image/bin/jitserver -XX:JITServerPort=44729 -XX:JITServerHealthProbePort=38966  -Xjit -XX:JITServerSSLKey=key.pem -XX:JITServerSSLCert=cert.pem
         [OUT] finished script
         [ERR] ....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.....+.....+.+.....+.............+...+...+..+.............+......+...+..+...+.......+......+..+...+.......+...+...+..............+...+...+.+........+...+.......+..+......+............+...+....+...+..+...+.+......+......+...+.........+..+....+...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+.........+....+..+.............+.....+...+...+.+...+........+....+...+...+...+.....+.......+.....+...+.........+......+............+.........+...+.+..+............+...+............+..........+..+.+..+.+......+.....+...+......+...............+.+...........+...+................+.....+...+.+..+.......+...........+....+.....+.......+..................+...+.....+....+...+........+............+.+......+...............+...+...+..+...+...................+.....+.+........+.+......+........+..........+.........+..+.+...+...........+....+..............+...+...+.+.....+....+......+.....+......+...+.+.....+.........+.+...+...+...+...........+.+...+......+.....+....+.................+.......+...+..+............+.+........+.+...........+....+...+......+.....+............+.......+..+...+......+.......+..+.............+..+............+...+...+.......+...+.....+..........+...+..............+.+......+..............+.+......+..+.+..+.............+.....+.........+.+..+.........+......+.......+.....................+..+....+...+.....+.+.........+...+.....+...+......+.........................+.....+.+.........+...........+.......+..+...+.+......+.........+.....+....+............+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
         [ERR] .......+...+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*..+...+..+............+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+.+...+.....................+...+....................+.......+...........+...+.+.....+.+............+......+..+.......+......+..+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
         [ERR] -----
         [ERR] 
         [ERR] JITServer is ready to accept incoming requests
         [ERR] #PERF:  WARNING: Disclaim feature disabled because either uname() failed or kernel version is not 5.4 or later
         [ERR] #JITServer: JITServer version: 1.60.0
         [ERR] #JITServer: JITServer Client Mode. Server address: localhost port: 44729. Connection Timeout 30000ms
         [ERR] #JITServer: Identifier for current client JVM: 15900127758603301496
         [ERR] #JITServer: Built against (OpenSSL 3.0.14-dev ); Loaded with (OpenSSL 3.0.7 1 Nov 2022)
         [ERR] #JITServer: Successfully initialized SSL context (OpenSSL 3.0.7 1 Nov 2022)
         [ERR] #INFO:  _______________________________________
         [ERR] #INFO:  Version Information:
         [ERR] #INFO:       JIT Level  - j9jit_20240411_0307_jenkins
         [ERR] #INFO:       JVM Level  - 20240411_1707
         [ERR] #INFO:       GC Level   - b2db2045aa4
         [ERR] #INFO:  
         [ERR] #INFO:  Processor Information:
         [ERR] #INFO:       Platform Info:PPCp8
         [ERR] #INFO:       Supports HardwareSQRT:1
         [ERR] #INFO:       Supports HardwareRound:1
         [ERR] #INFO:       Supports HardwareCopySign:1
         [ERR] #INFO:       Supports FPU:1
         [ERR] #INFO:       Supports DFP:1
         [ERR] #INFO:       Supports VMX:1
         [ERR] #INFO:       Supports VSX:1
         [ERR] #INFO:       Supports AES:1
         [ERR] #INFO:       Supports  TM:0
         [ERR] #INFO:       Vendor:Unknown
         [ERR] #INFO:       numProc=4
         [ERR] #INFO:  
         [ERR] #INFO:  _______________________________________
         [ERR] #INFO:  AOT 
         [ERR] #INFO:  options specified:
         [ERR] #INFO:       samplingFrequency=2
         [ERR] #INFO:  
         [ERR] #INFO:  options in effect:
         [ERR] #INFO:       count=1000
         [ERR] #INFO:       disableSuffixLogs
         [ERR] #INFO:       samplingFrequency=2
         [ERR] #INFO:       verbose={options|compilePerformance|JITServer|JITServerConns}
         [ERR] #INFO:       compressedRefs shiftAmount=0
         [ERR] #INFO:  aggressivenessLevel=5
         [ERR] #INFO:  _______________________________________
         [ERR] #INFO:  JIT 
         [ERR] #INFO:  options specified:
         [ERR] #INFO:       count=0,verbose={JITServer},verbose={JITServerConns},verbose={compilePerformance}
         [ERR] ,disableSuffixLogs#INFO:  
         [ERR] #INFO:  options in effect:
         [ERR] #INFO:       count=0
         [ERR] #INFO:       disableSuffixLogs
         [ERR] #INFO:       samplingFrequency=2
         [ERR] #INFO:       verbose={options|compilePerformance|JITServer|JITServerConns}
         [ERR] #INFO:       compressedRefs shiftAmount=0
         [ERR] #INFO:  aggressivenessLevel=5
         [ERR] #INFO:  Allocated new data cache segment starting at address 00007FFF705B0030
         [ERR] #INFO:  StartTime: Apr 11 01:11:34 2024
         [ERR] #INFO:  Free Physical Memory: 4801 MB 
         [ERR] #INFO:  Shared Class Cache Information:
         [ERR] #INFO:   SCCname:sharedcc_jenkins   SCCpath:/home/jenkins/.cache/javasharedresources/C290M11F1A64P_sharedcc_jenkins_G43L00
         [ERR] #INFO:   SCC_stats_bytes: size=314572192 free=60487874 ROMClass=1714560 AOTCode=3979608 AOTData=288 JITHint=1408 JITProfile=0
         [ERR] #INFO:   SCC_stats_#:     ROMClasses=509 AOTMethods=1363 AOTDataEntries=1 AOTHints=49 AOTJitProfiles=0
         [ERR] #INFO:   AOT header compressedRefs shiftAmount=0
         [ERR] #INFO:  CPU entitlement = 400.00
         [ERR] #INFO:  Method java/lang/String.<clinit>()V will continue as interpreted
         [ERR]  (warm) Compiling java/lang/Object.<init>()V  OrdinaryMethod j9m=00000000000AB3C8 remote t=0 compThreadID=0 memLimit=262144 KB freePhysicalMemory=4801 MB
         [ERR] JVMDUMP039I Processing dump event "abort", detail "" at 2024/04/11 01:11:34 - please wait.
         [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/core.20240411.011134.785942.0001.dmp' in response to an event
         [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/core.20240411.011134.785942.0001.dmp
         [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/javacore.20240411.011134.785942.0002.txt' in response to an event
         [ERR] #JITSTATE:  t=  1516 JIT changed state from STARTUP   to IDLE      cSmpl=  0 iSmpl=  0 comp=  0 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=0%
         [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/javacore.20240411.011134.785942.0002.txt
         [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/Snap.20240411.011134.785942.0003.trc' in response to an event
         [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/Snap.20240411.011134.785942.0003.trc
         [ERR] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/jitdump.20240411.011134.785942.0004.dmp' in response to an event
         [ERR] JVMDUMP051I JIT dump occurred in '(unnamed thread)' thread 0x00000000001C6F00
         [ERR] JVMDUMP010I JIT dump written to /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_1/aqa-tests/TKG/output_17128217279712/testJitserverArguments_0/jitdump.20240411.011134.785942.0004.dmp
         [ERR] JVMDUMP013I Processed dump event "abort", detail "".
         [ERR] 70F1A990FF7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:321:
         [ERR] #FAILURE:  JITServer::StreamFailure: Failed to SSL_connect for java/lang/Object.<init>()V @ warm
         [ERR] #JITServer: t=  1924 Could not connect to a server. Next attempt in 2000 ms.
         [ERR] #JITServer: t=  1924 Resetting activation policy to AGGRESSIVE because client has lost connection to server
         [ERR] #JITServer: compThreadID=0 JITServer StreamFailure: Generic stream failure
         [ERR] ! (warm) java/lang/Object.<init>()V Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000000AB3C8 time=1922166us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=4832 MB mem=[region=64 system=2048]KB compThreadID=0
         [ERR]  (AOT cold) Compiling java/lang/Object.<init>()V  OrdinaryMethod j9m=00000000000AB3C8  t=1924 compThreadID=0 memLimit=262144 KB freePhysicalMemory=4832 MB
         [ERR] #INFO:  Allocated new data cache segment starting at address 00007FFF391F0030
         [ERR] + (AOT cold) java/lang/Object.<init>()V @ 00007FFF71840088-00007FFF718400F4 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000000AB3C8 bcsz=1 sync time=3934us mem=[region=960 system=2048]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=59% queueTime=1927204us
         [ERR] #PERF:  Reencoding count=1000 for java/lang/Object.<init>()V j9m=00000000000AB3C8 
         [ERR] #INFO:  Method com/ibm/jit/JITHelpers.<clinit>()V will continue as interpreted
         [ERR]  (AOT cold) Compiling com/ibm/jit/JITHelpers.classIsInterfaceFlag()I  OrdinaryMethod j9m=00000000000B9660  t=1924 compThreadID=0 memLimit=262144 KB freePhysicalMemory=4832 MB
         [ERR] + (AOT cold) com/ibm/jit/JITHelpers.classIsInterfaceFlag()I @ 00007FFF71840144-00007FFF71840178 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000000B9660 bcsz=4 sync time=666us mem=[region=768 system=2048]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=59% queueTime=695us

The rest of the log seems to be the remainder of the (otherwise successful) client run. I think the SSL error message unexpected eof while reading:ssl/record/rec_layer_s3.c:321 in the logs was emitted by the client after the SSL connection failed.

Seen in, e.g., the internal tests:

I ran a 50x grinder here and got one iteration with 10/10 failed tests. The other 4 iterations passed.

cjjdespres commented 4 months ago

Attn @mpirvu.

JasonFengJ9 commented 4 months ago

JDK8 ppc64le_linux(cent8le-rtp-rt5-1)

openjdk version "1.8.0_412"
IBM Semeru Runtime Open Edition (build 1.8.0_412-b07)
Eclipse OpenJ9 VM (build v0.44.0-release-8b079b5c1, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20240415_892 (JIT enabled, AOT enabled)
OpenJ9   - 8b079b5c1
OMR      - 254af5a04
JCL      - 5b71de783c based on jdk8u412-b07)

[2024-04-15T15:11:16.191Z] variation: NoOptions
[2024-04-15T15:11:16.191Z] JVM_OPTIONS:  

[2024-04-15T15:12:01.126Z] Testing: Test SSL success condition
[2024-04-15T15:12:01.126Z] Test start time: 2024/04/15 08:12:00 Pacific Standard Time
[2024-04-15T15:12:01.126Z] Running command: bash /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jitserver/jitserverScript.sh /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jitserver /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/jdkbinary/j2sdk-image/bin "-Xjit" "-XX:+UseJITServer -Xjit:count=0,verbose={JITServer},verbose={JITServerConns},verbose={compilePerformance} -XX:-JITServerLocalSyncCompiles -XX:JITServerSSLRootCerts=cert.pem" false false
[2024-04-15T15:12:01.126Z] Time spent starting: 3 milliseconds
[2024-04-15T15:12:08.095Z] Time spent executing: 6913 milliseconds
[2024-04-15T15:12:08.095Z] Test result: FAILED

[2024-04-15T15:12:08.095Z]  [ERR] #INFO:  CPU entitlement = 400.00
[2024-04-15T15:12:08.095Z]  [ERR] #INFO:  Method java/lang/String.<clinit>()V will continue as interpreted
[2024-04-15T15:12:08.095Z]  [ERR]  (warm) Compiling java/lang/String$CaseInsensitiveComparator.<init>(Ljava/lang/String$1;)V  OrdinaryMethod j9m=00000000000B89F0 remote t=0 compThreadID=0 memLimit=262144 KB freePhysicalMemory=6718 MB
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP039I Processing dump event "abort", detail "" at 2024/04/15 08:12:03 - please wait.
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/core.20240415.081203.2798235.0001.dmp' in response to an event
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/core.20240415.081203.2798235.0001.dmp
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/javacore.20240415.081203.2798235.0002.txt' in response to an event
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/javacore.20240415.081203.2798235.0002.txt
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/Snap.20240415.081203.2798235.0003.trc' in response to an event
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/Snap.20240415.081203.2798235.0003.trc
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/jitdump.20240415.081203.2798235.0004.dmp' in response to an event
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP051I JIT dump occurred in '(unnamed thread)' thread 0x0000000000182D00
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP010I JIT dump written to /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17131929433666/testJitserverArguments_0/jitdump.20240415.081203.2798235.0004.dmp
[2024-04-15T15:12:08.095Z]  [ERR] JVMDUMP013I Processed dump event "abort", detail "".
[2024-04-15T15:12:08.095Z]  [ERR] 70F1119EFF7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:321:
[2024-04-15T15:12:08.095Z]  [ERR] #FAILURE:  JITServer::StreamFailure: Failed to SSL_connect for java/lang/String$CaseInsensitiveComparator.<init>(Ljava/lang/String$1;)V @ warm
[2024-04-15T15:12:08.095Z]  [ERR] #JITServer: t=  1013 Could not connect to a server. Next attempt in 2000 ms.
[2024-04-15T15:12:08.095Z]  [ERR] #JITServer: t=  1013 Resetting activation policy to AGGRESSIVE because client has lost connection to server
[2024-04-15T15:12:08.095Z]  [ERR] #JITServer: compThreadID=0 JITServer StreamFailure: Generic stream failure

[2024-04-15T15:13:26.686Z] ---TEST RESULTS---
[2024-04-15T15:13:26.686Z] Number of PASSED tests: 7 out of 9
[2024-04-15T15:13:26.686Z] Number of FAILED tests: 2 out of 9
[2024-04-15T15:13:26.686Z] 
[2024-04-15T15:13:26.686Z] ---SUMMARY OF FAILED TESTS---
[2024-04-15T15:13:26.686Z] Test SSL success condition
[2024-04-15T15:13:26.686Z] Test SSL Failure Case with mismatched certificate
[2024-04-15T15:13:26.686Z] -----------------------------
[2024-04-15T15:13:26.686Z] 
[2024-04-15T15:13:26.686Z] -----------------------------------
[2024-04-15T15:13:26.686Z] testJitserverArguments_0_FAILED

50x grinder - https://github.com/eclipse-openj9/openj9/issues/19325#issuecomment-2060261150

pshipton commented 4 months ago

The following tests failed 10/10 on cent8le-svl-rt5-1 and rhel8le-rtp-rt1-1. Passed on sles12le-rtp-rt7-1, ubu20le-rtp-rt12-1, ubu20le-svl-rt1-1. Test SSL success condition Test SSL Failure Case with mismatched certificate

mpirvu commented 4 months ago

@cjjdespres is looking into this failure.

cjjdespres commented 4 months ago

I haven't been able to reproduce this locally yet, unfortunately, but from what I've seen it only seems to fail on rhel8 and cent8. Could this be related to https://github.ibm.com/runtimes/infrastructure/issues/9147#issuecomment-76212158? It seems as though the server is crashing in this call:

https://github.com/eclipse-openj9/openj9/blob/b1c35645683431ba31a939ad68c1d4b5e2c8882f/runtime/compiler/runtime/Listener.cpp#L106

judging from the logs, anyway - the client fails to complete the SSL handshake here:

https://github.com/eclipse-openj9/openj9/blob/b1c35645683431ba31a939ad68c1d4b5e2c8882f/runtime/compiler/net/ClientStream.cpp#L259

probably because of the server crash.

cjjdespres commented 4 months ago

Building openssl 3.0.7 from source did not appear to work. I installed openssl 3.0.7 from EPEL on a RHEL 8.8 fyre VM and managed to reproduce the crash. It's crashing with an illegal instruction exception. Backtrace from gdb:

Thread 71 "JITServer Liste" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fffd725f170 (LWP 44008)]
ppc_aes_gcm_encrypt () at crypto/modes/aes-gcm-ppc.s:419
419     lxvb16x 32, 0, 8
(gdb) bt
#0  ppc_aes_gcm_encrypt () at crypto/modes/aes-gcm-ppc.s:419
#1  0x00007fffd7e3c530 in aes_p10_gcm_crypt (in=0x7fff6c0695b8 "\v", 
    out=0x7fff6c0695b8 "\v", len=784, key=0x7fff6c0735c0, 
    ivec=0x7fff6c0733f8 ">\263\354\307C\276cmʫ\354\035", Xi=0x7fff6c073438, 
    encrypt=<optimized out>)
    at providers/implementations/ciphers/cipher_aes_gcm_hw_ppc.inc:76
#2  0x00007fffd7e3ca50 in generic_aes_gcm_cipher_update (ctx=0x7fff6c073300, 
    in=0x7fff6c0695b8 "\v", len=795, out=0x7fff6c0695b8 "\v")
    at providers/implementations/ciphers/cipher_aes_gcm_hw.c:76
#3  0x00007fffd7e98f94 in gcm_cipher_internal (ctx=0x7fff6c073300, 
    out=0x7fff6c0695b8 "\v", padlen=0x7fffd725a910, in=0x7fff6c0695b8 "\v", 
    len=795) at providers/implementations/ciphers/ciphercommon_gcm.c:426
#4  0x00007fffd7e9a064 in ossl_gcm_stream_update (vctx=<optimized out>, 
    out=<optimized out>, outl=<optimized out>, outsize=<optimized out>, 
    in=<optimized out>, inl=<optimized out>)
    at providers/implementations/ciphers/ciphercommon_gcm.c:319
#5  0x00007fffd7d0db0c in EVP_EncryptUpdate (ctx=<optimized out>, 
    out=<optimized out>, outl=0x7fffd725a9b8, in=<optimized out>, 
    inl=<optimized out>) at crypto/evp/evp_enc.c:643
#6  0x00007fffd7d0e754 in EVP_CipherUpdate (ctx=<optimized out>, 
    out=<optimized out>, outl=<optimized out>, in=<optimized out>, 
    inl=<optimized out>) at crypto/evp/evp_enc.c:419
#7  0x00007fff5f3ce74c in tls13_enc (s=0x7fff6c05dea0, recs=0x7fffd725b228, 
    cs=<optimized out>, sending=<optimized out>, mac=<optimized out>, macsize=<optimized out>) at ssl/record/ssl3_record_tls13.c:171
#8  0x00007fff5f3c80b8 in do_ssl3_write (s=0x7fff6c05dea0, type=<optimized out>, buf=0x7fff6c05ff00 "\v", pipelens=<optimized out>, numpipes=1, create_empty_fragment=<optimized out>, written=0x7fffd725bd20) at ssl/record/rec_layer_s3.c:1056
#9  0x00007fff5f3c8628 in ssl3_write_bytes (s=0x7fff6c05dea0, type=<optimized out>, buf_=0x7fff6c05ff00, len=<optimized out>, written=0x7fffd725beb0) at ssl/record/rec_layer_s3.c:647
#10 0x00007fff5f3ed66c in ssl3_do_write (s=<optimized out>, type=<optimized out>) at ssl/statem/statem_lib.c:48
#11 0x00007fff5f38a69c in ssl3_handshake_write (s=<optimized out>) at ssl/s3_lib.c:3327
#12 0x00007fff5f3e03f4 in statem_do_write (s=0x7fff6c05dea0) at ssl/statem/statem.c:719
#13 write_state_machine (s=0x7fff6c05dea0) at ssl/statem/statem.c:884
#14 state_machine (s=0x7fff6c05dea0, server=<optimized out>) at ssl/statem/statem.c:451
#15 0x00007fff5f3a1bc8 in SSL_do_handshake (s=0x7fff6c05dea0) at ssl/ssl_lib.c:3937
#16 0x00007fff5f3a3ef0 in SSL_accept (s=<optimized out>) at ssl/ssl_lib.c:1749
#17 0x00007ffff6401968 in TR_Listener::serveRemoteCompilationRequests(BaseCompileDispatcher*) () from /root/failing-jdk/lib/default/libj9jit29.so
#18 0x00007ffff6401f50 in listenerThreadProc(void*) () from /root/failing-jdk/lib/default/libj9jit29.so
#19 0x00007ffff747d820 in thread_wrapper () from /root/failing-jdk/lib/default/libj9thr29.so
#20 0x00007ffff7ef9718 in start_thread (arg=0x0) at pthread_create.c:479
#21 0x00007ffff7e0b778 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82

To reproduce, generate a cert/key pair with

openssl genrsa -out key.pem 2048

openssl req -new -x509 -sha256 -key key.pem -out cert.pem -days 365 -subj "/CN=localhost"

Then run jitserver -XX:JITServerSSLKey=key.pem -XX:JITServerSSLCert=cert.pem -Xjit:verbose={JITServer} to start the JITServer, and separately run java -XX:+UseJITServer -XX:JITServerSSLRootCerts=cert.pem -Xjit:verbose={JITServer} -version.

The -Xjit:verbose={JITServer} shows that the particular JDK I was using was built with OpenSSL 3.0.13 and was running with 3.0.7, as expected. I just grabbed one of the JDKs from a failing internal nightly build - I imagine this will fail with master as well.

cjjdespres commented 4 months ago

Actually, I don't think I tested building OpenSSL 3.0.7 from source on RHEL 8.8, so maybe that would work as well.

mpirvu commented 4 months ago

@ymanton Is that instruction lxvb16x 32, 0, 8 valid on older P machines?

cjjdespres commented 4 months ago

I grepped through the openssl repo and that instruction does not appear in the code in the openssl-3.0.7 tag. Maybe it's from an EPEL patch? (It does appear in master, in the file named by gdb).

ymanton commented 4 months ago

@ymanton Is that instruction lxvb16x 32, 0, 8 valid on older P machines?

It's only supported on POWER 9 and later. The machine getting SIGILL is P8.

I don't think you'd see this if you built OpenSSL from source on the target machine or installed a package available from the repos the machine has access to.

Did you get your EPEL package from epel-8 or epel-9? RHEL 9 dropped support for P8 so all packages for RHEL 9 are going to be built for P9 as the baseline.

cjjdespres commented 4 months ago

I got it from epel-8.

cjjdespres commented 4 months ago

I'm fairly sure it's epel-8, anyway. This is on RHEL 8.8. The exact package is openssl3-3.0.7-5.el8.1.src.rpm.

ymanton commented 4 months ago

Do you see the exact same crash outside of a debugger?

From what I can tell of the openssl code, it expects SIGILL in some places because that's how it dynamically discovers which instructions are supported by the host: https://github.com/openssl/openssl/blob/openssl-3.0.7/crypto/ppccap.c

That code will check the auxvector^1 if it was built with that support, otherwise fall back to testing with SIGILL. You can check the contents of the auxvector by running something with LD_SHOW_AUXV=1 in your env, e.g.:

$ LD_SHOW_AUXV=1 openssl3 version | grep AT_HWCAP
AT_HWCAP:        true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_HWCAP2:       vcrypto tar isel ebb dscr arch_2_07

The only discrepancy is that I don't see any of the functions in that code in your backtrace, and since you appear to have debug symbols available they wouldn't be hidden by inlining for example. Your backtrace looks like an operation is actually in progress. It could a bug, but I would think this stuff would have been thoroughly tested.

If you run it in gdb it will catch SIGILL so you should try outside of the debugger or get gdb to pass the signal on^2 to the program and see what happens.

ymanton commented 4 months ago

Actually don't bother with all that work, your problem looks like exactly this bug: https://github.com/openssl/openssl/issues/19163

mpirvu commented 4 months ago

If I understand correctly, openssl 3.0.7 on Power has a bug which prevents us from running JITServer with encryption on Power 8 machines.

cjjdespres commented 4 months ago

If I understand correctly, openssl 3.0.7 on Power has a bug which prevents us from running JITServer with encryption on Power 8 machines.

The openssl 3.0.7 package that was recently installed on the rhel8/cent8 power test machines, at least.

pshipton commented 4 months ago

Related to https://github.ibm.com/runtimes/infrastructure/issues/9199

ymanton commented 4 months ago

If I understand correctly, openssl 3.0.7 on Power has a bug which prevents us from running JITServer with encryption on Power 8 machines.

Yes, it looks like it was fixed upstream in 3.3 and backported to 3.1, but I'm not sure why 3.0 didn't get it.

cjjdespres commented 4 months ago

Just to record it here - the problem seems to be that the epel-8 OpenSSL package (version 3.0.7) was patched to include some performance improvements for power 9 and up introduced into OpenSSL 3.1 and later versions, but was not patched to include a later bug fix that prevents this issue from occurring. @ymanton mentioned that he would raise an issue with Red Hat to try to resolve the problem.

ymanton commented 4 months ago

https://bugzilla.redhat.com/show_bug.cgi?id=2276143

ymanton commented 4 months ago

It looks like there is a openssl3-3.2.1-1.1.el8 package in the pipeline that should have a fix for this.

@cjjdespres you can download it here and test it on a VM if you're inclined.

JasonFengJ9 commented 4 months ago

JDK21 ppc64le_linux(rhel8le-rtp-rt1-1)

[2024-04-22T14:48:37.021Z] variation: NoOptions
[2024-04-22T14:48:37.021Z] JVM_OPTIONS:  

[2024-04-22T14:49:39.290Z]  [ERR]  (warm) Compiling com/ibm/jit/JITHelpers.classIsInterfaceFlag()I  OrdinaryMethod j9m=00000000000BCFB8 remote t=0 compThreadID=0 memLimit=262144 KB freePhysicalMemory=4620 MB
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP039I Processing dump event "abort", detail "" at 2024/04/22 07:49:32 - please wait.
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/core.20240422.074932.281214.0001.dmp' in response to an event
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/core.20240422.074932.281214.0001.dmp
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/javacore.20240422.074932.281214.0002.txt' in response to an event
[2024-04-22T14:49:39.290Z]  [ERR] #JITSTATE:  t=  1516 JIT changed state from STARTUP   to IDLE      cSmpl=  0 iSmpl=  0 comp=  0 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=0%
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/javacore.20240422.074932.281214.0002.txt
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/Snap.20240422.074932.281214.0003.trc' in response to an event
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/Snap.20240422.074932.281214.0003.trc
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/jitdump.20240422.074932.281214.0004.dmp' in response to an event
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP051I JIT dump occurred in '(unnamed thread)' thread 0x000000000021C900
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP010I JIT dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_ppc64le_linux_testList_1/aqa-tests/TKG/output_17137964409796/testJitserverArguments_0/jitdump.20240422.074932.281214.0004.dmp
[2024-04-22T14:49:39.290Z]  [ERR] JVMDUMP013I Processed dump event "abort", detail "".
[2024-04-22T14:49:39.290Z]  [ERR] 70F1AD62FF7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:321:
[2024-04-22T14:49:39.290Z]  [ERR] #FAILURE:  JITServer::StreamFailure: Failed to SSL_connect for com/ibm/jit/JITHelpers.classIsInterfaceFlag()I @ warm

[2024-04-22T14:51:41.312Z] ---TEST RESULTS---
[2024-04-22T14:51:41.312Z] Number of PASSED tests: 7 out of 9
[2024-04-22T14:51:41.312Z] Number of FAILED tests: 2 out of 9
[2024-04-22T14:51:41.312Z] 
[2024-04-22T14:51:41.312Z] ---SUMMARY OF FAILED TESTS---
[2024-04-22T14:51:41.312Z] Test SSL success condition
[2024-04-22T14:51:41.312Z] Test SSL Failure Case with mismatched certificate
[2024-04-22T14:51:41.312Z] -----------------------------
[2024-04-22T14:51:41.312Z] 
[2024-04-22T14:51:41.312Z] -----------------------------------
[2024-04-22T14:51:41.312Z] testJitserverArguments_0_FAILED
cjjdespres commented 4 months ago

There have been failures in jdk_security2_1 in the JITServer nightly tests that I think may be due to this as well, e.g. in https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_jit_Personal/1005/ there are crashes that look like:

03:50:33          TestAESWithDefaultProvider
03:50:33  stderr:
03:50:33  Unhandled exception
03:50:33  Type=Illegal instruction vmState=0x00040000
03:50:33  J9Generic_Signal_Number=00000048 Signal_Number=00000004 Error_Value=00000000 Signal_Code=00000001
03:50:33  Handler1=00007FFFB09467C0 Handler2=00007FFFB086D580
03:50:33  R0=00007FFF52B4C530 R1=00007FFF527F9F70 R2=00007FFF52D07F00 R3=00007FFFAAA3BF00
03:50:33  R4=00007FFFAA9EDCD0 R5=0000000000027100 R6=00007FFF0814D0F0 R7=00007FFF0814CF28
03:50:33  R8=00007FFF0814CF68 R9=00000000000001B0 R10=0000000000000002 R11=FFFF8000F7EB30B8
03:50:33  R12=0000000000004400 R13=00007FFF528068E0 R14=00007FFF0800B530 R15=00007FFFAC63F700
03:50:33  R16=00007FFF527FA46C R17=0000000000000000 R18=0000000000000010 R19=0000000000027100
03:50:33  R20=00007FFFAAA3BF00 R21=00007FFF0814CF68 R22=00007FFF0814D0F0 R23=0000000100000000
03:50:33  R24=00007FFFAA9EDCD0 R25=00007FFFAAA3BF00 R26=0000000000027100 R27=0000000000000000
03:50:33  R28=0000000000000000 R29=0000000000000000 R30=0000000000002710 R31=00007FFF0814CF28
03:50:33  NIP=00007FFF52A7452C MSR=800000000288F033 ORIG_GPR3=00007FFF52B4C52C CTR=00007FFF52B4C8A0
03:50:33  LINK=00007FFF52B4C530 XER=0000000020000000 CCR=0000000048224444 SOFTE=0000000000000001
03:50:33  TRAP=0000000000000700 DAR=00007FFF4A870000 dsisr=0000000040000000 RESULT=0000000000000000
03:50:33  FPR0 00007fffb1001378 (f: 2969572096.000000, d: 6.953290e-310)
03:50:33  FPR1 000000000002913d (f: 168253.000000, d: 8.312803e-319)
03:50:33  FPR2 3fefac6299001d77 (f: 2566921472.000000, d: 9.897931e-01)
03:50:33  FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
03:50:33  FPR4 3fce840b4ac4e4d2 (f: 1254417664.000000, d: 2.384047e-01)
03:50:33  FPR5 bfe7154748bef6c8 (f: 1220474624.000000, d: -7.213475e-01)
03:50:33  FPR6 3fd27616c9496e0b (f: 3377032704.000000, d: 2.884576e-01)
03:50:33  FPR7 3ff71547652ab82b (f: 1697298432.000000, d: 1.442695e+00)
03:50:33  FPR8 3fdec70a6ca7badd (f: 1822931712.000000, d: 4.808985e-01)
03:50:33  FPR9 3e7798867c25010b (f: 2082799872.000000, d: 8.790121e-08)
03:50:33  FPR10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR11 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
03:50:33  FPR12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR13 000000005359e85e (f: 1398401152.000000, d: 6.909020e-315)
03:50:33  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:50:33  Module=/lib64/libcrypto.so.3
03:50:33  Module_base_address=00007FFF52800000
03:50:33  Target=2_90_20240430_1726 (Linux 4.18.0-513.24.1.el8_9.ppc64le)
03:50:33  CPU=ppc64le (4 logical CPUs) (0x1dcc60000 RAM)
03:50:33  ----------- Stack Backtrace -----------
03:50:33   (0x00007FFF52A7452C [libcrypto.so.3+0x27452c])
03:50:33   (0x00007FFF52B4C530 [libcrypto.so.3+0x34c530])
03:50:33   (0x00007FFF52B4CA50 [libcrypto.so.3+0x34ca50])
03:50:33   (0x00007FFF52BA8F94 [libcrypto.so.3+0x3a8f94])
03:50:33   (0x00007FFF52BAA064 [libcrypto.so.3+0x3aa064])
03:50:33  EVP_EncryptUpdate+0xac (0x00007FFF52A1DB0C [libcrypto.so.3+0x21db0c])
03:50:33  EVP_CipherUpdate+0x24 (0x00007FFF52A1E754 [libcrypto.so.3+0x21e754])
03:50:33  Java_jdk_crypto_jniprovider_NativeCrypto_GCMEncrypt+0x2ac (0x00007FFF52F457BC [libjncrypto.so+0x57bc])
03:50:33   (0x00007FFFB0B65728 [libj9vm29.so+0x265728])
03:50:33  ffi_call_int+0xd4 (0x00007FFFB0B653F4 [libj9vm29.so+0x2653f4])
03:50:33  bytecodeLoopFull+0x13514 (0x00007FFFB09F7944 [libj9vm29.so+0xf7944])
03:50:33   (0x00007FFFB0ACFEA0 [libj9vm29.so+0x1cfea0])
03:50:33  sidecarInvokeReflectMethodImpl+0x3c0 (0x00007FFFB091FF40 [libj9vm29.so+0x1ff40])
03:50:33  sidecarInvokeReflectMethod+0x38 (0x00007FFFB09213D8 [libj9vm29.so+0x213d8])
03:50:33  JVM_InvokeMethod_Impl+0xb4 (0x00007FFFAAB53164 [libjclse29.so+0x83164])
03:50:33  JVM_InvokeMethod+0x40 (0x00007FFFB0FC4020 [libjvm.so+0x24020])
03:50:33  JVM_InvokeMethod+0x34 (0x00007FFFB1096AF4 [libjvm.so+0x6af4])
03:50:33  Java_jdk_internal_reflect_NativeMethodAccessorImpl_invoke0+0x24 (0x00007FFF539DD174 [libjava.so+0xd174])
03:50:33   (0x00007FFF71506620 [<unknown>+0x0])
03:50:33  runJavaThread+0x340 (0x00007FFFB091AD40 [libj9vm29.so+0x1ad40])
03:50:33  javaProtectedThreadProc+0xf0 (0x00007FFFB09B88D0 [libj9vm29.so+0xb88d0])
03:50:33  omrsig_protect+0x358 (0x00007FFFB086E988 [libj9prt29.so+0x3e988])
03:50:33  javaThreadProc+0x64 (0x00007FFFB09B3AA4 [libj9vm29.so+0xb3aa4])
03:50:33  thread_wrapper+0x1a8 (0x00007FFFB08013E8 [libj9thr29.so+0x113e8])
03:50:33  start_thread+0xf8 (0x00007FFFB1369718 [libpthread.so.0+0x9718])
03:50:33  clone+0x74 (0x00007FFFB120B778 [libc.so.6+0x13b778])

You can see that it's crashing in libcrypto, and that particular test ran on rtj-rhel8le-svl-test-d8dvh-1.fyre.ibm.com.

ymanton commented 4 months ago

openssl3-3.2.1-1.1.el8 is available on EPEL and is supposed to fix this.

cjjdespres commented 3 months ago

These failures seem to have disappeared from the nightly builds, so it looks like @mpirvu was right about this being fixed. (I can't quite remember what the resolution was in the end, though). I think this can be closed for now.