eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Crash in MauveSingleInvocationLoadTest_special_22 #10665

Open liqunl opened 4 years ago

liqunl commented 4 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest/134
MauveSingleInvocationLoadTest_special_22
Options are: -Xcompressedrefs -Xjit:count=0 -Xgcpolicy:gencon -Xaggressive -Xconcurrentlevel0

LT  23:12:45.733 - Completed 80.1%. Number of tests started=2256
LT  stderr Unhandled exception
LT  stderr Type=Segmentation error vmState=0x00000000
LT  stderr Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFA4ED40052 ContextFlags=0010005f
LT  stderr Handler1=00007FFA5024FD00 Handler2=00007FFA50168C50 InaccessibleReadAddress=FFFFFFFFFFFFFFFF
LT  stderr RDI=00007FFA3BBE85C3 RSI=00007FFA3BBE85C8 RAX=FFBC1C10FFBC1C00 RBX=0000000001DA3100
LT  stderr RCX=00000000FFBC1900 RDX=00007FFA3BBE85C8 R8=0000000000000000 R9=0000000001DA3500
LT  stderr R10=00000000013EFFF0 R11=00000000FFFF0000 R12=00000000FFFD13E0 R13=0000000000000010
LT  stderr R14=0000000000000000 R15=00000000FFFD1390
LT  stderr RIP=00007FFA4ED40052 RSP=0000000001A0C660 RBP=0000000001A05000 GS=002B
LT  stderr FS=0053 ES=002B DS=002B
LT  stderr XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM8 3fdfd535dd2acfe8 (f: 3710570496.000000, d: 4.973883e-001)
LT  stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest\openjdkbinary\j2sdk-image\jre\bin\compressedrefs\j9jit29.dll
LT  stderr Module_base_address=00007FFA4EBB0000 Offset_in_DLL=0000000000190052
LT  stderr Target=2_90_20200921_521 (Windows Server 2012 R2 6.3 build 9600)
LT  stderr CPU=amd64 (8 logical CPUs) (0x1ffb9c000 RAM)
LT  stderr ----------- Stack Backtrace -----------
LT  stderr Java_java_lang_invoke_MutableCallSite_invalidate+0xf6ad2 (0x00007FFA4ED40052 [j9jit29+0x190052])
LT  stderr (0x00000000FFBC1900)
LT  stderr (0x00000000FFBC1900)
LT  stderr (0x00000000FFFD0040)
LT  stderr (0x00000000FFFD0060)
LT  stderr Java_java_lang_invoke_MutableCallSite_invalidate+0x5df750 (0x00007FFA4F228CD0 [j9jit29+0x678cd0])
LT  stderr (0x00007FFA3BBE85C8)
LT  stderr (0x00007FFA3BB39A14)
LT  stderr (0x00007FFA3BB39A14)
LT  stderr (0x00000000FFBC1798)
LT  stderr ---------------------------------------
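
As a quick sanity check on the log above, the faulting address can be converted to an offset inside j9jit29.dll by subtracting the module base. The following is a minimal sketch, not part of any OpenJ9 tooling: `parse_fields` is a hypothetical helper, and the two log lines are copied from the dump above.

```python
import re

# Two lines copied verbatim from the Windows crash log above.
log = """\
LT  stderr Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFA4ED40052 ContextFlags=0010005f
LT  stderr Module_base_address=00007FFA4EBB0000 Offset_in_DLL=0000000000190052
"""

def parse_fields(text):
    """Hypothetical helper: collect NAME=HEX pairs from the log as integers."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+)=([0-9A-Fa-f]+)\b", text)}

fields = parse_fields(log)

# Offset of the faulting instruction relative to the start of the DLL.
offset = fields["ExceptionAddress"] - fields["Module_base_address"]
print(hex(offset))
```

The computed offset agrees with the `Offset_in_DLL` value and with the `j9jit29+0x190052` annotation on the first backtrace frame, so the crash is inside the JIT library itself rather than in JIT-compiled code.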
liqunl commented 4 years ago

Grinder from @rpshukla in #10623

10x grinder for MauveSingleInvocationLoadTest_special_22 passed: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3951/

running more grinders: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3952/ https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3953/

mayshukla commented 4 years ago

This grinder https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3953/ was supposed to be for the original failure in #10623 (although it looks like I didn't get the settings correct and no tests were actually run)

The others are for MauveSingleInvocationLoadTest_special_22

mayshukla commented 4 years ago

Haven't seen a failure yet. Here are a couple more grinders:

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3954/ 20/20 passed
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3955/ 20/20 passed
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3956/ 20/20 passed

liqunl commented 4 years ago

Could be related to #10365, will update with my finding later.

liqunl commented 4 years ago

This seems to be different from #10365; different problems show up in the core dump. The crash is in populateVPicSlotCall, when we're trying to get the j9method from the vtable of a class. The object reference passed to the helper is bad, so the j9class we get from it is also bad. Looking at its caller, there are the following problems:

  1. The object reference seems to have become stale in a place without GC points.
  2. The VPic slot is partially patched (the class has been populated, but the call to the helper is still there), yet we crashed at a point where the patching hasn't happened yet. I suspect another thread was running at the same time.
  3. The populated class is also bad, and is the same as the bad object reference. Maybe another thread also ran with the bad reference.

I'll post more detailed investigation in another comment later.

I haven't figured out why so many things are wrong at the same time. Given the options -Xjit:count=0 -Xgcpolicy:gencon -Xaggressive -Xconcurrentlevel0 and that we can't reproduce it in 80 runs, I would recommend we defer it to the next release. @andrewcraik

liqunl commented 3 years ago

I haven't made any progress on this issue. Given the time left and the other issues I'm working on, I suggest we defer this to the next release.

pshipton commented 3 years ago

@liqunl I'm not sure why this is in the milestone plan at all. It's not a failure I see in the nightly builds so it can't be happening often. I suspect it was added only because https://github.com/eclipse/openj9/issues/10623 was in the milestone plan. Unless you have some reason to keep it in, I'll remove it rather than moving it forward.

0xdaryl commented 3 years ago

OK, I think we can remove the milestone target.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_mac_mixed_Nightly_testList_0/5
MauveSingleInvocLoad_special_5m_22
variation: Mode688
JVM_OPTIONS: -Xcompressedrefs -Xjit:count=0 -Xgcpolicy:gencon -Xaggressive -Xconcurrentlevel0
https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_x86-64_mac_mixed_Nightly_testList_0/5/system_test_output.tar.gz

LT  stderr Unhandled exception
LT  stderr Type=Segmentation error vmState=0x00000000
LT  stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
LT  stderr Handler1=0000000004036240 Handler2=0000000003798860 InaccessibleAddress=0000000000000019
LT  stderr RDI=000000001D1640B3 RSI=000000001D1640B8 RAX=0000000000000001 RBX=00000000FFFEF708
LT  stderr RCX=00000000FFB59C00 RDX=000000001D1640B8 R8=00000000FFB59C00 R9=00000000FFFEEC20
LT  stderr R10=0000000080AEA1E0 R11=00007F80E0003890 R12=0000000000000000 R13=00000000FFFEF728
LT  stderr R14=00000000FFFEF6B8 R15=00000000FFFEF6C8
LT  stderr RIP=000000000476F668 GS=0000 FS=0000 RSP=000000000FD28F80
LT  stderr RFlags=0000000000010203 CS=002B RBP=000000000FD21900 ERR=0000001900000004
LT  stderr TRAPNO=000000040000000E CPU=0019000000040000 FAULTVADDR=0000000000000019
LT  stderr XMM0 0000000000000002 (f: 2.000000, d: 9.881313e-324)
LT  stderr XMM1 000000000fd28ff8 (f: 265457664.000000, d: 1.311535e-315)
LT  stderr XMM2 00000000df000000 (f: 3741319168.000000, d: 1.848457e-314)
LT  stderr XMM3 3e12930f252766de (f: 623339200.000000, d: 1.081175e-09)
LT  stderr XMM4 3eb26186b931f159 (f: 3107057920.000000, d: 1.095591e-06)
LT  stderr XMM5 00000000003c0000 (f: 3932160.000000, d: 1.942745e-317)
LT  stderr XMM6 3fd947941c2116fb (f: 471930624.000000, d: 3.949938e-01)
LT  stderr XMM7 40262e42fefa39ef (f: 4277811712.000000, d: 1.109035e+01)
LT  stderr XMM8 0000000041800000 (f: 1098907648.000000, d: 5.429325e-315)
LT  stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr XMM10 0000000000000006 (f: 6.000000, d: 2.964394e-323)
LT  stderr XMM11 0000000000000008 (f: 8.000000, d: 3.952525e-323)
LT  stderr XMM12 bf9f518686afffa1 (f: 2259681280.000000, d: -3.058443e-02)
LT  stderr XMM13 3f9431defbd6fb77 (f: 4225170176.000000, d: 1.972149e-02)
LT  stderr XMM14 bf705fca77713fdd (f: 2003910656.000000, d: -3.997603e-03)
LT  stderr XMM15 3f14d74035fec9a7 (f: 905890240.000000, d: 7.950143e-05)
LT  stderr Module=/Users/jenkins/workspace/Test_openjdk8_j9_special.system_x86-64_mac_mixed_Nightly_testList_0/openjdkbinary/j2sdk-image/jre/lib/default/libj9jit29.dylib
LT  stderr Module_base_address=00000000044AB000 Symbol=..@109.done
LT  stderr Symbol_address=000000000476F650
LT  stderr Target=2_90_20210207_6 (Mac OS X 10.13.6)
LT  stderr CPU=amd64 (4 logical CPUs) (0x200000000 RAM)
LT  stderr ----------- Stack Backtrace -----------
LT  stderr ---------------------------------------
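
Since the backtrace in this log is empty, the only location information is RIP, the module base, and the symbol address. A minimal sketch of the arithmetic, with the values copied from the log above (the variable names are just for illustration):

```python
# Values copied from the macOS crash log above.
rip = 0x000000000476F668             # RIP
module_base = 0x00000000044AB000     # Module_base_address (libj9jit29.dylib)
symbol_address = 0x000000000476F650  # Symbol_address for ..@109.done

# Offset of the faulting instruction within the library and within the symbol.
module_offset = rip - module_base
symbol_offset = rip - symbol_address

print(f"libj9jit29.dylib+{module_offset:#x}, ..@109.done+{symbol_offset:#x}")
```

The module offset is what would be needed to look up the faulting instruction in a disassembly of the same libj9jit29.dylib build.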

@liqunl do recent updates to the jitdump help with this?