Open connglli opened 1 year ago
@jmesyou : can you investigate this intermittent crash in the optimizer please? The user attached a standalone test case and some failure artifacts to the issue.
Moving this out to 0.41 to accommodate resource schedules.
Unable to reproduce this failure yet
@jmesyou Perhaps check the log file I've put into the links.
@connglli I'm able to reproduce the error on the same commits you reported:
openjdk version "11.0.20-internal" 2023-07-18
OpenJDK Runtime Environment (build 11.0.20-internal+0-adhoc..openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-8aa8676, JRE 11 Linux amd64-64-Bit Compressed References 20230512_000000 (JIT enabled, AOT enabled)
OpenJ9 - 8aa8676
OMR - 779c51b
JCL - ee54452 based on jdk-11.0.20+2)
But, so far, still unable to reproduce the exceptions on HEAD. Perhaps some of these failures were fixed, will investigate further 🤔
Thanks @jmesyou. If you cannot reproduce on HEAD, I suppose this bug is fixed by some commits in between? But anyway, I'll check them again when I'm available (and sorry I'm on my vacation so I'm not that on call). I'll consider closing this issue and deem it as fixed if I cannot reproduce them either.
Perhaps we can try git bisect to find the exact commit that fixed this issue (even though it's a bit time-consuming) if we cannot reproduce it.
Hi @connglli, I'm going to close the issue since it's stale. Efforts to reproduce it have not been successful on my end. If you find that the issue persists, please feel free to reopen this issue.
Sure, that's okay; perhaps it's already fixed. I'll reopen it if I can reproduce it again.
@jmesyou, I was able to reproduce this failure with source as of a week ago, so I'm going to reopen this one.
Some of these issues can be very difficult to reproduce - one person sees repeated failures, and another cannot reproduce it.
I've uploaded a jitdump that was produced during one of the crashes that I saw. I believe the problem occurs after Tree Simplifier eliminates a switch for which it's able to determine the value.
After that optimization, I think there are problems with the structure for the method, although the CFG looks correct to me. In particular, 4 appears as both a block and an acyclic region within region 0, and the acyclic region version of 4 is seen as having no successor.
<structure>
0 [0x7f4ef7204960] Acyclic region
Subgraph: (* = exit edge)
(0x7f4ef7204a30:0x7f4ef7202280)0 --> 2(0x7f4ef7204ac0)
(0x7f4ef7204ac0:0x7f4ef72021c0)2 --> 36(0x7f4ef7204c70) 29(0x7f4ef7204ba0)
(0x7f4ef7204ba0:0x7f4ef7201b00)29 --> 1(0x7f4ef7204d60)
(0x7f4ef7204d60:0x7f4ef7202220)1 -->
(0x7f4ef7204c70:0x7f4ef7201a40)36 --> 4(0x7f4ef7204e70)
(0x7f4ef7204e70:0x7f4ef7202f20)4 -->
(0x7f4ef72051b0:0x7f4ef7201980)4 --> 1(0x7f4ef7204d60)
0 [0x7f4ef7202280] Block
2 [0x7f4ef72021c0] Block
29 [0x7f4ef7201b00] Block
1 [0x7f4ef7202220] Block
36 [0x7f4ef7201a40] Block
4 [0x7f4ef7202f20] Acyclic region
Subgraph: (* = exit edge)
4 [0x7f4ef7201980] Block
</structure>
I've seen the crash occur in various places, but in each case I see a structure like this after one of the passes of Tree Simplifier, so I don't think a core file would be of much use.
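Since the same pattern shows up in every failing log, a small script can help scan a structure dump for it. The following is a minimal sketch, not part of any OpenJ9 tooling: it assumes each subgraph line has the form `(nodeAddr:structureAddr)number --> succNumber(succAddr) ...` as in the excerpt above (the meaning of the two addresses is my reading of the dump), and the names `parse_subgraph` and `find_suspicious` are hypothetical.

```python
import re
from collections import defaultdict

# Subgraph lines copied from the structure dump above.
SAMPLE = """\
(0x7f4ef7204a30:0x7f4ef7202280)0 --> 2(0x7f4ef7204ac0)
(0x7f4ef7204ac0:0x7f4ef72021c0)2 --> 36(0x7f4ef7204c70) 29(0x7f4ef7204ba0)
(0x7f4ef7204ba0:0x7f4ef7201b00)29 --> 1(0x7f4ef7204d60)
(0x7f4ef7204d60:0x7f4ef7202220)1 -->
(0x7f4ef7204c70:0x7f4ef7201a40)36 --> 4(0x7f4ef7204e70)
(0x7f4ef7204e70:0x7f4ef7202f20)4 -->
(0x7f4ef72051b0:0x7f4ef7201980)4 --> 1(0x7f4ef7204d60)
"""

# Assumed line shape: (nodeAddr:structureAddr)number --> successors
NODE_LINE = re.compile(
    r"^\s*\((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\)(\d+)\s*-->(.*)$"
)
# Assumed successor shape: number(nodeAddr)
SUCCESSOR = re.compile(r"(\d+)\((0x[0-9a-fA-F]+)\)")


def parse_subgraph(text):
    """Return one dict per subgraph line; lines of any other shape are skipped."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match is None:
            continue
        node_addr, struct_addr, number, rest = match.groups()
        succs = [(int(n), addr) for n, addr in SUCCESSOR.findall(rest)]
        nodes.append(
            {
                "number": int(number),
                "node": node_addr,
                "structure": struct_addr,
                "succs": succs,
            }
        )
    return nodes


def find_suspicious(nodes):
    """Return (numbers listed more than once, nodes with no successors)."""
    by_number = defaultdict(list)
    for node in nodes:
        by_number[node["number"]].append(node)
    duplicates = sorted(n for n, group in by_number.items() if len(group) > 1)
    no_successors = [(n["number"], n["node"]) for n in nodes if not n["succs"]]
    return duplicates, no_successors


if __name__ == "__main__":
    dups, dead = find_suspicious(parse_subgraph(SAMPLE))
    print("numbers listed more than once:", dups)
    print("nodes without successors:", dead)
```

On the excerpt above, this flags 4 as listed twice and reports the region version of 4 (0x7f4ef7204e70) as having no successors. Node 1 is also reported as having no successors, so that second list needs a manual look rather than being treated as an error on its own.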
@jmesyou, you might want to try looking at how the structure changes as the various dead cases in the TR::table are eliminated. If I'm understanding correctly, I think we end up with something where the Improper Region that contains the table eventually has its entry node being followed only by an exit from the region - the one case that the table will execute. At some point after that in Tree Simplifier, the entry node for the region is merged with that exit node, and that results in the entry node no longer being part of the region.
I don't know for certain whether the fact that the entry is no longer part of the region is what ends up causing trouble later, or how that situation is ordinarily handled in updating structures, but it might be something to look at.
With more recent builds, I found I needed to run with the environment variable TR_EnableExpensiveOptsAtWarm=1 set.
It seems that running lastLoopVersioner sets the stage for the problem to be exposed, but since pull request #18682 was merged, that optimization will not usually be run at warm. Setting the TR_EnableExpensiveOptsAtWarm environment variable forces lastLoopVersioner to be run at the warm optLevel.
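Because the crash is intermittent, it can help to run the test many times with that variable set and count how often it fails. Below is a minimal sketch of such a loop; the helper name `count_failures` is hypothetical, and it simply treats any non-zero exit status or timeout as a failure, which is an assumption (it cannot tell a JIT crash apart from any other abnormal exit).

```python
import os
import subprocess


def count_failures(cmd, runs, extra_env=None, timeout=600):
    """Run cmd `runs` times and count runs that exit abnormally.

    extra_env is merged into a copy of the current environment, so the
    variable applies only to the child processes.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                env=env,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Assumption: a hang is counted as a failure as well.
            failures += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures
```

A call such as `count_failures(["java", "Test"], 50, {"TR_EnableExpensiveOptsAtWarm": "1"})` would then report how many of 50 runs failed, assuming the `java` on the PATH is the build under test and the compiled Test class from the attached test case is in the current directory.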
This is a crash, so I didn't put it into PR17404. I wanted to track it separately.
Java version
The same version as in PR17404.
Javac version
Code and summary of the problem
A JIT bug, not deterministic.
See tests and diagnostic files in issue17419.tar.gz.
Also, the test (Test.java) is a bit long and cannot be deterministically reduced. If multiple issues are found, let's open new issues. Thanks!
Sample segfaults
Segfaults in Simplifier:
Segfaults in Simplifier (different from the last one):
Segfaults in Liveness Analysis: