Open knn-k opened 4 years ago
I built x86-64 Linux runtimes to try this. Neither the compressed refs build nor the large heap build reproduced the assertion failure.
@knn-k how did you build this? This is a TR_ASSERT
which is not active during release builds. Are you building a debug build?
Yes, I am building debug builds locally.
@0xdaryl says he was able to reproduce the failure on aarch64 with a release build by changing the TR_ASSERT()
to TR_ASSERT_FATAL()
.
I am wondering why I see this IL failure only on aarch64 Linux but not on x86_64 Linux.
The assertion failure occurs during the loopVersioner optimization. It disappears by adding disableLoopVersioner
to the -Xjit:
option.
<optimization id=47 name=loopVersioner method=jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;>
Performing 47: loopVersioner
[ 417] O^O LOOP VERSIONER: Versioning natural loop 84
(Invalidating structure)
It occurs only with optlevel=warm
. It runs without the failure when I use optlevel=cold
, hot
, scorching
.
@knn-k can you reduce down to the last subindex:
-Xjit:{jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;}(traceFull,traceLoopVersioner,log=boot.trace,lastOptIndex=47,lastOptTransformationIndex=x)`
replacing x with a transformation number (start at something like 64) and binary searching all the way down to 0 to find which sub-index is causing the problem. Once we find the subindex, let's generate a passing log by using x - 1 and a failing log using x and upload it to this issue. We can then take a look to see if the transformation is valid.
Today I recreated the assertion failure on x86_64 Linux, and generated the attached trace files by running with the following command. The boundary of the subindex is between 422 and 423.
$ jdk/bin/java -Xjit:optlevel=warm,count=0,limit='{jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;}',traceFull,traceLoopVersioner,log=boot-trace-423.txt,lastOptIndex=47,lastOptTransformationIndex=423
I don't see any negative frequencies when we enter loop versioner, so we must have calculated something incorrectly during the optimization. @knn-k is the test which reproduces this issue self-contained? Is it something we can upload to the issue here? @andrewcraik can suggest an expert in the area to help investigate. We'll likely need to add more tracing into the code to figure out which block we're choking on and to find out how the negative frequency was calculated.
Can we just get a backtrace (preferably with file and line numbers) when the failure occurs? That will make it easy to see what is wrong in the code. I don't see obviously suspect math so knowing the source will help.
@fjeremic I just run java -Xjit:optlevel=warm,count=0,limit=boot
specifying no class to run as I wrote in the original post. I get the message starting with "Usage: java [options]
@andrewcraik Please find the backtrace below (line numbers unavailable). I modified TR_ASSERT()
in createEmptyBlock()
to TR_ASSERT_FATAL()
in this x86_64 build:
#3 0x00007fffef71a7a8 in TR::assertion(char const*, int, char const*, char const*, ...) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#4 0x00007fffef6e441f in OMR::Block::createEmptyBlock(TR::Node*, TR::Compilation*, int, TR::Block*) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#5 0x00007fffef8b86f4 in TR_LoopVersioner::versionNaturalLoop(TR_RegionStructure*, List<TR::Node>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::Node>*, List<TR_NodeParentSymRef>*, List<TR_NodeParentSymRefWeightTuple>*, List<TR_Structure>*, List<TR_Structure>*, bool, CS2::ASparseBitVector<CS2::shared_allocator<CS2::heap_allocator<65536ul, 12u, TRMemoryAllocator<(TR_AllocationKind)1, 12u, 28u> > > >&) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#6 0x00007fffef8bdec9 in TR_LoopVersioner::performWithoutDominators() ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#7 0x00007fffef8d7612 in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#8 0x00007fffef8d79ca in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#9 0x00007fffef8d90cb in OMR::Optimizer::optimize() ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#10 0x00007fffef6aeadd in OMR::Compilation::compile() ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#11 0x00007fffef3a9cf7 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#12 0x00007fffef3aabb0 in TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) ()
from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
Backtrace on aarch64 with debug information:
#4 TR::assertion (
file=file@entry=0x7fb6b53d78 "/root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/il/OMRBlock.cpp", line=line@entry=277, condition=condition@entry=0x7fb6b540e0 "(frequency >= 0)",
format=format@entry=0x7fb6b540b8 "Block frequency must be non negative\n")
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/infra/Assert.cpp:157
#5 0x0000007fb66e1b90 in OMR::Block::createEmptyBlock (n=n@entry=0x7f9339b220, comp=0x7f93000000, frequency=-1,
block=block@entry=0x7f934b3110)
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/il/OMRBlock.cpp:277
#6 0x0000007fb688a290 in TR_LoopVersioner::versionNaturalLoop (this=this@entry=0x7f933d0020,
whileLoop=whileLoop@entry=0x7f93429b50, nullCheckedReferences=0x7fb48e3cc8,
nullCheckedReferences@entry=0x7fb671c2a0 <TR::AllBlockIterator::stepForward()+240>, nullCheckTrees=0x7fb48e3cd8,
nullCheckTrees@entry=0x7fb6cdb000, boundCheckTrees=boundCheckTrees@entry=0x7fb48e3cf8,
spineCheckTrees=0x7fb48e3d08, spineCheckTrees@entry=0x7f93017220,
conditionalTrees=conditionalTrees@entry=0x7fb48e3d28, divCheckTrees=divCheckTrees@entry=0x7fb48e3d38,
awrtbariTrees=0x7fb48e3d68, awrtbariTrees@entry=0x7fb48e6250, checkCastTrees=0x7fb48e3d48,
checkCastTrees@entry=0x5a0000, arrayStoreCheckTrees=0x7fb48e3d58, arrayStoreCheckTrees@entry=0x7f93000000,
specializedNodes=specializedNodes@entry=0x7fb48e3d78, invariantNodes=invariantNodes@entry=0x7fb48e3d88,
invariantTranslationNodesList=0x7fb48e3d98, invariantTranslationNodesList@entry=0x7fb48e6278,
innerWhileLoops=0x7fb48e3cb8, innerWhileLoops@entry=0x0, clonedInnerWhileLoops=0x7fb48e3dd8,
clonedInnerWhileLoops@entry=0x7fb48e6308, skipVersioningAsynchk=false, skipVersioningAsynchk@entry=32,
reverseBranchInLoops=...)
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/LoopVersioner.cpp:3565
#7 0x0000007fb688ebd4 in TR_LoopVersioner::performWithoutDominators (this=0x7f933d0020)
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/LoopVersioner.cpp:654
#8 0x0000007fb68a6040 in OMR::Optimizer::performOptimization (this=this@entry=0x7f930d0a00,
optimization=optimization@entry=0x7fb6b80a18 <lastLoopVersionerOpts+16>, firstOptIndex=firstOptIndex@entry=0,
lastOptIndex=lastOptIndex@entry=2147483647, doTiming=doTiming@entry=0)
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/OMROptimizer.cpp:2061
#9 0x0000007fb68a6c2c in OMR::Optimizer::performOptimization (this=this@entry=0x7f930d0a00,
optimization=optimization@entry=0x7fb6b210e0 <cheapWarmStrategyOpts+112>, firstOptIndex=firstOptIndex@entry=0,
lastOptIndex=lastOptIndex@entry=2147483647, doTiming=doTiming@entry=0)
at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/OMROptimizer.cpp:1607
I still can recreate the assertion failure with x86_64 Linux and aarch64 today. What can I do with this?
I still can recreate this assertion failure with a debug build on AArch64. (I just haven't tried x86-64 today).
Brad @BradleyWood, may I ask you to look at this problem?
I reproduced this assertion failure on x86-64 Linux using sanity.functional last month. See #17616.
I see the following assertion failure in
OMR::Block::createEmptyBlock()
in OMR compiler/il/OMRBlock.cpp with my debug build, Java 11 for AArch64 Linux. It disappears when I useoptlevel=cold
.It looks like a platform-independent problem. It is a
TR_ASSERT()
and you can reproduce it with debug builds only.The Java method is related to the module system, so I don't think it is reproducible with Java 8.
My local Feb. 14 debug build fails while the Feb. 10 debug build is OK. I think something has changed last week.