eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

Assertion failure in OMR::Block::createEmptyBlock() #8594

Open knn-k opened 4 years ago

knn-k commented 4 years ago

I see the following assertion failure in OMR::Block::createEmptyBlock() in OMR compiler/il/OMRBlock.cpp with my debug build, Java 11 for AArch64 Linux. It disappears when I use optlevel=cold.

$ jdk/bin/java -Xjit:optlevel=warm,count=0,limit=boot
Assertion failed at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/il/OMRBlock.cpp:277: (frequency >= 0)
VMState: 0x00050fff
        Block frequency must be non negative
compiling jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer; at level: warm

It looks like a platform-independent problem. It is a TR_ASSERT() and you can reproduce it with debug builds only.

   if (!comp->isPeekingMethod() && (comp->getFlowGraph()->getMaxFrequency() >= 0))
      {
      TR_ASSERT((frequency >= 0), "Block frequency must be non negative\n");
      }

The Java method is related to the module system, so I don't think it is reproducible with Java 8.

My local Feb. 14 debug build fails while the Feb. 10 debug build is OK. I think something has changed last week.

knn-k commented 4 years ago

I built x86-64 Linux runtimes to try this. Neither the compressed refs build nor the large heap build reproduced the assertion failure.

fjeremic commented 4 years ago

@knn-k how did you build this? This is a TR_ASSERT which is not active during release builds. Are you building a debug build?

knn-k commented 4 years ago

Yes, I am building debug builds locally. @0xdaryl says he was able to reproduce the failure on aarch64 with a release build by changing the TR_ASSERT() to TR_ASSERT_FATAL().

I am wondering why I see this IL failure only on aarch64 Linux but not on x86_64 Linux.

knn-k commented 4 years ago

The assertion failure occurs during the loopVersioner optimization. It disappears by adding disableLoopVersioner to the -Xjit: option.

<optimization id=47 name=loopVersioner method=jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;>
Performing 47: loopVersioner
[   417] O^O LOOP VERSIONER: Versioning natural loop 84
              (Invalidating structure)

It occurs only with optlevel=warm. It runs without the failure when I use optlevel=cold, hot, scorching.

fjeremic commented 4 years ago

@knn-k can you reduce down to the last subindex:

-Xjit:{jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;}(traceFull,traceLoopVersioner,log=boot.trace,lastOptIndex=47,lastOptTransformationIndex=x)`

replacing x with a transformation number (start at something like 64) and binary searching all the way down to 0 to find which sub-index is causing the problem. Once we find the subindex, let's generate a passing log by using x - 1 and a failing log using x and upload it to this issue. We can then take a look to see if the transformation is valid.

knn-k commented 4 years ago

Today I recreated the assertion failure on x86_64 Linux, and generated the attached trace files by running with the following command. The boundary of the subindex is between 422 and 423.

$ jdk/bin/java -Xjit:optlevel=warm,count=0,limit='{jdk/internal/module/ModuleBootstrap.boot()Ljava/lang/ModuleLayer;}',traceFull,traceLoopVersioner,log=boot-trace-423.txt,lastOptIndex=47,lastOptTransformationIndex=423

issue8594.zip

fjeremic commented 4 years ago

I don't see any negative frequencies when we enter loop versioner, so we must have calculated something incorrectly during the optimization. @knn-k is the test which reproduces this issue self-contained? Is it something we can upload to the issue here? @andrewcraik can suggest an expert in the area to help investigate. We'll likely need to add more tracing into the code to figure out which block we're choking on and to find out how the negative frequency was calculated.

andrewcraik commented 4 years ago

Can we just get a backtrace (preferably with file and line numbers) when the failure occurs? That will make it easy to see what is wrong in the code. I don't see obviously suspect math so knowing the source will help.

knn-k commented 4 years ago

@fjeremic I just run java -Xjit:optlevel=warm,count=0,limit=boot specifying no class to run as I wrote in the original post. I get the message starting with "Usage: java [options] [args...]" when I change the optlevel to cold.

@andrewcraik Please find the backtrace below (line numbers unavailable). I modified TR_ASSERT() in createEmptyBlock() to TR_ASSERT_FATAL() in this x86_64 build:

#3  0x00007fffef71a7a8 in TR::assertion(char const*, int, char const*, char const*, ...) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#4  0x00007fffef6e441f in OMR::Block::createEmptyBlock(TR::Node*, TR::Compilation*, int, TR::Block*) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#5  0x00007fffef8b86f4 in TR_LoopVersioner::versionNaturalLoop(TR_RegionStructure*, List<TR::Node>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::TreeTop>*, List<TR::Node>*, List<TR_NodeParentSymRef>*, List<TR_NodeParentSymRefWeightTuple>*, List<TR_Structure>*, List<TR_Structure>*, bool, CS2::ASparseBitVector<CS2::shared_allocator<CS2::heap_allocator<65536ul, 12u, TRMemoryAllocator<(TR_AllocationKind)1, 12u, 28u> > > >&) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#6  0x00007fffef8bdec9 in TR_LoopVersioner::performWithoutDominators() ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#7  0x00007fffef8d7612 in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#8  0x00007fffef8d79ca in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#9  0x00007fffef8d90cb in OMR::Optimizer::optimize() ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#10 0x00007fffef6aeadd in OMR::Compilation::compile() ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#11 0x00007fffef3a9cf7 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
#12 0x00007fffef3aabb0 in TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) ()
   from /home/ilea/openj9/tmp/8594/jdk/lib/compressedrefs/libj9jit29.so
knn-k commented 4 years ago

Backtrace on aarch64 with debug information:

#4  TR::assertion (
    file=file@entry=0x7fb6b53d78 "/root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/il/OMRBlock.cpp", line=line@entry=277, condition=condition@entry=0x7fb6b540e0 "(frequency >= 0)",
    format=format@entry=0x7fb6b540b8 "Block frequency must be non negative\n")
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/infra/Assert.cpp:157
#5  0x0000007fb66e1b90 in OMR::Block::createEmptyBlock (n=n@entry=0x7f9339b220, comp=0x7f93000000, frequency=-1,
    block=block@entry=0x7f934b3110)
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/il/OMRBlock.cpp:277
#6  0x0000007fb688a290 in TR_LoopVersioner::versionNaturalLoop (this=this@entry=0x7f933d0020,
    whileLoop=whileLoop@entry=0x7f93429b50, nullCheckedReferences=0x7fb48e3cc8,
    nullCheckedReferences@entry=0x7fb671c2a0 <TR::AllBlockIterator::stepForward()+240>, nullCheckTrees=0x7fb48e3cd8,
    nullCheckTrees@entry=0x7fb6cdb000, boundCheckTrees=boundCheckTrees@entry=0x7fb48e3cf8,
    spineCheckTrees=0x7fb48e3d08, spineCheckTrees@entry=0x7f93017220,
    conditionalTrees=conditionalTrees@entry=0x7fb48e3d28, divCheckTrees=divCheckTrees@entry=0x7fb48e3d38,
    awrtbariTrees=0x7fb48e3d68, awrtbariTrees@entry=0x7fb48e6250, checkCastTrees=0x7fb48e3d48,
    checkCastTrees@entry=0x5a0000, arrayStoreCheckTrees=0x7fb48e3d58, arrayStoreCheckTrees@entry=0x7f93000000,
    specializedNodes=specializedNodes@entry=0x7fb48e3d78, invariantNodes=invariantNodes@entry=0x7fb48e3d88,
    invariantTranslationNodesList=0x7fb48e3d98, invariantTranslationNodesList@entry=0x7fb48e6278,
    innerWhileLoops=0x7fb48e3cb8, innerWhileLoops@entry=0x0, clonedInnerWhileLoops=0x7fb48e3dd8,
    clonedInnerWhileLoops@entry=0x7fb48e6308, skipVersioningAsynchk=false, skipVersioningAsynchk@entry=32,
    reverseBranchInLoops=...)
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/LoopVersioner.cpp:3565
#7  0x0000007fb688ebd4 in TR_LoopVersioner::performWithoutDominators (this=0x7f933d0020)
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/LoopVersioner.cpp:654
#8  0x0000007fb68a6040 in OMR::Optimizer::performOptimization (this=this@entry=0x7f930d0a00,
    optimization=optimization@entry=0x7fb6b80a18 <lastLoopVersionerOpts+16>, firstOptIndex=firstOptIndex@entry=0,
    lastOptIndex=lastOptIndex@entry=2147483647, doTiming=doTiming@entry=0)
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/OMROptimizer.cpp:2061
#9  0x0000007fb68a6c2c in OMR::Optimizer::performOptimization (this=this@entry=0x7f930d0a00,
    optimization=optimization@entry=0x7fb6b210e0 <cheapWarmStrategyOpts+112>, firstOptIndex=firstOptIndex@entry=0,
    lastOptIndex=lastOptIndex@entry=2147483647, doTiming=doTiming@entry=0)
    at /root/openj9-openjdk-jdk11/build/linux-aarch64-normal-server-release/vm/compiler/../omr/compiler/optimizer/OMROptimizer.cpp:1607
knn-k commented 4 years ago

I still can recreate the assertion failure with x86_64 Linux and aarch64 today. What can I do with this?

knn-k commented 4 years ago

I still can recreate this assertion failure with a debug build on AArch64. (I just haven't tried x86-64 today).

hzongaro commented 1 year ago

Brad @BradleyWood, may I ask you to look at this problem?

knn-k commented 1 year ago

I reproduced this assertion failure on x86-64 Linux using sanity.functional last month. See #17616.