eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Failed to startup the Garbage Collector. When using optthruput, with special options. #16829

Open zheng-kai opened 1 year ago

zheng-kai commented 1 year ago

Java -version output

openjdk version "1.8.0_362"
IBM Semeru Runtime Open Edition (build 1.8.0_362-b09)
Eclipse OpenJ9 VM (build openj9-0.36.0, JRE 1.8.0 Windows 11 amd64-64-Bit Compressed References 20230207_599 (JIT enabled, AOT enabled)
OpenJ9   - e68fb241f
OMR      - f491bbf6f
JCL      - eebde685ec based on jdk8u362-b09)

Summary of problem

I found a problem by accident. When I use the optthruput GC policy with the following options, the JVM cannot be created.

java  -Xgcpolicy:optthruput -Xgc:tlhIncrementSize=550000000 -Xgc:excessiveGCratio=80  -Xms512m -Xmx512m  -version

JVMJ9GC070E Failed to startup the Garbage Collector.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

But if I switch the GC policy to gencon, it works.

java  -Xgcpolicy:gencon -Xgc:tlhIncrementSize=550000000 -Xgc:excessiveGCratio=80  -Xms512m -Xmx512m  -version

openjdk version "1.8.0_362"
IBM Semeru Runtime Open Edition (build 1.8.0_362-b09)
Eclipse OpenJ9 VM (build openj9-0.36.0, JRE 1.8.0 Windows 11 amd64-64-Bit Compressed References 20230207_599 (JIT enabled, AOT enabled)
OpenJ9   - e68fb241f
OMR      - f491bbf6f
JCL      - eebde685ec based on jdk8u362-b09)

I would like to know why optthruput is the only policy that fails. Is this information sufficient to locate the problem? Thank you.

dmitripivkine commented 1 year ago

I am able to reproduce the problem on xLinux with the latest Java 8. Thank you for letting us know.

dmitripivkine commented 1 year ago

I understand the failure scenario. The only problem with the JVM itself is the missing validation of the value entered for the -Xgc:tlhIncrementSize option. This option is rarely used, mostly for performance experiments.

The failure scenario is as follows (with some simplifications). During GC initialization, one of the Java threads reaches a point where it requests a TLH (thread-local heap) increment. The entered TLH increment size is not validated; it is taken blindly. As a result, the requested TLH is much larger than the maximum TLH size and takes almost all of the memory in the heap, leaving almost none for other threads. And because this is optthruput, there is no other space in which to try the allocation. So the other threads struggle to allocate objects and call the Garbage Collector. The GC runs, but there is very little memory to free. This repeats again and again until the Excessive GC condition triggers an OOM instead of the next GC. This condition is reached before initialization of all GC worker threads is complete, so the "Failed to startup the Garbage Collector" message is printed.
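
The arithmetic behind this scenario can be checked with a toy model. The sketch below is not OpenJ9 code: the class and method names are hypothetical, and it assumes, as a simplification, that a thread is granted as much of its requested TLH as the free heap allows. With the options from the report, the requested increment (550000000 bytes) is larger than the whole 512 MB heap (536870912 bytes), so in this model nothing is left for other threads.

```java
// Toy model only; names are hypothetical and this is not how OpenJ9 is implemented.
final class TlhToyModel {
    /** Converts a size in mebibytes (as in -Xmx512m) to bytes. */
    static long mebibytes(long n) {
        return n * 1024L * 1024L;
    }

    /**
     * Bytes left for other threads after one thread is granted a TLH.
     * Assumption: the grant is the requested size, capped at the free heap.
     */
    static long bytesLeftForOthers(long freeHeapBytes, long requestedTlhBytes) {
        long granted = Math.min(requestedTlhBytes, freeHeapBytes);
        return freeHeapBytes - granted;
    }

    public static void main(String[] args) {
        long heap = mebibytes(512);      // -Xms512m -Xmx512m
        long increment = 550_000_000L;   // -Xgc:tlhIncrementSize=550000000
        System.out.println("heap bytes      = " + heap);
        System.out.println("TLH increment   = " + increment);
        System.out.println("left for others = " + bytesLeftForOthers(heap, increment));
    }
}
```

The real allocator is more involved (the heap is never completely empty at that point, and free memory is not one contiguous block), so this only illustrates why an unvalidated increment of this size starves every other thread.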

dmitripivkine commented 1 year ago

Nothing is wrong except the missing input validation for a rarely used option. I have put this in the Deep Backlog to be fixed in the future.
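
For readers wondering what such input validation might look like, here is a minimal sketch. It is an assumption for illustration, not the actual or planned OpenJ9 fix: the class, the method, and the choice to cap the value rather than reject it are all hypothetical. The idea is simply that a requested TLH increment should never exceed the maximum TLH size.

```java
// Hypothetical validation sketch; not the actual OpenJ9 option-parsing code.
final class TlhIncrementCheck {
    /**
     * Returns the increment to use. Non-positive values are rejected,
     * and anything above the maximum TLH size is capped to that maximum.
     */
    static long validate(long requestedIncrement, long tlhMaximumSize) {
        if (tlhMaximumSize <= 0) {
            throw new IllegalArgumentException("maximum TLH size must be positive");
        }
        if (requestedIncrement <= 0) {
            throw new IllegalArgumentException("TLH increment must be positive");
        }
        return Math.min(requestedIncrement, tlhMaximumSize);
    }
}
```

For example, `TlhIncrementCheck.validate(550000000L, 131072L)` returns 131072. The 131072 here is only an illustrative maximum, not a confirmed OpenJ9 default. Rejecting the option with an error message at startup would be an equally reasonable design.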

zheng-kai commented 1 year ago

Thank you for your explanation, which has given me more understanding of GC.