eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Javacore generation with kill -3 freezes JVM starting from Java 8.0.7.10 #15566

Open yathamravali opened 2 years ago

yathamravali commented 2 years ago

Java -version output

java version "1.8.0_331"
Java(TM) SE Runtime Environment (build 8.0.7.10 - pxa6480sr7fp10-20220505_01(SR7 FP10))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20220427_27745 (JIT enabled, AOT enabled)
OpenJ9   - b15041a
OMR      - 3671a9f
IBM      - 1b0232b)
JCL - 20220504_01 based on Oracle jdk8u331-b09

Summary of problem

Starting with 8.0.7.10, generating a javacore with kill -3 completely freezes the JVM for processes with a high number of threads.

The HMC Next product moved from 8.0.7.6 to 8.0.7.10 and this problem started. In the product test environment, every time a javacore is generated using kill -3, the Java process hangs completely, with no response or activity within the JVM.

The change that introduced the issue is: https://github.com/eclipse/omr/pull/6345
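To try this outside the product, a minimal sketch of a many-thread test process might look like the following. The class name `ManyThreadsRepro`, the helper `startBlockedThreads`, and the default of 2000 threads are all hypothetical; the report does not say how many threads the affected process had, so the count may need to be raised.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ManyThreadsRepro {

    // Starts `count` daemon threads that block on `release` until it is counted down.
    static List<Thread> startBlockedThreads(int count, final CountDownLatch release) {
        List<Thread> threads = new ArrayList<Thread>(count);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "repro-" + i);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default; the thread count that triggers the hang is not stated in the report.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 2000;
        CountDownLatch release = new CountDownLatch(1);
        startBlockedThreads(count, release);

        // The runtime name is conventionally "pid@hostname", but that format is not guaranteed.
        System.out.println("Runtime name: " + ManagementFactory.getRuntimeMXBean().getName());
        System.out.println("Started " + count + " threads; send kill -3 <pid> from another shell.");

        // Keep the process alive so the signal can be sent.
        Thread.sleep(Long.MAX_VALUE);
    }
}
```

Run it with the affected JRE, then send kill -3 to the printed process ID from a second shell and check whether the javacore is written and whether the process still responds afterwards. This only approximates the reported workload, since idle threads blocked on a latch may not reproduce the hang.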

Next steps

This issue is high priority. Because the hang occurs during javacore generation, which is basic functionality, we expect similar reports from other customers running Java 8.0.7.10 or later.

@tajila @pshipton Could you please consider providing a command-line option to enable or disable the resolution of function names?

From a service perspective, this new option is needed because:

  1. Resolving function names is, at the very least, time-consuming and can be problematic (as in this issue).
  2. Most customers do not care about the function names in the native stack trace.
  3. We only need the function names in native stack traces when resolving crashes and hangs in native code.
  4. We have the nativeDecoder to get the information from the generated javacore file or native stderr messages.
  5. We can use gdb/gcore to obtain the information if necessary.
  6. For many years, javacore files did not contain this information.

Thanks in advance for all your help on this issue!

manqingl commented 2 years ago

@keithc-ca @pshipton @tajila : please help

keithc-ca commented 2 years ago

I am investigating.

keithc-ca commented 2 years ago

Less work is done by default when producing java dumps as a result of these three pull requests:

There is still the potential for a hang, but the number of threads would have to be significantly higher. I intend to: