eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Segfaults in JITServer-compiled code #15146

Open AlexeyKhrabrov opened 2 years ago

AlexeyKhrabrov commented 2 years ago

Using JITServer with the -Xjit:disableDelayRelocationForAOTCompilations option occasionally (in fewer than 1 in 100 runs) leads to segfaults in JIT-compiled code.

The issue is reproducible with AcmeAir and (less frequently) DayTrader7. Affected methods (extracted from stack traces in javacore files) include java/nio/DirectByteBuffer.put([BII)Ljava/nio/ByteBuffer; and com/ibm/ws/bytebuffer/internal/WsByteBufferImpl.copyToDirectBuffer()V (which seems to inline the first one); there might be others. The segfault happens more frequently without AOT cache.

-Xjit:disableDelayRelocationForAOTCompilations is an undocumented option that is not commonly used and not covered by the tests. It is enabled by default with JITServer AOT cache because it results in better ramp-up performance.
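With a failure rate this low, one rough way to measure it is to repeat the run many times and count abnormal exits. The following is only a sketch of that idea, not the procedure used for this report: the class name `CrashRateProbe`, its methods, and the run count of 200 are all made up for illustration, and it assumes the benchmark is wrapped in a command that exits on its own with a non-zero code when the JVM crashes.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical helper, not part of OpenJ9 or of any benchmark mentioned here.
public class CrashRateProbe {

    // Invokes runOnce `runs` times and counts how many times it reports a non-zero exit code.
    public static int countAbnormalExits(int runs, IntSupplier runOnce) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            if (runOnce.getAsInt() != 0) {
                failures++;
            }
        }
        return failures;
    }

    // Starts the given command, waits for it to finish, and returns its exit code.
    // Returns -1 if the process could not be started or the wait was interrupted.
    public static int runCommand(List<String> command) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            return process.waitFor();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java CrashRateProbe <command> [args...]");
            return;
        }
        int runs = 200; // arbitrary; chosen only because the reported rate is below 1 in 100
        int failures = countAbnormalExits(runs, () -> runCommand(List.of(args)));
        System.out.printf("%d of %d runs exited abnormally%n", failures, runs);
    }
}
```

The counting logic is kept separate from process launching so that it can be checked without starting a JVM; the command line itself (client options, JITServer address, benchmark driver) is left entirely to the caller.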

AlexeyKhrabrov commented 2 years ago

One simple workaround is to always delay relocation of remote AOT methods until at least the next invocation of the method. Relocation is already handled this way for JITServer AOT cache methods that use SVM (for a different reason), and the delay has a negligible impact on performance. PR #15148 implements this workaround.
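To illustrate the idea of delaying relocation until the next invocation, here is a toy model in plain Java. It is a conceptual sketch only and does not reflect how OpenJ9 or PR #15148 implements this: `DelayedRelocationSketch`, `MethodEntry`, and every method name below are hypothetical. When a remote AOT body arrives, it is only recorded as pending; the (simulated) relocation runs on the invoking thread the next time the method is called.

```java
import java.util.function.Supplier;

// Toy model; none of these types correspond to real OpenJ9 data structures.
public class DelayedRelocationSketch {

    public static final class MethodEntry {
        private Runnable body;                 // code that currently runs on invocation
        private Supplier<Runnable> pendingAot; // AOT body received but not yet relocated
        public int relocations = 0;            // how many times relocation was performed

        public MethodEntry(Runnable initialBody) {
            this.body = initialBody;
        }

        // Called when a remote AOT body arrives: record it, but do not relocate yet.
        public synchronized void onRemoteAotBodyReceived(Supplier<Runnable> relocate) {
            pendingAot = relocate;
        }

        // Called on every invocation: relocate a pending body first, then run the current body.
        public void invoke() {
            Runnable toRun;
            synchronized (this) {
                if (pendingAot != null) {
                    body = pendingAot.get(); // simulated relocation happens here
                    pendingAot = null;
                    relocations++;
                }
                toRun = body;
            }
            toRun.run();
        }
    }

    public static void main(String[] args) {
        MethodEntry method = new MethodEntry(() -> System.out.println("interpreted"));
        method.invoke();
        method.onRemoteAotBodyReceived(() -> () -> System.out.println("compiled"));
        method.invoke(); // relocation is performed during this call
        method.invoke();
    }
}
```

The point of the model is only the ordering: receiving the body and making it executable are separate steps, and the second step is deferred to a later invocation instead of happening as soon as the body arrives.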

AlexeyKhrabrov commented 2 years ago

@mpirvu FYI

AlexeyKhrabrov commented 2 years ago

After more testing, it turns out that the issue is also reproducible without -Xjit:disableDelayRelocationForAOTCompilations. The segfault also happens with the Spring PetClinic benchmark, although even less frequently than with AcmeAir. For PetClinic, the affected methods (those at the top of the Java call stack of the crashing threads in the javacore) also include sun/nio/ch/Util.getTemporaryDirectBuffer and sun/nio/ch/IOUtil.readIntoNativeBuffer, which are also ByteBuffer-related.

I haven't been able to reproduce the issue with the 0.32.0 release, so it was probably introduced after that release.