JetBrains / JetBrainsRuntime

Runtime environment based on OpenJDK for running IntelliJ Platform-based products on Windows, macOS, and Linux
GNU General Public License v2.0

[JBR11 + dcevm] JVM crashes when an agent retransforms the start method of java.lang.Thread #257

Closed cvictory closed 9 months ago

cvictory commented 1 year ago

The crash log is as follows:


Stack: [0x00007f2b301ab000,0x00007f2b302ac000],  sp=0x00007f2b302a6f50,  free space=1007k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xe85846]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x1c6
V  [libjvm.so+0xe8677f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0x69a607]  report_vm_error(char const*, int, char const*, char const*, ...)+0xf7
V  [libjvm.so+0x2f2ffc]  MachSpillCopyNode::implementation(CodeBuffer*, PhaseRegAlloc*, bool, outputStream*) const [clone .constprop.82]+0x1dc
V  [libjvm.so+0x636dba]  Compile::scratch_emit_size(Node const*)+0x27a
V  [libjvm.so+0xbf0ff1]  Compile::shorten_branches(unsigned int*, int&, int&, int&)+0x231
V  [libjvm.so+0xbf1962]  Compile::init_buffer(unsigned int*)+0x1d2
V  [libjvm.so+0xbf8392]  Compile::Output()+0x382
V  [libjvm.so+0x640890]  Compile::Code_Gen()+0x4e0
V  [libjvm.so+0x644899]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1069
V  [libjvm.so+0x5697d4]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0xd4
V  [libjvm.so+0x64e6fe]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe8e
V  [libjvm.so+0x64f01e]  CompileBroker::compiler_thread_loop()+0x42e
V  [libjvm.so+0xe24340]  JavaThread::thread_main_inner()+0xd0
V  [libjvm.so+0xe20429]  Thread::call_run()+0x149
V  [libjvm.so+0xbe2206]  thread_native_entry(Thread*)+0xe6

mkartashev commented 1 year ago

The stacktrace alone doesn't ring a bell. Can you provide a reproducer?

Also, JBR11 is quite old. Is the problem reproducible with 17 or 21?

AbsoluteZero-CHN commented 1 year ago

@mkartashev I encountered a similar issue: the JVM crashes on startup when opentelemetry-javaagent.jar is attached and the -XX:+AllowEnhancedClassRedefinition option is enabled. Removing the option resolves the issue. Here's my crash log: replay_pid116783.log

I built my own JBR11 with --with-debug-level=slowdebug, and here's the crash log from that build: hs_err_pid174734.log

I just tested it, and the crash also occurs on JBR17 (version jbr_jcef-17.0.9-linux-x64-b1000.47): replay_pid175260.log

After debugging, I found that the issue is caused by an agent retransforming well-known classes (java.*, jdk.*, sun.*, etc.), which leads to a failure of the is_instance assert in java_lang_Class.

In my case (Java 8 with the dcevm patch), opentelemetry-javaagent.jar retransformed the java.lang.ClassLoader class, causing the class loader to lose track of all its loaded classes. As a result, the native method findLoadedClass0 always returns null (more details: https://github.com/dcevm/dcevm/issues/215). I ported the commit that fixes retransformation of well-known classes and then ran into this problem (see the commit for more details: https://github.com/HotswapProjects/openjdk-jdk8u-dcevm/commit/654001b437481cf2a81813945f7fed86aaa35a2c).

Hope this information is helpful. Please fix this issue as soon as possible. I'll try to port the fix to the dcevm8 project 😆.

Best regards!

mkartashev commented 1 year ago

@AbsoluteZero-CHN Thanks for the detailed explanation! I will pass this on to @skybber. FYI: the issue ID in YouTrack is JBR-6338; you can follow it there.

AbsoluteZero-CHN commented 1 year ago

I understand that this may be a foolish question, but I would still like to inquire about when this issue might be fixed. It is crucial for me as I have limited time left for my project 😣.

mkartashev commented 1 year ago

> I would still like to inquire about when this issue might be fixed.

That, of course, I cannot say at this point. What I can say with absolute certainty is that the fix will come sooner if there's a simple reproducer for the bug. Ideally something like a script that, when executed, crashes the JVM in exactly this fashion.
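A reproducer along those lines could start from a very small agent. The sketch below is an assumption, not something posted in this thread: the class name RetransformThreadAgent is hypothetical, and it has not been verified to trigger the crash. It registers a retransformation-capable transformer and asks the JVM to retransform java.lang.Thread with unchanged bytecode, whereas a real agent such as opentelemetry-javaagent.jar rewrites the bytecode. It only uses the standard java.lang.instrument API, and the agent JAR's manifest would need Premain-Class and Can-Retransform-Classes: true.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/** Hypothetical minimal agent: retransforms java.lang.Thread with unchanged bytecode. */
final class RetransformThreadAgent {

    // Internal (slash-separated) name of the class named in the issue title.
    static final String TARGET = "java/lang/Thread";

    /** Returns a copy of the original bytes for TARGET and null (no change) for everything else. */
    static final class IdentityTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (!TARGET.equals(className)) {
                return null; // null tells the JVM that this transformer made no change
            }
            return classfileBuffer.clone();
        }
    }

    public static void premain(String args, Instrumentation inst) throws Exception {
        // The second argument marks the transformer as retransformation-capable.
        inst.addTransformer(new IdentityTransformer(), true);
        if (inst.isRetransformClassesSupported() && inst.isModifiableClass(Thread.class)) {
            inst.retransformClasses(Thread.class);
        }
    }
}
```

Packaged as a JAR, such an agent would be passed with -javaagent together with -XX:+AllowEnhancedClassRedefinition, the option mentioned earlier in the thread. If the identity transformation does not crash the JVM, the next step would be a transformer that actually modifies the start method.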

AbsoluteZero-CHN commented 10 months ago

Although I acknowledge that this is not an ideal solution, I still want to share my approach here in the hope that it serves as a temporary workaround for those who are facing the same problem.

I ported the code with which opentelemetry-javaagent.jar enhances the rt.jar classes into the OpenJDK source code (hard-coded), and then built a dedicated JDK for use with opentelemetry-javaagent.jar. Next, in hotswap-agent.jar, I patched the code of opentelemetry-javaagent.jar (making sure that hotswap-agent.jar starts before opentelemetry-javaagent.jar) so that it ignores the classes in rt.jar. This way, opentelemetry-javaagent.jar no longer needs to enhance the classes in rt.jar, yet that instrumentation still takes effect because it is built into the JDK.
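The "ignore the classes in rt.jar" part can be illustrated with a much simpler technique than the one described above: wrapping a transformer so that it never sees JDK classes. This is only a sketch under assumptions. The class SkipJdkClassesTransformer is hypothetical, the prefix list approximates rt.jar using the packages named earlier in the thread (java.*, jdk.*, sun.*), and it is not the actual hotswap-agent patch.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

/** Hypothetical wrapper that hides well-known JDK classes from another transformer. */
final class SkipJdkClassesTransformer implements ClassFileTransformer {

    // Assumed prefixes in internal (slash-separated) form; an approximation of rt.jar.
    private static final String[] SKIPPED_PREFIXES = {"java/", "jdk/", "sun/"};

    private final ClassFileTransformer delegate;

    SkipJdkClassesTransformer(ClassFileTransformer delegate) {
        this.delegate = delegate;
    }

    /** True if the class belongs to one of the skipped packages. */
    static boolean isSkipped(String internalName) {
        if (internalName == null) {
            return false; // defensive: an unknown name is passed through to the delegate
        }
        for (String prefix : SKIPPED_PREFIXES) {
            if (internalName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        if (isSkipped(className)) {
            return null; // leave JDK classes untouched
        }
        return delegate.transform(loader, className, classBeingRedefined,
                                  protectionDomain, classfileBuffer);
    }
}
```

A wrapper like this only helps when you control how the transformer is registered; with a third-party agent that registers its own transformers, the agent itself has to be patched or configured, which is what the workaround above does.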

Currently, this approach is working well for me. 😆

skybber commented 10 months ago

There is a jbr branch with the patch available at:

https://github.com/JetBrains/JetBrainsRuntime/tree/vladimir.dvorak/JBR-6363

If you know how to build a JDK, it would be great if you could try it on your case, but don't feel obliged to.

skybber commented 9 months ago

The new JBR release https://github.com/JetBrains/JetBrainsRuntime/releases/tag/jbr-release-17.0.10b1186.1 supports redefinition of java.lang.Object, so this issue should be fixed now.