alibaba / arthas

Alibaba Java Diagnostic Tool Arthas
https://arthas.aliyun.com/
Apache License 2.0

Using the watch command on an Elasticsearch process left most threads blocked #2938

Open cfangpp opened 5 days ago

cfangpp commented 5 days ago

Environment

Steps to reproduce

  1. xxx
  2. xxx
  3. xxx

Expected result

Why?

Actual result

Actual results, ideally with detailed logs and exception stack traces. Paste text where possible.

"[search][T#4]" #421 daemon prio=5 os_prio=0 tid=0x00007fe2b0101000 nid=0x179d waiting for monitor entry [0x00007fe0bd89a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
        at java.security.ProtectionDomain$2$1.get(ProtectionDomain.java:473)
        at sun.security.provider.PolicyFile.implies(PolicyFile.java:1080)
        at sun.security.provider.PolicySpiFile.engineImplies(PolicySpiFile.java:75)
        at java.security.Policy$PolicyDelegate.implies(Policy.java:780)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java:102)
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
        at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)

Why is the thread that holds the lock itself BLOCKED?
"[search][T#20]" #458 daemon prio=5 os_prio=0 tid=0x00007fe30c10f000 nid=0x5167 waiting for monitor entry [0x00007fe263cfc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
        at java.security.ProtectionDomain$2$1.get(ProtectionDomain.java:473)
        at sun.security.provider.PolicyFile.implies(PolicyFile.java:1080)
        at sun.security.provider.PolicySpiFile.engineImplies(PolicySpiFile.java:75)
        at java.security.Policy$PolicyDelegate.implies(Policy.java:780)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java:102)
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atExit(SpyImpl.java:53)
        at java.arthas.SpyAPI.atExit(SpyAPI.java:64)

    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - locked <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
    - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
cfangpp commented 4 days ago

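The target process entered an infinite loop. The sequence is as follows:

  1. Arthas first intercepts the implies method of the java.security.Policy implementation in the target process. The target process here is ES, and the implementing class is org.elasticsearch.bootstrap.ESPolicy.
  2. Execution enters com.taobao.arthas.core.advisor.SpyImpl and reaches clazz.getClassLoader(), which performs a java.lang.RuntimePermission "getClassLoader" permission check.
  3. That check calls java.security.Policy.implies in the target process, which re-enters the Arthas SpyImpl.
  4. The process ends up in an infinite loop.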
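
The cycle described in the steps above can be reproduced in miniature without Arthas or Elasticsearch. The sketch below is only a simulation: ReentrancyDemo, implies, atEnter and checkPermission are hypothetical stand-ins for ESPolicy.implies, SpyImpl.atEnter and the JDK permission check. The per-thread guard is one common way to break this kind of re-entrancy; it is shown as an assumption, not as what Arthas actually does.

```java
public class ReentrancyDemo {
    // Hypothetical per-thread flag marking "already inside the hook"; not part of Arthas.
    private static final ThreadLocal<Boolean> IN_HOOK = ThreadLocal.withInitial(() -> false);
    private static boolean guardEnabled = false;
    private static int hookCalls = 0;

    // Stand-in for the instrumented ESPolicy.implies: the injected hook runs first.
    static boolean implies() {
        atEnter();
        return true;
    }

    // Stand-in for SpyImpl.atEnter: its work triggers a permission check.
    static void atEnter() {
        if (guardEnabled) {
            if (IN_HOOK.get()) {
                return; // re-entered on the same thread: skip the hook body
            }
            IN_HOOK.set(true);
        }
        try {
            hookCalls++;
            checkPermission();
        } finally {
            if (guardEnabled) {
                IN_HOOK.set(false);
            }
        }
    }

    // Stand-in for the permission check, which consults the policy again.
    static void checkPermission() {
        implies();
    }

    // Runs the cycle without the guard and reports whether the stack overflowed.
    public static boolean overflowsWithoutGuard() {
        guardEnabled = false;
        try {
            implies();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Runs the cycle with the guard and reports how often the hook body ran.
    public static int hookCallsWithGuard() {
        guardEnabled = true;
        hookCalls = 0;
        implies();
        return hookCalls;
    }

    public static void main(String[] args) {
        System.out.println("overflow without guard: " + overflowsWithoutGuard());
        System.out.println("hook calls with guard: " + hookCallsWithGuard());
    }
}
```

Without the guard, every call to implies triggers another one until the stack is exhausted, which matches the shape of the StackOverflowError trace in this comment. With the guard, the nested call returns immediately and the outer call completes.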

java.lang.StackOverflowError: null
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
        at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java)
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
        at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java)

cfangpp commented 4 days ago

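Workaround: customize the checkPermission method of SecureSM so that it skips the getClassLoader check when the call originates from Arthas.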

@Override
public void checkPermission(Permission perm) {
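    // Arthas only: if the "getClassLoader" check was triggered from inside
    // java.arthas.SpyAPI, return early so the policy is not consulted again.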
    if (perm instanceof RuntimePermission && "getClassLoader".equals(perm.getName())) {
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            if ("java.arthas.SpyAPI".equals(element.getClassName())) {
                return;
            }
        }
    }
    super.checkPermission(perm);
}
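
The stack inspection used by this workaround can be tried on its own. The following is a minimal sketch, assuming nothing beyond the JDK: CallerCheckDemo, calledFrom and FakeSpy are hypothetical names, and FakeSpy merely plays the role that java.arthas.SpyAPI plays in the override above.

```java
public class CallerCheckDemo {
    // Returns true if any frame on the current thread's stack belongs to className.
    public static boolean calledFrom(String className) {
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            if (className.equals(element.getClassName())) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stand-in for java.arthas.SpyAPI.
    public static class FakeSpy {
        // Performs the check while a FakeSpy frame is on the stack.
        public static boolean probe() {
            return calledFrom(FakeSpy.class.getName());
        }
    }

    public static void main(String[] args) {
        String spyName = FakeSpy.class.getName();
        System.out.println("direct call sees spy frame: " + calledFrom(spyName));
        System.out.println("call via spy sees spy frame: " + FakeSpy.probe());
    }
}
```

Thread.getStackTrace() captures the whole stack on every call, so a check like this is probably best kept behind a cheap filter, as the workaround does by testing for RuntimePermission "getClassLoader" first.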