alibaba / transmittable-thread-local

📌 a missing Java std lib(simple & 0-dependency) for framework/middleware, provide an enhanced InheritableThreadLocal that transmits values between threads even using thread pooling components.
https://github.com/alibaba/transmittable-thread-local
Apache License 2.0
7.59k stars 1.69k forks source link

TTL agent 与 其他agent的兼容性问题 #226

Open liauraljl opened 3 years ago

liauraljl commented 3 years ago

最近发现TTL agent与其他agent共用时,TTL agent必须放在最前面; 否则会导致jdk线程池修饰失败,TTL失去作用。

比如与Skywalking AgentPromethues等共用(未修改agent jar文件名)。

请问作者有遇到类似问题吗?

oldratlee commented 3 years ago

确实有其它的同学也反映过这样的问题。 👍 @liauraljl


你方便给一下极简的Demo @liauraljl 🙏 ♥️ :

以运行复现问题,方便我跟进排查,谢谢 😄


目前我觉得,可能是TTL Java Agent的设置问题,需要与不同Agent共用&生效:

https://github.com/alibaba/transmittable-thread-local/blob/b1aa750163c91a18607ed623848a546af1628a07/pom.xml#L268-L272

@liauraljl 欢迎一起排查解决 😄 ♥️

PS: Java Agent的资料

liauraljl commented 3 years ago
liauraljl commented 3 years ago

基本定位到问题了,TTL agent失效是:

liauraljl commented 3 years ago

我们目前的解决方案是

TTL agent放在最前位置,可以保证TTL agent 正常工作。 已在开发、测试环境得到验证。

不过还是希望有更好的解决方案, 毕竟人为控制 Agent加载顺序 的话后续长期迭代难免会出错。

TuitaKing commented 3 years ago

我觉得这个问题可能有两方面的原因:

于是我尝试

  1. SkywalkingAgent 中的日志全部切换为 System.out.println()
  2. 并且去掉org.apache.skywalking.apm.agent.core.boot.BootService 文件里的类。

做完这两步后,

PS: 我尝试的更多运行时查看的加载类信息:

我觉得算是可以佐证我上面的说法。


Java启动参数:

java -javaagent:.../ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar  \
    -javaagent:.../ttl-agent-demo-main/src/main/agentDic/ttl/transmittable-thread-local-2.11.5.jar=ttl.agent.logger:STDOUT \
    -jar  ttl-agent-demo-1.0.0-SNAPSHOT.jar

运行输出日志:

2021-01-11 16:52:03.855 INFO [main] TtlAgent: [TtlAgent.premain] begin, agentArgs: ttl.agent.logger:STDOUT, Instrumentation: sun.instrument.InstrumentationImpl@47af7f3d
2021-01-11 16:52:03.873 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlExecutorTransformlet success
2021-01-11 16:52:03.874 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlForkJoinTransformlet success
2021-01-11 16:52:03.874 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlTimerTaskTransformlet success
2021-01-11 16:52:03.875 INFO [main] TtlAgent: [TtlAgent.premain] addTransformer class com.alibaba.ttl.threadpool.agent.TtlTransformer success
2021-01-11 16:52:03.875 INFO [main] TtlAgent: [TtlAgent.premain] end
  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.3.RELEASE)
2021-01-11 16:52:05.231  INFO 27164 --- [           main] c.l.d.t.TtlAgentDemoApplication          : Starting TtlAgentDemoApplication v1.0.0-SNAPSHOT on tuitaking.local with PID 27164 (/Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar started by tuitaking in /Users/tuitaking/source/ttl-agent-demo-main/target)
2021-01-11 16:52:05.235  INFO 27164 --- [           main] c.l.d.t.TtlAgentDemoApplication          : No active profile set, falling back to default profiles: default
2021-01-11 16:52:06.750  INFO 27164 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2021-01-11 16:52:06.752 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture schedule(Runnable, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.769 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture schedule(Callable, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlCallable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.771 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleAtFixedRate(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.773 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleWithFixedDelay(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
......
oldratlee commented 3 years ago

按 @liauraljl 提供的 『demo:https://github.com/liauraljl/ttl-agent-demo 』 走了一遍,确实如你的说的:👍

TTL agent放在最前位置(如位于Skywalking Agent之前),TTL Agent的生效的。 否则Skywalking AgentTTL agent之前,线程池未被增强,导致TTL Agent没有生效。

感谢 @liauraljl 的完整Demo、复现问题与排查~~ ♥️


原因 如 上面 @TuitaKing 的说明,很完整 👍 :

因为Skywalking Agent的入口逻辑(premain)包含 线程池 的启动,导致

感谢 @TuitaKing 的详细的分析与排查~~ ♥️


Quote from @liauraljl

我们目前的解决方案是

TTL agent放在最前位置,可以保证TTL agent 正常工作。 已在开发、测试环境得到验证。

COOL!

oldratlee commented 3 years ago

Quote from @liauraljl

不过还是希望有更好的解决方案, 毕竟人为控制 Agent加载顺序 的话后续长期迭代难免会出错。

我试着解决 Agent加载顺序 的影响:

使用这个TTL分支的TTL Jar,运行 @liauraljl 的 ttl-agent-demo

除了,上面执行retransform碰到的异常,通过运行还可以看到:

  1. 需要retransform大量已经加载的类,这个过程会比较费时;即会带来性能的影响。
  2. retransform已加载的类』的引入,让Agent的操作流程及其实现会复杂不少;这些复杂性会有线上实现稳定性的影响。

目前个人的判断

尤其是考虑到线上实现的稳定性,目前个人的判断是

欢迎 @liauraljl 、 @TuitaKing 还有大家,进行更多实现或方案选择的讨论~ ♥️


Java启动参数:

java -javaagent:src/main/agentdic/skywalking/skywalking-agent.jar \
     -javaagent:path/to/transmittable-thread-local-2.13.0-SNAPSHOT.jar=ttl.agent.logger:stdout \
     -jar target/ttl-agent-demo-1.0.0-SNAPSHOT.jar

运行输出日志:

......
2021-01-18 00:26:13.408 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue
2021-01-18 00:26:13.422 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ThreadPoolExecutor$AbortPolicy
2021-01-18 00:26:13.434 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ScheduledThreadPoolExecutor
2021-01-18 00:26:13.438 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleAtFixedRate(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor:
    $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
    com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
......
2021-01-18 00:26:13.460 INFO [main] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ScheduledThreadPoolExecutor:
    $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
    com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-18 00:26:13.462 SEVERE [main] TtlAgent: Fail to load TtlAgent , cause: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
        at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.lang.IllegalStateException: Fail to load TtlAgent , cause: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
        at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
        ... 6 more
Caused by: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
        at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
        at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
        ... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
[1]    68385 abort      java -javaagent:src/main/agentdic/skywalking/skywalking-agent.jar  -jar
oldratlee commented 3 years ago

@liauraljl @TuitaKing README文档中 增加一个FAQ:

1. TTL Agent与其它Agent(如SkywalkingPromethues)配合使用时不生效?

https://github.com/alibaba/transmittable-thread-local#-faq

TuitaKing commented 3 years ago

@oldratlee 这个错(class redefinition failed: attempted to change superclass or interfaces) 是因为RetransformClasses方法的限制:不能增加、删除、和修改方法签名,或者修饰符,还有继承关系。

debugopenjdk排查了下,在~source/openjdk-jdk8u/hotspot/src/share/vm/prims/jvmtiRedefineClasses.cpp中会有一个compare_and_normalize_class_versions 方法中的n_intfs != k_new_interfaces->length() 判断抛出该异常,即修改前后的接口数量不一致,违反了上面不能修改继承关系的规则。

具体原因应该是TtlAgent中让每个被包装的线程池都实现了TtlAgentEnhanced

这个是用来避免父类再被包装?我去掉这个实现该接口的逻辑,发现是可以做到的。 但是仅仅只是用来测试和学习。后面是arthas反编译后的部分代码,还有启动的日志。


java
-javaagent:/Users/tuitaking/source/ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar
-javaagent:/Users/tuitaking/source/transmittable-thread-local/target/transmittable-thread-local-2.13.0-SNAPSHOT.jar=ttl.agent.logger:STDOUT
-jar /Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar

反编译:

public class ScheduledThreadPoolExecutor extends ThreadPoolExecutor implements ScheduledExecutorService {
    private volatile boolean continueExistingPeriodicTasksAfterShutdown;
    private volatile boolean executeExistingDelayedTasksAfterShutdown = true;
    private volatile boolean removeOnCancel = false;
    private static final AtomicLong sequencer = new AtomicLong();

    public ScheduledThreadPoolExecutor(int corePoolSize) {
        super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue());
    }

    public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
        super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), threadFactory);
    }

    public ScheduledThreadPoolExecutor(int corePoolSize, RejectedExecutionHandler handler) {
        super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), handler);
    }

    public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory, RejectedExecutionHandler handler) {
        super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), threadFactory, handler);
    }

    public void shutdown() {
        super.shutdown();
    }

    public void execute(Runnable command) {
        Runnable command = TtlRunnable.get(command, false, true);
        Utils.setAutoWrapperAttachment(command);
        this.schedule((Runnable)command, 0L, TimeUnit.NANOSECONDS);
    }
.......

日志


DEBUG 2021-02-02 18:14:15:209 main AgentPackagePath : The beacon class location is jar:file:/Users/tuitaking/source/ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar!/org/apache/skywalking/apm/agent/core/boot/AgentPackagePath.class. 
2021-02-02 18:14:16.232 INFO [main] TtlAgent: [TtlAgent.premain] begin, agentArgs: ttl.agent.logger:STDOUT, Instrumentation: sun.instrument.InstrumentationImpl@776aec5c
2021-02-02 18:14:16.290 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlExecutorTransformlet success
2021-02-02 18:14:16.291 INFO [main] TtlAgent: [TtlAgent.premain] addTransformer class com.alibaba.ttl.threadpool.agent.TtlTransformer success
2021-02-02 18:14:16.292 INFO [main] TtlAgent: [TtlAgent.premain] end
2021-02-02 18:14:16.324 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ThreadPoolExecutor
2021-02-02 18:14:16.361 INFO [main] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ThreadPoolExecutor:
    $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
    com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.3.RELEASE)

2021-02-02 18:14:21.281  INFO 70498 --- [           main] c.l.d.t.TtlAgentDemoApplication          : Starting TtlAgentDemoApplication v1.0.0-SNAPSHOT on tuitaking.local with PID 70498 (/Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar started by tuitaking in /Users/tuitaking/source/openjdk-jdk8u/build/macosx-x86_64-normal-server-slowdebug/jdk/bin)
2021-02-02 18:14:21.309  INFO 70498 --- [           main] c.l.d.t.TtlAgentDemoApplication          : No active profile set, falling back to default profiles: default
2021-02-02 18:14:32.302  INFO 70498 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2021-02-02 18:14:32.527  INFO 70498 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2021-02-02 18:14:32.528  INFO 70498 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.16]
2021-02-02 18:14:32.616  INFO 70498 --- [           main] o.a.catalina.core.AprLifecycleListener   : The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: [/Users/tuitaking/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.]
2021-02-02 18:14:33.309  INFO 70498 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2021-02-02 18:14:33.310  INFO 70498 --- [           main] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 11551 ms
2021-02-02 18:14:35.244  INFO 70498 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2021-02-02 18:14:37.218  INFO 70498 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2021-02-02 18:14:37.250  INFO 70498 --- [           main] c.l.d.t.TtlAgentDemoApplication          : Started TtlAgentDemoApplication in 18.986 seconds (JVM running for 23.442)
2021-02-02 18:15:02.917 INFO [arthas-command-execute] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ScheduledThreadPoolExecutor:
    $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
    com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
oldratlee commented 3 years ago

这个错(class redefinition failed: attempted to change superclass or interfaces) 是因为RetransformClasses方法的限制:不能增加、删除、和修改方法签名,或者修饰符,还有继承关系。

具体原因应该是TtlAgent中让每个被包装的线程池都实现了TtlAgentEnhanced。 我去掉这个实现该接口的逻辑,发现是可以做到的。

COOL! 👍 ♥️ @TuitaKing

这个是用来避免父类再被包装?

用来标识,对于ThreadPoolExecutor,目前并不是一定需要。

但对于ForkJoinPool 会有加类字段 操作。 修改类在ForkJoinTtlTransformlet

oldratlee commented 3 years ago

@TuitaKing 可以看到你非常了解Java Agent 与 TTL的整个过程与实现。 👍 🚀

你有没有兴趣 方便来实验 与 实现一下:TTL在类加载后的增强功能? @TuitaKing 😁 ❤️ (ForkJoinPool 的实现方案可能有些复杂需要挖)

PS:最近也在加 TTL的扩展点功能(Extension Transformlet),刚发了v2.13.0-Beta1; 最新的代码还没有完全稳定,也带来了一些实现过程的额外麻烦。🙏

TuitaKing commented 3 years ago

我可以尝试下😁。不过我还处于学习状态,可能花的时间比较久,尽力而为🙏。

TuitaKing commented 3 years ago

我尝试了几次修改已经加载的class。

但是RetransformClasses的限制又点多,会去掉TTL比如都实现相同的接口这类设计。 所以我觉得暂时只能使用规范(TTL在前)来避开上面的问题。

后续我会使用bytebuddy进行尝试,但是和现在的设计区别较大。

望知晓。有点抱歉

oldratlee commented 3 years ago

我尝试了几次修改已经加载的class。

但是RetransformClasses的限制又点多,会去掉TTL比如都实现相同的接口这类设计。 所以我觉得暂时只能使用规范(TTL在前)来避开上面的问题。

后续我会使用bytebuddy进行尝试,但是和现在的设计区别较大。

望知晓。有点抱歉

收到。辛苦了~ ♥️ @TuitaKing

期待ing~