Open liauraljl opened 3 years ago
确实有其它的同学也反映过这样的问题。 👍 @liauraljl
你方便给一下极简的Demo @liauraljl 🙏 ♥️ :
SkywalkingAgent
、Promethues
等的Agent 共用 以运行复现问题,方便我跟进排查,谢谢 😄
目前我觉得,可能是TTL Java Agent
的设置问题,需要与不同Agent
共用&生效:
@liauraljl 欢迎一起排查解决 😄 ♥️
PS: Java Agent
的资料
基本定位到问题了,TTL agent
失效是:
TTL agent
之前使用其他agent
,导致ThreadPoolExecutor
等类在TtlAgent premain
执行之前被加载
把TTL agent
放在最前位置,可以保证TTL agent
正常工作。
已在开发、测试环境得到验证。
不过还是希望有更好的解决方案,
毕竟人为控制 Agent
加载顺序 的话后续长期迭代难免会出错。
我觉得这个问题可能有两方面的原因:
SkywalkingAgent
中有ILog
这个类,他的初始化是在TTl agent
之前;
如果使用了线程池来打印日志,可能会出现还没有执行TTL
中的premain
的时候Runnable
就已经被加载进去了。
导致不成功。SkywalkingAgent
中org.apache.skywalking.apm.agent.core.jvm
这个包下面包含了很多线程池,
这些线程池是initialDelay=0
的,他的初始化也是优先于TtlAgent
的。SkywalkingAgent
中的日志全部切换为 System.out.println()
org.apache.skywalking.apm.agent.core.boot.BootService
文件里的类。做完这两步后,
arthas
查看加载到jvm
的类的时候能够看到TtlRunnable
被顺利加载进去了(开始是看不到TtlRunnable
这个类的);PS: 我尝试的更多运行时查看的加载类信息:
TtlAgent
的premain
方法下面输出所有的已经加载的类(Class[] classes = inst.getAllLoadedClasses();
)TtlAgent
放在前面,classes
中是不包含线程池或者skywalking
的类的TtlAgent
放在后面,则会输出线程池和一些skywalking
的类。我觉得算是可以佐证我上面的说法。
Java启动参数:
java -javaagent:.../ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar \
-javaagent:.../ttl-agent-demo-main/src/main/agentDic/ttl/transmittable-thread-local-2.11.5.jar=ttl.agent.logger:STDOUT \
-jar ttl-agent-demo-1.0.0-SNAPSHOT.jar
运行输出日志:
2021-01-11 16:52:03.855 INFO [main] TtlAgent: [TtlAgent.premain] begin, agentArgs: ttl.agent.logger:STDOUT, Instrumentation: sun.instrument.InstrumentationImpl@47af7f3d
2021-01-11 16:52:03.873 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlExecutorTransformlet success
2021-01-11 16:52:03.874 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlForkJoinTransformlet success
2021-01-11 16:52:03.874 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlTimerTaskTransformlet success
2021-01-11 16:52:03.875 INFO [main] TtlAgent: [TtlAgent.premain] addTransformer class com.alibaba.ttl.threadpool.agent.TtlTransformer success
2021-01-11 16:52:03.875 INFO [main] TtlAgent: [TtlAgent.premain] end
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.1.3.RELEASE)
2021-01-11 16:52:05.231 INFO 27164 --- [ main] c.l.d.t.TtlAgentDemoApplication : Starting TtlAgentDemoApplication v1.0.0-SNAPSHOT on tuitaking.local with PID 27164 (/Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar started by tuitaking in /Users/tuitaking/source/ttl-agent-demo-main/target)
2021-01-11 16:52:05.235 INFO 27164 --- [ main] c.l.d.t.TtlAgentDemoApplication : No active profile set, falling back to default profiles: default
2021-01-11 16:52:06.750 INFO 27164 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2021-01-11 16:52:06.752 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture schedule(Runnable, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.769 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture schedule(Callable, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlCallable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.771 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleAtFixedRate(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-11 16:52:06.773 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleWithFixedDelay(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor: $1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
......
按 @liauraljl 提供的 『demo:https://github.com/liauraljl/ttl-agent-demo 』 走了一遍,确实如你的说的:👍
把TTL agent
放在最前位置(如位于Skywalking Agent
之前),TTL Agent
的生效的。
否则Skywalking Agent
在TTL agent
之前,线程池未被增强,导致TTL Agent
没有生效。
感谢 @liauraljl 的完整Demo、复现问题与排查~~ ♥️
原因 如 上面 @TuitaKing 的说明,很完整 👍 :
因为Skywalking Agent
的入口逻辑(premain
)包含 线程池 的启动,导致
TTL Agent
(的premain
)时,TTL
需要加强的线程池类已经加载(load
)了。TTL Agent
的TtlTransformer
是在类加载时触发类的增强;如果类已经加载了会跳过TTL Agent
的增强逻辑。感谢 @TuitaKing 的详细的分析与排查~~ ♥️
Quote from @liauraljl
我们目前的解决方案是
把
TTL agent
放在最前位置,可以保证TTL agent
正常工作。 已在开发、测试环境得到验证。
COOL!
Quote from @liauraljl
不过还是希望有更好的解决方案, 毕竟人为控制
Agent
加载顺序 的话后续长期迭代难免会出错。
我试着解决 Agent
加载顺序 的影响:
expt/fix-agent-order-problem
TtlAgent
时,retransform
已加载的类:https://github.com/alibaba/transmittable-thread-local/commit/4886b7e9fb1a97f6711718f85ce16d5951ccc15e#diff-7baf52d90b056f08a33f25f27d6df84ecd50797196db42bc39b984f57ccceb2eR147-R156使用这个TTL
分支的TTL Jar
,运行 @liauraljl 的 ttl-agent-demo
:
java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
retransform
已经加载的类ScheduledThreadPoolExecutor
除了,上面执行retransform
碰到的异常,通过运行还可以看到:
retransform
大量已经加载的类,这个过程会比较费时;即会带来性能的影响。retransform
已加载的类』的引入,让Agent
的操作流程及其实现会复杂不少;这些复杂性会有线上实现稳定性的影响。尤其是考虑到线上实现的稳定性,目前个人的判断是
我们目前的解决方案是
把
TTL agent
放在最前位置,可以保证TTL agent
正常工作。 已在开发、测试环境得到验证。
retransform
已加载的类』的实现方式Java启动参数:
java -javaagent:src/main/agentdic/skywalking/skywalking-agent.jar \
-javaagent:path/to/transmittable-thread-local-2.13.0-SNAPSHOT.jar=ttl.agent.logger:stdout \
-jar target/ttl-agent-demo-1.0.0-SNAPSHOT.jar
运行输出日志:
......
2021-01-18 00:26:13.408 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue
2021-01-18 00:26:13.422 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ThreadPoolExecutor$AbortPolicy
2021-01-18 00:26:13.434 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ScheduledThreadPoolExecutor
2021-01-18 00:26:13.438 INFO [main] TtlExecutorTransformlet: insert code before method public ScheduledFuture scheduleAtFixedRate(Runnable, long, long, TimeUnit) of class java.util.concurrent.ScheduledThreadPoolExecutor:
$1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
......
2021-01-18 00:26:13.460 INFO [main] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ScheduledThreadPoolExecutor:
$1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
2021-01-18 00:26:13.462 SEVERE [main] TtlAgent: Fail to load TtlAgent , cause: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.lang.IllegalStateException: Fail to load TtlAgent , cause: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
... 6 more
Caused by: java.lang.UnsupportedOperationException: class redefinition failed: attempted to change superclass or interfaces
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
at com.alibaba.ttl.threadpool.agent.TtlAgent.premain(Unknown Source)
... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
[1] 68385 abort java -javaagent:src/main/agentdic/skywalking/skywalking-agent.jar -jar
@liauraljl @TuitaKing README文档中 增加一个FAQ:
1.
TTL Agent
与其它Agent
(如Skywalking
、Promethues
)配合使用时不生效?
@oldratlee 这个错(class redefinition failed: attempted to change superclass or interfaces
)
是因为RetransformClasses
方法的限制:不能增加、删除、和修改方法签名,或者修饰符,还有继承关系。
我debug
了openjdk
排查了下,在~source/openjdk-jdk8u/hotspot/src/share/vm/prims/jvmtiRedefineClasses.cpp
中会有一个compare_and_normalize_class_versions
方法中的n_intfs != k_new_interfaces->length()
判断抛出该异常,即修改前后的接口数量不一致,违反了上面不能修改继承关系的规则。
具体原因应该是TtlAgent
中让每个被包装的线程池都实现了TtlAgentEnhanced
。
这个是用来避免父类再被包装?我去掉这个实现该接口的逻辑,发现是可以做到的。
但是仅仅只是用来测试和学习。后面是arthas
反编译后的部分代码,还有启动的日志。
java
-javaagent:/Users/tuitaking/source/ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar
-javaagent:/Users/tuitaking/source/transmittable-thread-local/target/transmittable-thread-local-2.13.0-SNAPSHOT.jar=ttl.agent.logger:STDOUT
-jar /Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar
反编译:
public class ScheduledThreadPoolExecutor extends ThreadPoolExecutor implements ScheduledExecutorService {
private volatile boolean continueExistingPeriodicTasksAfterShutdown;
private volatile boolean executeExistingDelayedTasksAfterShutdown = true;
private volatile boolean removeOnCancel = false;
private static final AtomicLong sequencer = new AtomicLong();
public ScheduledThreadPoolExecutor(int corePoolSize) {
super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue());
}
public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), threadFactory);
}
public ScheduledThreadPoolExecutor(int corePoolSize, RejectedExecutionHandler handler) {
super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), handler);
}
public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory, RejectedExecutionHandler handler) {
super(corePoolSize, 2147483647, 0L, TimeUnit.NANOSECONDS, new java.util.concurrent.ScheduledThreadPoolExecutor.DelayedWorkQueue(), threadFactory, handler);
}
public void shutdown() {
super.shutdown();
}
public void execute(Runnable command) {
Runnable command = TtlRunnable.get(command, false, true);
Utils.setAutoWrapperAttachment(command);
this.schedule((Runnable)command, 0L, TimeUnit.NANOSECONDS);
}
.......
日志
DEBUG 2021-02-02 18:14:15:209 main AgentPackagePath : The beacon class location is jar:file:/Users/tuitaking/source/ttl-agent-demo-main/src/main/agentDic/skywalking/skywalking-agent.jar!/org/apache/skywalking/apm/agent/core/boot/AgentPackagePath.class.
2021-02-02 18:14:16.232 INFO [main] TtlAgent: [TtlAgent.premain] begin, agentArgs: ttl.agent.logger:STDOUT, Instrumentation: sun.instrument.InstrumentationImpl@776aec5c
2021-02-02 18:14:16.290 INFO [main] TtlTransformer: [TtlTransformer] add Transformlet class com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.TtlExecutorTransformlet success
2021-02-02 18:14:16.291 INFO [main] TtlAgent: [TtlAgent.premain] addTransformer class com.alibaba.ttl.threadpool.agent.TtlTransformer success
2021-02-02 18:14:16.292 INFO [main] TtlAgent: [TtlAgent.premain] end
2021-02-02 18:14:16.324 INFO [main] TtlAgent: retransform class: class java.util.concurrent.ThreadPoolExecutor
2021-02-02 18:14:16.361 INFO [main] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ThreadPoolExecutor:
$1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.1.3.RELEASE)
2021-02-02 18:14:21.281 INFO 70498 --- [ main] c.l.d.t.TtlAgentDemoApplication : Starting TtlAgentDemoApplication v1.0.0-SNAPSHOT on tuitaking.local with PID 70498 (/Users/tuitaking/source/ttl-agent-demo-main/target/ttl-agent-demo-1.0.0-SNAPSHOT.jar started by tuitaking in /Users/tuitaking/source/openjdk-jdk8u/build/macosx-x86_64-normal-server-slowdebug/jdk/bin)
2021-02-02 18:14:21.309 INFO 70498 --- [ main] c.l.d.t.TtlAgentDemoApplication : No active profile set, falling back to default profiles: default
2021-02-02 18:14:32.302 INFO 70498 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2021-02-02 18:14:32.527 INFO 70498 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2021-02-02 18:14:32.528 INFO 70498 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.16]
2021-02-02 18:14:32.616 INFO 70498 --- [ main] o.a.catalina.core.AprLifecycleListener : The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: [/Users/tuitaking/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.]
2021-02-02 18:14:33.309 INFO 70498 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2021-02-02 18:14:33.310 INFO 70498 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 11551 ms
2021-02-02 18:14:35.244 INFO 70498 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2021-02-02 18:14:37.218 INFO 70498 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2021-02-02 18:14:37.250 INFO 70498 --- [ main] c.l.d.t.TtlAgentDemoApplication : Started TtlAgentDemoApplication in 18.986 seconds (JVM running for 23.442)
2021-02-02 18:15:02.917 INFO [arthas-command-execute] TtlExecutorTransformlet: insert code before method public void execute(Runnable) of class java.util.concurrent.ScheduledThreadPoolExecutor:
$1 = com.alibaba.ttl.TtlRunnable.get($1, false, true);
com.alibaba.ttl.threadpool.agent.internal.transformlet.impl.Utils.setAutoWrapperAttachment($1);
这个错(
class redefinition failed: attempted to change superclass or interfaces
) 是因为RetransformClasses
方法的限制:不能增加、删除、和修改方法签名,或者修饰符,还有继承关系。具体原因应该是
TtlAgent
中让每个被包装的线程池都实现了TtlAgentEnhanced
。 我去掉这个实现该接口的逻辑,发现是可以做到的。
COOL! 👍 ♥️ @TuitaKing
这个是用来避免父类再被包装?
用来标识,对于ThreadPoolExecutor
,目前并不是一定需要。
但对于ForkJoinPool
会有加类字段 操作。
修改类在ForkJoinTtlTransformlet
@TuitaKing 可以看到你非常了解Java Agent
与 TTL的整个过程与实现。 👍 🚀
你有没有兴趣 方便来实验 与 实现一下:TTL在类加载后的增强功能? @TuitaKing 😁 ❤️
(ForkJoinPool
的实现方案可能有些复杂需要挖)
PS:最近也在加 TTL的扩展点功能(Extension Transformlet
),刚发了v2.13.0-Beta1
;
最新的代码还没有完全稳定,也带来了一些实现过程的额外麻烦。🙏
我可以尝试下😁。不过我还处于学习状态,可能花的时间比较久,尽力而为🙏。
我尝试了几次修改已经加载的class。
但是RetransformClasses
的限制又点多,会去掉TTL
比如都实现相同的接口这类设计。
所以我觉得暂时只能使用规范(TTL
在前)来避开上面的问题。
后续我会使用bytebuddy
进行尝试,但是和现在的设计区别较大。
望知晓。有点抱歉
我尝试了几次修改已经加载的class。
但是
RetransformClasses
的限制又点多,会去掉TTL
比如都实现相同的接口这类设计。 所以我觉得暂时只能使用规范(TTL
在前)来避开上面的问题。后续我会使用
bytebuddy
进行尝试,但是和现在的设计区别较大。望知晓。有点抱歉
收到。辛苦了~ ♥️ @TuitaKing
期待ing~
最近发现
TTL agent
与其他agent
共用时,TTL agent
必须放在最前面; 否则会导致jdk线程池修饰失败,TTL失去作用。比如与
Skywalking Agent
、Promethues
等共用(未修改agent jar
文件名)。请问作者有遇到类似问题吗?