Closed naah69 closed 4 years ago
I recommend you debug locally and check what is going on. There is no plugin test for the RocketMQ plugin yet; you could build one by following the plugin development doc.
please assign me
Hi @naah69, can you provide more detail about this bug? How can it be reproduced (code)? Thanks.
Sure. I guess the error was caused by syncSend; there is no asyncSend in my application. But the error was thrown by org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest.
I use Aliyun Distributed Tracing to monitor my application.
The application uses v8.1.0 of the SkyWalking agent to send data to Aliyun Distributed Tracing.
agent.service_name=seedserver
agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:3}
collector.backend_service=xxxxxxx
agent.authentication=xxxxxxxxxxxxx
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
logging.level=${SW_LOGGING_LEVEL:INFO}
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.3.0.RELEASE</version>
</parent>
<dependency>
    <groupId>org.apache.rocketmq</groupId>
    <artifactId>rocketmq-spring-boot-starter</artifactId>
    <version>2.1.0</version>
    <exclusions>
        <exclusion>
            <artifactId>commons-codec</artifactId>
            <groupId>commons-codec</groupId>
        </exclusion>
        <exclusion>
            <artifactId>commons-collections</artifactId>
            <groupId>commons-collections</groupId>
        </exclusion>
        <exclusion>
            <artifactId>commons-lang3</artifactId>
            <groupId>org.apache.commons</groupId>
        </exclusion>
        <exclusion>
            <artifactId>commons-logging</artifactId>
            <groupId>commons-logging</groupId>
        </exclusion>
        <exclusion>
            <artifactId>fastjson</artifactId>
            <groupId>com.alibaba</groupId>
        </exclusion>
    </exclusions>
</dependency>
rocketmq:
  name-server: http://xxxxxxx.cn-hangzhou.mq-internal.aliyuncs.com:8080
  producer:
    access-key: xxxxxxx
    secret-key: xxxxxxx
    group: GID_SEEDSERVER
  consumer:
    access-key: xxxxxxx
    secret-key: xxxxxxx
@Slf4j
@Service
@RocketMQMessageListener(
    topic = "SEEDSERVER_EXPORT",
    consumerGroup = "GID_SEEDSERVER_EXPORT_" + "${spring.profiles.active}",
    consumeTimeout = 24 * 60 * 60 * 1000,
    selectorType = SelectorType.TAG,
    selectorExpression = "${spring.profiles.active}"
)
@Autowired
private RocketMQTemplate rocketMQTemplate;
@Value("${spring.profiles.active}")
private String profiles;
//sync send
rocketMQTemplate.syncSend("SEEDSERVER_EXPORT:" + profiles, JSON.toJSONString(message));
thanks, will check later
Hi @naah69, the bug is in the agent's enhancement logic. I need more time to check why it intercepts the interface.
How does this happen? Did RocketMQ change its codebase?
Yes. The RocketMQ 4.x plugin has only been tested against version 4.1. @naah69 uses the 4.6 client, and in 4.6 the SendCallback interface has multiple implementations, which caused the problem.
Got it. When you provide a new plugin implementation for 4.6 (or any newer version), remember to add a witness class for the previous versions, and make our plugins support all of them.
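For readers unfamiliar with the witness-class idea mentioned above: a plugin names a class that exists only in the library versions it supports, and the plugin is activated only when that class can be found through the application's class loader. The following is a minimal, self-contained sketch of that kind of check. It is not SkyWalking's actual implementation; `WitnessCheck`, `isPresent`, and `shouldActivate` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of the SkyWalking agent. */
final class WitnessCheck {
    private WitnessCheck() {
    }

    /** Returns true if the named class can be located by the given class loader, without initializing it. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** A plugin is activated only if every one of its witness classes is present. */
    static boolean shouldActivate(List<String> witnessClasses, ClassLoader loader) {
        for (String name : witnessClasses) {
            if (!isPresent(name, loader)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassLoader loader = WitnessCheck.class.getClassLoader();
        // A class that always exists, standing in for a version-specific library class.
        System.out.println(shouldActivate(Arrays.asList("java.lang.String"), loader));
        // A made-up class name, standing in for a class that the installed version lacks.
        System.out.println(shouldActivate(Arrays.asList("com.example.DoesNotExist"), loader));
    }
}
```

With a check like this, a plugin written for one client version stays inactive when a different version is on the classpath, which is why each version-specific plugin needs its own witness class.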
Hi @naah69, for your case the quick fix is to downgrade the client version. Since your RocketMQ server version is 4.3, you can downgrade rocketmq-spring-boot-starter from 2.1.0 to 2.0.1. I tested it in my local environment and it works. Please try it and let me know.
Yes, got it
Hi @wu-sheng, I went deep into the RocketMQ source code, and I think there is no need to add new plugin logic. This issue is caused by a mismatch between the client and server versions. The exception happens as follows: in versions newer than 4.3, RocketMQ added a new AsyncTraceDispatcher class, which collects metrics and sends them to the broker. With an old server, it cannot find the special topic in the broker, so it throws an exception. I think the plugin itself is fine.
A little confused. I was thinking this error only happens in the plugin code, doesn't it?
at org.apache.skywalking.apm.plugin.rocketMQ.v4.OnExceptionInterceptor.beforeMethod(OnExceptionInterceptor.java:42)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:76)
at org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest$1.onException(AsyncTraceDispatcher.java)
This is an onException, so the topicId could be unavailable?
Yes, it only happens in the plugin. Before the RocketMQ client sends the metric data, it checks whether the specified topic exists. In this case the check fails and execution goes directly to the onException logic, which causes the NullPointerException.
Yes, an old server doesn't have this topic.
Perhaps I can change the OnExceptionInterceptor logic to ignore this null-topic case. What do you think?
Please correct me if I am wrong.
From my code reading (not debugging), this issue should be caused by SendCallBackEnhanceInfo enhanceInfo = (SendCallBackEnhanceInfo) objInst.getSkyWalkingDynamicField();, where this field has not been set.
Could you check why this field isn't set?
My point is, we are not reading the topic from the RocketMQ API; it was set somewhere else. And a missing SendCallBackEnhanceInfo usually means we failed to intercept an important entry point of the code.
This field is set by MessageSendInterceptor's beforeMethod, which enhances the class org.apache.rocketmq.client.impl.MQClientAPIImpl.
As I said before, MQClientAPIImpl.sendMessage was not called; execution went directly to the exception logic.
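To make the failure path described above concrete, here is a small, self-contained simulation. It assumes nothing about the real agent beyond what is stated in this thread: one interceptor attaches context to the callback object when the send method runs, and another interceptor reads that context when the callback fires. All class and method names below (`DynamicFieldDemo`, `EnhancedCallback`, `EnhanceInfo`, `beforeSend`, `readTopic`) are hypothetical stand-ins, not SkyWalking or RocketMQ classes.

```java
/** Illustrative stand-ins only; none of these are SkyWalking or RocketMQ classes. */
final class DynamicFieldDemo {
    private DynamicFieldDemo() {
    }

    /** Mimics an enhanced object that carries one extra field attached at runtime. */
    static final class EnhancedCallback {
        private Object dynamicField;

        Object getDynamicField() {
            return dynamicField;
        }

        void setDynamicField(Object value) {
            dynamicField = value;
        }
    }

    /** Mimics the context that the send-side interceptor stores for the callback. */
    static final class EnhanceInfo {
        final String topic;

        EnhanceInfo(String topic) {
            this.topic = topic;
        }
    }

    /** Send-side interception: runs only when the real send method is reached. */
    static void beforeSend(EnhancedCallback callback, String topic) {
        callback.setDynamicField(new EnhanceInfo(topic));
    }

    /** Callback-side interception without a null check, as in the failing path. */
    static String readTopic(EnhancedCallback callback) {
        EnhanceInfo info = (EnhanceInfo) callback.getDynamicField();
        return info.topic; // throws NullPointerException if beforeSend never ran
    }

    public static void main(String[] args) {
        EnhancedCallback normal = new EnhancedCallback();
        beforeSend(normal, "SEEDSERVER_EXPORT");
        System.out.println(readTopic(normal));

        // The send step is skipped here, standing in for the failed topic check.
        EnhancedCallback bypassed = new EnhancedCallback();
        try {
            readTopic(bypassed);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: enhance info was never set");
        }
    }
}
```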
Oh, you mean the code doesn't access RocketMQ? Did this already cause an error before?
Sorry, I am not familiar with RocketMQ. I am just trying to understand from your description.
If MQClientAPIImpl isn't executed, why? Is there another API for the MQ client?
Here is the principle. If the user code is really accessing the RocketMQ server, then we should track it, even if some APIs are not available. If we can't do this in some cases, then this is a plugin bug.
step2: AsyncTraceDispatcher collects metrics -> async send to broker
The error happened in step2. It does not actually affect the normal logic, only the metric collection stage.
You can check in https://hub.fastgit.org/apache/rocketmq/blob/master/client/src/main/java/org/apache/rocketmq/client/impl/producer/DefaultMQProducerImpl.java#L558
It does not reach the send logic; it goes straight to L685.
OK, got your point, so there are 2 message sends.
From my understanding, this should ideally be traced, because it actually is an RPC, if you want to give it a try. Of course, you could also skip it in this case, but that could hide some bugs in the future, because my guess is that you will skip the logic if enhanceInfo == null.
This exception stack shows the LOC is at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl$4.run(DefaultMQProducerImpl.java:512), which is a message-sending exception.
Sure. It is not easy to implement this; I prefer that we talk on QQ.
You could DM me, but after the discussion, you have to write the conclusion back here for all others to track.
After talking with wu-sheng, the solution is to be compatible with this scenario: when the enhanceInfo is null, just create a local span with the default topic name "no-topic".
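The agreed fallback can be sketched as a simple null guard. This is only an illustration of the conclusion stated above, not the code that was merged; `TopicFallback`, `EnhanceInfo`, and `topicFor` are hypothetical names, and only the default value "no-topic" comes from this thread.

```java
/** Illustrative only: hypothetical names, not the code merged into SkyWalking. */
final class TopicFallback {
    /** Default topic name agreed on in this thread for the missing-context case. */
    static final String DEFAULT_TOPIC = "no-topic";

    private TopicFallback() {
    }

    /** Stand-in for the enhance info that the send interceptor would normally attach. */
    static final class EnhanceInfo {
        final String topic;

        EnhanceInfo(String topic) {
            this.topic = topic;
        }
    }

    /** Returns the recorded topic, or the default name when no enhance info was attached. */
    static String topicFor(EnhanceInfo info) {
        return info == null ? DEFAULT_TOPIC : info.topic;
    }

    public static void main(String[] args) {
        System.out.println(topicFor(new EnhanceInfo("SEEDSERVER_EXPORT")));
        System.out.println(topicFor(null));
    }
}
```

Using a visible default name rather than silently skipping keeps the call in the trace, which addresses the earlier concern that skipping could hide bugs.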
This bug is the same as #3848; please close it as a duplicate.
Linked; it will close when the PR is merged. You could use the keyword, resolve, in the PR next time. Then GitHub will do this automatically.
Ok, thanks
Thanks, I will try it.
OMG. Maybe I just have to wait for the 8.2.0 agent, because I use the new consumeTimeout feature in the annotation of rocketmq-spring-boot-starter 2.1.0. It is not supported in rocketmq-spring-boot-starter 2.0.1.
To fix the NullPointerException, you can also upgrade your server to a version greater than 4.4.
In production environments we use Aliyun's version of RocketMQ, so I don't know which open source version it corresponds to. I have submitted a work order to ask about the version.
Please answer these questions before submitting your issue.
Question
What do you want to know? Why it threw an exception, and what I can do to make it work.
Bug
Which version of SkyWalking, OS and JRE? agent 8.1.0, openjdk 8, rocketmq 4.3
Which company or project?
What happened? If possible, provide a way to reproduce the error. e.g. demo application, component version.
I use sync send for messages in RocketMQ. It throws an error and the agent doesn't work; it doesn't send data to the collector.
Requirement or improvement