apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0
23.93k stars 6.53k forks source link

rocketmq plugin of agent throw NullPointException #5311

Closed naah69 closed 4 years ago

naah69 commented 4 years ago

Please answer these questions before submitting your issue.


Question

i use sync send message in rocketmq, it throw error and agent did't work ,don't send data to collector


ERROR 2020-08-14 14:47:48:750 AsyncSenderExecutor_1 InstMethodsInter : class[class org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest$1] before method[onException] intercept failure 
java.lang.NullPointerException
        at org.apache.skywalking.apm.plugin.rocketMQ.v4.OnExceptionInterceptor.beforeMethod(OnExceptionInterceptor.java:42)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:76)
        at org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest$1.onException(AsyncTraceDispatcher.java)
        at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl$4.run(DefaultMQProducerImpl.java:512)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Requirement or improvement

wu-sheng commented 4 years ago

I recommend you debug locally, and check what is going on. No plugin test about RocketMQ plugin yet. If could build one by following plugin development doc.

xbkaishui commented 4 years ago

please assign me

xbkaishui commented 4 years ago

Hi @naah69 can you provide more detail about this bug? how to reproduce this bug (code) ? thanks

naah69 commented 4 years ago

sure, i guess that the error was caused by syncSend ,there is no asyncSend in my application.but the error was throw by org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest

i use Aliyun Distributed Tracing to monitor my application.

the application use v8.1.0 of skywalking agent to send info to Aliyun Distributed Tracing。

myagent config:

agent.service_name=seedserver
agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:3}
collector.backend_service=xxxxxxx
agent.authentication=xxxxxxxxxxxxx
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
logging.level=${SW_LOGGING_LEVEL:INFO}

dependencies of spirngboot(jdk8):

  <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.0.RELEASE</version>
    </parent>

dependencies of myrocketmq:

        <dependency>
            <groupId>org.apache.rocketmq</groupId>
            <artifactId>rocketmq-spring-boot-starter</artifactId>
            <version>2.1.0</version>
            <exclusions>
                <exclusion>
                    <artifactId>commons-codec</artifactId>
                    <groupId>commons-codec</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-collections</artifactId>
                    <groupId>commons-collections</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-lang3</artifactId>
                    <groupId>org.apache.commons</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-logging</artifactId>
                    <groupId>commons-logging</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>fastjson</artifactId>
                    <groupId>com.alibaba</groupId>
                </exclusion>
            </exclusions>
        </dependency>

part code of rocketmq:

yaml config:

rocketmq:
  name-server: http://xxxxxxx.cn-hangzhou.mq-internal.aliyuncs.com:8080
  producer:
    access-key: xxxxxxx
    secret-key: xxxxxxx
    group: GID_SEEDSERVER
  consumer:
    access-key: xxxxxxx
    secret-key: xxxxxxx

consumer:

@Slf4j
@Service
@RocketMQMessageListener(
        topic = "SEEDSERVER_EXPORT",
        consumerGroup = "GID_SEEDSERVER_EXPORT_" + "${spring.profiles.active}",
        consumeTimeout = 24 * 60 * 60 * 1000,
        selectorType = SelectorType.TAG,
        selectorExpression = "${spring.profiles.active}"
)

provider:

    @Autowired
    private RocketMQTemplate rocketMQTemplate;
    @Value("${spring.profiles.active}")
    private String profiles;

//sync send
 rocketMQTemplate.syncSend("SEEDSERVER_EXPORT:" + profiles, JSON.toJSONString(message));
xbkaishui commented 4 years ago

thanks, will check later

xbkaishui commented 4 years ago

hi @naah69 the bug is in the agent enhance logic. I need more time to check the logic of why intercept the interface

wu-sheng commented 4 years ago

I need more time to check the logic of why intercept the interface

How does this happen? Does rocketMQ change the codebase?

xbkaishui commented 4 years ago

Yes, the plugin of rocketMQ 4.x only test on version 4.1 , @naah69 rocketMQ used version 4.6 client, in version4.6 the interface of SendCallback has multi implement, which caused the problem

wu-sheng commented 4 years ago

Got it, when you try to provide new plugin implement for 4.6(or any version newer), remember to add witness class for the previous versions, and make our plugins supporting all of them.

xbkaishui commented 4 years ago

Hi @naah69, For your case the quick fix is downgrade the client version , since your rocketMQ server version is 4.3, you can downgrade the rocketmq-spring-boot-starter version from 2.1.0 to 2.0.1, I test is in my local env. it is ok. you can try and let me know

xbkaishui commented 4 years ago

Got it, when you try to provide new plugin implement for 4.6(or any version newer), remember to add witness class for the previous versions, and make our plugins supporting all of them.

Yes, got it

xbkaishui commented 4 years ago

Hi @wu-sheng, I went deep into the source code of RocketMQ, I think there is no need to add new plugin logic. this issue is caused by wrong client and server version match. the exception happened as belows: For the version newer than 4.3, RocketMQ added new AsyncTraceDispatcher class which collect metrics and send to broker. when use old server, it will not found the special topic in broker, then it throws a exception. I think it will be ok for the plugin

wu-sheng commented 4 years ago

when use old server, it will not found the special topic in broker, then it throws a exception.

A little confused. I was thinking, this error only happens in the plugin codes, isn't it?

wu-sheng commented 4 years ago
        at org.apache.skywalking.apm.plugin.rocketMQ.v4.OnExceptionInterceptor.beforeMethod(OnExceptionInterceptor.java:42)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:76)
        at org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest$1.onException(AsyncTraceDispatcher.java)

This is an onException, so, the topicId could be unavailable?

xbkaishui commented 4 years ago

Yes, It only happens in the plugin, This is because before the RocketMQ client send the metric data, it will check exist of the specified topic. for this case, the check is failed, and direct go to the OnException logic, which caused the Nullpointer

xbkaishui commented 4 years ago
        at org.apache.skywalking.apm.plugin.rocketMQ.v4.OnExceptionInterceptor.beforeMethod(OnExceptionInterceptor.java:42)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:76)
        at org.apache.rocketmq.client.trace.AsyncTraceDispatcher$AsyncAppenderRequest$1.onException(AsyncTraceDispatcher.java)

This is an onException, so, the topicId could be unavailable?

Yes, Old server can't have this topic

xbkaishui commented 4 years ago

Perhaps I can change the OnExceptionInterceptor logic, and ignore this null topic case, what do you think?

wu-sheng commented 4 years ago

Please correct me if I am wrong.

From my code reading(not debug), this issue should be caused by SendCallBackEnhanceInfo enhanceInfo = (SendCallBackEnhanceInfo) objInst.getSkyWalkingDynamicField();, which can't have the this field.

Could you check, why this field isn't set?

wu-sheng commented 4 years ago

My point is, we are not reading the topic from RocketMQ API, it was set somewhere else. And the SendCallBackEnhanceInfo missing usually means we missed to intercept an important entrance of codes.

xbkaishui commented 4 years ago

This field is set by MessageSendInterceptor beforeMethod, which enhance class org.apache.rocketmq.client.impl.MQClientAPIImpl,
like I said before, it was not called the MQClientAPIImpl.sendMessage and direct go to exception logic

wu-sheng commented 4 years ago

Oh, you mean, the codes don't access the RocketMQ? It used to be causing error?

Sorry, I am not familiar with the RocketMQ thing. Just try to understand from your description.

wu-sheng commented 4 years ago

If MQClientAPIImpl isn't executed, why? Is there another API for MQ client?

Here is the principle. If the user codes are really accessing RocketMQ server, then, we should track them, even some APIs are not available. If we can't do this in some cases, then, this is a plugin bug.

xbkaishui commented 4 years ago

Yes, no problem, I am new to RocketMQ too, the flow is like this step1 : producer -> send user data -> broker

step2: AsyncTraceDispatcher collect metric -> async send to broker

The error was happened in step2, actually it is not affect the normal logic, just in the collect metric stage.

xbkaishui commented 4 years ago

You can check in https://hub.fastgit.org/apache/rocketmq/blob/master/client/src/main/java/org/apache/rocketmq/client/impl/producer/DefaultMQProducerImpl.java#L558

It is not go to the send logic just go to L685

wu-sheng commented 4 years ago

OK. Got your point, so there are 2 message sending.

From my understanding, this should be traced ideally. If you want to give a try because this actually is an RPC. Of course, you also could skip in this case, but this could hide some bugs in the future. Because my guess, you will skip the logic if enhanceInfo == null.

wu-sheng commented 4 years ago

This exception stack shows the LOC is this at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl$4.run(DefaultMQProducerImpl.java:512), which is a sending message exception,

https://github.com/apache/rocketmq/blob/master/client/src/main/java/org/apache/rocketmq/client/impl/producer/DefaultMQProducerImpl.java#L512

xbkaishui commented 4 years ago

Sure, It is not easy to implement this, I perfer we can talk in qq.

wu-sheng commented 4 years ago

Sure, It is not easy to implement this, I perfer we can talk in qq.

You could DM me, but after the discussion, you have to write the conclusion back here for all others to track.

xbkaishui commented 4 years ago

after talked with wu-sheng, the solution is to compatible with this scenario, when the enhanceInfo is null, just create a local span with default topic name "no-topic"

xbkaishui commented 4 years ago

this bug is also same with #3848 , please close it as duplicate

wu-sheng commented 4 years ago

Linked, it will close when the PR merged. You could use the keyword, resolve, in the PR next time. Then GitHub will do this automatically.

xbkaishui commented 4 years ago

Ok, thanks

naah69 commented 4 years ago

Hi @naah69, For your case the quick fix is downgrade the client version , since your rocketMQ server version is 4.3, you can downgrade the rocketmq-spring-boot-starter version from 2.1.0 to 2.0.1, I test is in my local env. it is ok. you can try and let me know

3q,i will try it y

naah69 commented 4 years ago

Hi @naah69, For your case the quick fix is downgrade the client version , since your rocketMQ server version is 4.3, you can downgrade the rocketmq-spring-boot-starter version from 2.1.0 to 2.0.1, I test is in my local env. it is ok. you can try and let me know

OMG.maybe i just can wait for 8.2.0 agent,becaust i use new feature consumeTimeout in annotation of rocketmq-spring-boot-starter-2.1.0.

it is not support in rocketmq-spring-boot-starter-2.0.1.

xbkaishui commented 4 years ago

To fix the NullPointer, You can also upgrade your server version greater than 4.4

naah69 commented 4 years ago

in production envirements, we use rocketmq of aliyun version,so i don't know which corresponding version of open source version ,i had submit work order to ask version