Closed. xxd763795151 closed this issue 2 years ago.
Good catch! It is indeed a bug; do you have any ideas on how to fix it?
Maybe we could switch to a separate thread pool when we use thenAcceptAsync @xxd763795151
Of course, this may be the quickest way, if its scope of influence is small enough. Think about it another way: modify the `public long headSlowTimeMills(BlockingQueue q)` method. Iterate over the queue and find the first element that meets the conditions (or null) to calculate the time, instead of relying on the `peek()` method (and think about the impact of concurrency).
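A minimal sketch of that suggestion. The `TimedTask` interface below is an illustrative stand-in for RocketMQ's `FutureTaskExt`/`RequestTask` pair, not the real API: the idea is simply to iterate the queue and use the first element that carries a creation timestamp, instead of trusting `peek()`:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HeadSlowTimeSketch {
    // Illustrative stand-in for RocketMQ's timestamped request tasks.
    interface TimedTask extends Runnable {
        long getCreateTimestamp();
    }

    // Instead of peek(), iterate until the first element that exposes a
    // creation timestamp; untimed tasks (e.g. CompletableFuture$UniAccept)
    // are skipped rather than making the whole calculation return 0.
    static long headSlowTimeMills(BlockingQueue<Runnable> q, long now) {
        for (Runnable r : q) {
            if (r instanceof TimedTask) {
                long behind = now - ((TimedTask) r).getCreateTimestamp();
                return Math.max(behind, 0);
            }
        }
        return 0; // no timed task found, or queue empty
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
        q.add(() -> { }); // untimed task at the head, like UniAccept
        final long created = 1_000L;
        q.add(new TimedTask() {
            public void run() { }
            public long getCreateTimestamp() { return created; }
        });
        System.out.println(headSlowTimeMills(q, created + 500)); // → 500
    }
}
```

Regarding the concurrency concern: iterators of the `java.util.concurrent` queues such as `LinkedBlockingQueue` are weakly consistent, so walking the queue while worker threads drain it does not throw `ConcurrentModificationException`.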
Could you submit a pull request to fix the issue? I will review the code and help you merge.
Ok, I will verify the scheme and give you feedback later. @RongtongJin
Great find~ I would recommend using condition iteration instead of creating another thread pool. There are too many blocking queues and executor services already; it's almost chaotic~
Looking back, the fail-fast mechanism we built previously on the broker side is a little crude. I have been thinking about optimizing it. If you have better ideas, feel free to bring them up. For example, you could use better algorithms and data structures from Resilience4j...
Merged
BUG REPORT
When a send-message request stays in the sendThreadPoolQueue too long, the broker may report "[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while", as follows:
The default value of maxWaitTimeMillsInQueue is 200ms. We have set it to 1000ms in our production environment, but this problem still happens occasionally. We use rocketmq-exporter + prometheus + grafana to monitor the value of sendThreadPoolQueueHeadWaitTimeMills, but the value is almost always 0 (occasionally a very high value appears). That does not add up!
When I debugged the broker's source code, I found that there are two types of tasks in the sendThreadPoolQueue.
If the head element of the sendThreadPoolQueue is an org.apache.rocketmq.broker.latency.FutureTaskExt, the broker computes the value of sendThreadPoolQueueHeadWaitTimeMills; otherwise, it returns 0. Look at the source code below:
Look at this line of code: BrokerFastFailure.castRunnable(peek);
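For context, here is a simplified rendering of that check (the real code lives in org.apache.rocketmq.broker.latency.BrokerFastFailure; the `FutureTaskExt` class below is a minimal stand-in, not the actual RocketMQ class): the cast only unwraps a FutureTaskExt, so any other head element makes the wait-time calculation fall back to 0:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CastRunnableSketch {
    // Minimal stand-in for org.apache.rocketmq.broker.latency.FutureTaskExt.
    static class FutureTaskExt implements Runnable {
        final Runnable inner;
        FutureTaskExt(Runnable inner) { this.inner = inner; }
        public void run() { inner.run(); }
        Runnable getRunnable() { return inner; }
    }

    // Mirrors the shape of BrokerFastFailure.castRunnable: anything that is
    // not a FutureTaskExt yields null, which is why the metric reads 0 when
    // a CompletableFuture$UniAccept sits at the head of the queue.
    static Runnable castRunnable(Runnable runnable) {
        if (runnable instanceof FutureTaskExt) {
            return ((FutureTaskExt) runnable).getRunnable();
        }
        return null;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
        q.add(() -> { }); // plain Runnable at the head, like UniAccept
        System.out.println(castRunnable(q.peek())); // → null
    }
}
```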
The java.util.concurrent.CompletableFuture$UniAccept tasks come from SendMessageProcessor.java:
They share the sendThreadPoolQueue, and the head element of the sendThreadPoolQueue is a java.util.concurrent.CompletableFuture$UniAccept most of the time.
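This is easy to reproduce with plain JDK classes, no broker required. When a CompletableFuture's thenAcceptAsync stage is handed the same executor, the task it enqueues is the JDK's internal CompletableFuture$UniAccept wrapper, not the submitted Runnable (the single-thread pool below stands in for the broker's send executor):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UniAcceptDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        // Single-thread pool standing in for the broker's send executor.
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(1, 1, 0, TimeUnit.SECONDS, queue);

        // Occupy the only worker so the next task stays queued.
        CountDownLatch block = new CountDownLatch(1);
        pool.execute(() -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        });

        // thenAcceptAsync hands its completion stage to the SAME executor,
        // so the queued element is the JDK's UniAccept wrapper.
        CompletableFuture<String> f = new CompletableFuture<>();
        f.thenAcceptAsync(s -> { }, pool);
        f.complete("done"); // enqueues the UniAccept behind the blocker

        System.out.println(queue.peek().getClass().getName());
        // → java.util.concurrent.CompletableFuture$UniAccept

        block.countDown();
        pool.shutdown();
    }
}
```

Any fast-fail logic that peeks this queue and expects a FutureTaskExt will therefore see a UniAccept instead.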
Environment: Linux / macOS, RocketMQ 4.7.1 release
Here is how I found this info in the sendThreadPoolQueue: print it, with code such as the below:
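The original debug snippet is not shown in the issue; a minimal stand-in that walks the queue and prints each pending task's concrete class would look like this (the `dump` helper name is illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueDump {
    // Walk the queue without consuming it and print each element's class;
    // on a busy broker this reveals the mix of FutureTaskExt and
    // CompletableFuture$UniAccept entries described above.
    static void dump(BlockingQueue<Runnable> sendThreadPoolQueue) {
        for (Runnable task : sendThreadPoolQueue) {
            System.out.println(task.getClass().getName());
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
        q.add(() -> { });
        dump(q); // prints the queued task's class name
    }
}
```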
This is the printed info:
And print the stack trace:
The info is as follows: