Open Richard-Yi opened 3 years ago
Thank you, I need some time to deal with this problem
This bug is the same as the JIRA issue SCB-2004 and has been fixed in #667.
@coolbeevip Sincerely, thanks for the reply. Sorry, but I couldn't see how these two are the same problem. Could you please explain more about it? And is the problem fixed in 0.0.7-SNAPSHOT?
You can try building 0.7.0-SNAPSHOT from the master branch.
@coolbeevip Well, I tried 0.7.0-SNAPSHOT built from the master branch. The problem still exists and the exception stack trace is the same.
Hmm. I recently released a Docker image: https://hub.docker.com/repository/docker/coolbeevip/servicecomb-pack/tags?page=1&ordering=last_updated. Can you try again with it?
docker run \
-p 8080:8080 \
-p 8090:8090 \
-p 5432:5432 \
coolbeevip/servicecomb-pack:0.7.0-all-in-one
Testing scenarios
I was testing how ServiceComb behaves when the global Tx times out.
There were 3 services:
The order service calls the account service and the storage service in turn.
The test scenario is as follows:
The order service sets the global Tx timeout to 10s via @SagaStart(timeout = 10).
The storage service simulates a timeout by sleeping for 10 seconds via Thread.sleep().
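To make the timing of this scenario concrete, here is a minimal plain-JDK sketch of a caller deadline expiring while a slow step is still running. It does not use ServiceComb Pack at all: the class name GlobalTimeoutSketch and the method timesOut are hypothetical, and Future.get with a timeout only stands in for the global Tx timeout. The report uses 10 seconds for both the timeout and the sleep; the sketch scales the values down and makes the step clearly longer than the deadline so the outcome is deterministic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GlobalTimeoutSketch {

    // Runs a step that sleeps for stepMillis and reports whether the caller's
    // deadline (timeoutMillis) expired before the step finished.
    // Hypothetical helper, not part of ServiceComb Pack.
    static boolean timesOut(long timeoutMillis, long stepMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> step = pool.submit(() -> {
                try {
                    Thread.sleep(stepMillis); // stands in for the slow storage call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                step.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // step finished inside the deadline
            } catch (TimeoutException e) {
                return true;  // deadline hit while the step was still running
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Scaled down from the 10 s used in the report so the sketch runs quickly.
        System.out.println(timesOut(100, 1_000)); // slow step, short deadline
        System.out.println(timesOut(1_000, 10));  // fast step, generous deadline
    }
}
```

The point of the sketch is only that, at the moment the deadline fires, the slow step has started but has not finished, so any record of it has a begin time and no end time yet.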
code
order service
account service
storage service
Problem Description
In the above test case, when the global transaction times out, alpha-server will continuously throw NPE exceptions. The exception log is as follows:
The alpha UI shows that the number of suspended tx has reached 45.46K+...
My exploration of the Problem
I debugged alpha in my local environment and set breakpoints according to the error stack trace. Then I found what causes the problem.
When alpha finds that the global tx has timed out, it sets the global tx state to SUSPEND and stops the sagaActor. Before stopping, it saves the GlobalTransaction to ES. The NPE happens when building the subTransaction.
Although the global tx is SUSPEND, the sub tx is still active and its end time is null, which causes the NPE here.
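The failure mode described above can be illustrated with a small self-contained sketch. This is not the alpha source: SubTx, unsafeDurationMillis and safeDurationMillis are hypothetical names, and the sketch assumes the builder dereferences the end time (for example to compute a duration). It shows why an active sub tx with a null end time throws, and one possible guard that falls back to the time of the snapshot.

```java
import java.time.Duration;
import java.time.Instant;

public class SubTxDurationSketch {

    // Hypothetical stand-in for a sub-transaction record; not the real alpha class.
    static final class SubTx {
        final Instant beginTime;
        final Instant endTime; // null while the sub-transaction is still active

        SubTx(Instant beginTime, Instant endTime) {
            this.beginTime = beginTime;
            this.endTime = endTime;
        }
    }

    // Unsafe: dereferences endTime unconditionally, so an active sub tx throws NPE.
    static long unsafeDurationMillis(SubTx tx) {
        return tx.endTime.toEpochMilli() - tx.beginTime.toEpochMilli();
    }

    // Null-safe: falls back to the supplied snapshot time when endTime is absent.
    static long safeDurationMillis(SubTx tx, Instant snapshotTime) {
        Instant end = tx.endTime != null ? tx.endTime : snapshotTime;
        return Duration.between(tx.beginTime, end).toMillis();
    }

    public static void main(String[] args) {
        Instant begin = Instant.ofEpochMilli(1_000);
        Instant suspendedAt = Instant.ofEpochMilli(11_000);
        SubTx active = new SubTx(begin, null); // still running when the global tx is suspended

        try {
            unsafeDurationMillis(active);
        } catch (NullPointerException e) {
            System.out.println("NPE for active sub tx");
        }
        System.out.println(safeDurationMillis(active, suspendedAt)); // prints 10000
    }
}
```

Whether the right fix is a fallback time, skipping the duration, or ending the sub tx before the snapshot is a design decision for the maintainers; the sketch only shows that the null end time has to be handled somewhere before the record is built.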