Open stashslash opened 5 years ago
It looks like the timeout happens while the HotelBookingService is sleeping, so the compensate method is invoked before the booking has finished. You may need to add a lock and check whether the transaction has been cancelled.
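void order(HotelBooking booking) throws Exception {
  ...
  locker.lock();
  try {
    // Skip the booking if the compensation has already run.
    if (isCancelled) {
      throw new Exception("transaction is cancelled");
    }
    doBooking();
  } finally {
    locker.unlock(); // release the lock even when an exception is thrown
  }
}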
void cancel(HotelBooking booking) {
  locker.lock();
  try {
    isCancelled = true;
    doCancel();
  } finally {
    locker.unlock(); // release the lock even if doCancel() fails
  }
}
@zhfeng thanks for the reply. As you said, adding a lock should work. My question is whether I have to add a lock check to every compensate method, or whether there is a better way, such as servicecomb-saga handling it for me.
OK, I see. The alpha server needs to check that the TxEndedEvent has been received before running the compensation. Would you mind raising a JIRA for this use case?
Thanks @stashslash
As it is caused by the timeout, Alpha has no chance to know whether the compensable method has finished or not. Alpha has to abort the transaction by invoking the compensation method.
I don't think a lock can resolve this issue completely, because the timeout can have many causes. For example, if the network connection is broken, Alpha cannot receive the TxEndedEvent even though the compensable method has finished.
I'm thinking about letting the Omega take control of the timeout checking; that could make the timeout situation much easier for us to handle.
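To make the idea of Omega-side timeout checking concrete, here is a minimal sketch. It is not servicecomb-saga code: `OmegaTimeoutSketch`, `Outcome`, and `runWithTimeout` are hypothetical names, and it assumes the compensable work runs on a worker thread that responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from servicecomb-saga.
class OmegaTimeoutSketch {
  enum Outcome { FINISHED, TIMED_OUT, FAILED }

  // Runs the compensable work on a worker thread and waits at most timeoutMillis.
  static Outcome runWithTimeout(Callable<Void> compensable, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<Void> future = executor.submit(compensable);
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Outcome.FINISHED;
    } catch (TimeoutException e) {
      // Interrupts the worker; this only stops work that honors interruption.
      future.cancel(true);
      return Outcome.TIMED_OUT;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      future.cancel(true);
      return Outcome.FAILED;
    } catch (ExecutionException e) {
      // The compensable method itself threw an exception.
      return Outcome.FAILED;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Because the timeout is detected in the same process that runs the compensable method, the caller learns locally whether the work finished, timed out, or failed before anything is reported to Alpha.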
It sounds great! I have raised SCB-1057.
@WillemJiang I am not sure how the Omega can handle this timeout situation. The Alpha needs to guarantee that the compensate method is invoked only after the local transaction has finished, is that right?
@zhfeng Alpha cannot know that. How about LRA? Does the coordinator know whether the action has timed out?
@WillemJiang I think the current Narayana LRA implements the timeout on the coordinator side.
So it may have a similar situation, where the compensate method could be invoked before the business logic is done. And I think this is not clearly described in the LRA spec on timeout.
I think we can bring this case to the LRA issues.
@stashslash I propose updating the booking demo for this timeout situation. Can you check whether this resolves your problem?
As we discussed, the coordinator cannot guarantee that the compensate method will be invoked after the business is done.
@zhfeng My question is what doEnd does when the timeout happens. If the Omega could cancel the invocation, that would be great.
@WillemJiang it cancels the transaction when the timeout happens, and this is done on the coordinator side.
@stashslash @WillemJiang it looks like we need to indicate in the cancel method that the compensation is still ongoing. For now I think it could throw an Exception, which would cause the alpha server to re-invoke this compensate method later.
void cancel(HotelBooking booking) throws Exception {
  Integer id = booking.getId();
  if (bookings.containsKey(id)) {
    bookings.get(id).cancel();
  } else {
    // The booking is not recorded yet, so signal that the compensation must be retried.
    throw new Exception("can not cancel " + id);
  }
}
We could also consider adding a status interface that reports to the alpha server.
+1 for adding the status interface. BTW, I think we can let the Omega check the type of the exception to decide whether it needs to retry.
@zhfeng if the Omega could handle the status, we would just need bookings.get(id).cancel(); in the compensate method, right?
@stashslash I think the compensate method still needs to return a result to indicate whether it has finished the work.
CompensateStatus cancel(HotelBooking booking) {
  Integer id = booking.getId();
  try {
    if (bookings.containsKey(id)) {
      bookings.get(id).cancel();
      // The compensation work is finished.
      return COMPENSATE_OK;
    } else {
      // The booking record cannot be found and may still be in progress,
      // so this method needs to be re-invoked later.
      return COMPENSATING;
    }
  } catch (Exception e) {
    // The compensation could not be done because something threw an exception.
    return COMPENSATE_FAIL;
  }
}
The alpha will check the status by periodically asking the Omega to report it, until the compensation is OK or FAIL.
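The periodic check described above could look roughly like the following sketch. It is only an illustration under assumptions: `StatusPollingSketch`, `pollUntilDone`, the `Supplier` standing in for the Omega's status report, and the `maxPolls` and `intervalMillis` parameters are all hypothetical and not part of the alpha server.

```java
import java.util.function.Supplier;

// Hypothetical sketch; the real alpha/omega protocol is not shown here.
class StatusPollingSketch {
  enum CompensateStatus { COMPENSATING, COMPENSATE_OK, COMPENSATE_FAIL }

  // Asks the Omega for its status until it is final or maxPolls is reached.
  static CompensateStatus pollUntilDone(
      Supplier<CompensateStatus> omegaReport, int maxPolls, long intervalMillis) {
    for (int poll = 0; poll < maxPolls; poll++) {
      CompensateStatus status = omegaReport.get();
      if (status != CompensateStatus.COMPENSATING) {
        return status; // COMPENSATE_OK or COMPENSATE_FAIL ends the polling
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    // Still not final after all polls; the caller decides what to do next.
    return CompensateStatus.COMPENSATING;
  }
}
```

The poll limit is an addition of this sketch: without some bound, a compensation that never leaves COMPENSATING would be polled forever.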
@zhfeng I'm not sure how the compensate method can catch an exception from the compensable method. I think only the compensable method itself can know whether an exception happened, so it would be much better for it to handle the COMPENSATING status itself.
So, with your snippet, if something goes wrong in the compensable method, it will hang in the COMPENSATING status, right?
And why not let the Omega report the COMPENSATING status around the invocation of the compensable method? Then I think I would only need bookings.get(id).cancel(); in the compensate method.
@stashslash I agree with you that the Omega should report the COMPENSATING status while the compensate method is being invoked. So if we only have bookings.get(id).cancel() in the compensate method, let's see what could happen:
1) Everything goes smoothly and the booking record has been inserted. This is the good case, in which cancel() returns OK. The Omega will report COMPENSATE_OK.
2) The same as above, but cancel() returns FAIL or throws an exception to indicate this status. The Omega will report COMPENSATE_FAIL.
3) The booking record has not been inserted yet, and I assume bookings.get(id).cancel() will throw an NPE. The Omega will catch this exception and report the COMPENSATING status, as we expect the alpha to re-invoke this method later. The booking record will eventually be inserted, and cancel() will finally return OK or FAIL. The Omega will then report COMPENSATE_OK or COMPENSATE_FAIL.
4) Inserting the booking record fails due to database errors. The Omega will catch this exception and mark this saga transaction as FAIL.
So the Omega will be able to see the return value or the exception from the cancel() method when reporting the status. This makes sense as long as we clearly define these contracts between the framework and the applications.
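Cases 1 to 3 above can be sketched as a small wrapper that turns the outcome of cancel() into a status. This is an illustration only: `OmegaStatusSketch`, the `Booking` stand-in, and the assumption that cancel() returns a boolean are hypothetical, and case 4 is left out because it happens in the compensable method rather than in the compensation.

```java
import java.util.Map;

// Hypothetical sketch; these types are not part of servicecomb-saga.
class OmegaStatusSketch {
  enum CompensateStatus { COMPENSATING, COMPENSATE_OK, COMPENSATE_FAIL }

  // Stand-in for the demo's booking record; cancel() reports success as a boolean.
  static class Booking {
    private final boolean cancellable;
    private boolean cancelled;

    Booking(boolean cancellable) { this.cancellable = cancellable; }

    boolean cancel() {
      if (!cancellable) return false;
      cancelled = true;
      return true;
    }

    boolean isCancelled() { return cancelled; }
  }

  // What the Omega could report after invoking the application's compensate logic.
  static CompensateStatus compensate(Map<Integer, Booking> bookings, Integer id) {
    try {
      boolean ok = bookings.get(id).cancel(); // NPE if the record is not inserted yet
      return ok ? CompensateStatus.COMPENSATE_OK : CompensateStatus.COMPENSATE_FAIL;
    } catch (NullPointerException e) {
      // Case 3: record missing, assume the booking is still in progress and retry later.
      return CompensateStatus.COMPENSATING;
    } catch (RuntimeException e) {
      // Case 2: cancel() threw, so the compensation failed.
      return CompensateStatus.COMPENSATE_FAIL;
    }
  }
}
```

Treating every NullPointerException as "not inserted yet" is a simplification; an NPE thrown from inside cancel() would also be reported as COMPENSATING, which is one reason an explicit contract between the framework and the application is preferable.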
Yes, it's my fault. I assumed that actions like bookings.get(id).cancel(); would throw an exception, but actually they do not.
You're right and it's clear now, @zhfeng, thanks.
So should I close this issue?
I tested the saga-spring-demo. If I modify the HotelBookingService to add some sleep code, like:
then I call the booking REST API and get the booking results from the hotel service and the car service, the results look like:
[{"name":"test","amount":2,"confirmed":true,"cancelled":false}]
[{"name":"test","amount":2,"confirmed":false,"cancelled":true}]
It looks like the car service was compensated successfully, but the hotel service compensation failed.
So, am I missing anything?