Open nbelzer opened 4 years ago
While I discuss using the message broker above, it is optional and depends on our assumptions about the system:
By having the emitting service check whether the receiving service succeeds or fails (and trigger a failure if it fails), we can detect failures in other services and prevent these forever-halted transactions, provided we assume that two services will not fail at the same time. The message broker could provide an extra guarantee on top of this if we do not want to make that assumption.
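A minimal sketch of that check, assuming the emitter delivers the event through some callable that returns an HTTP-style status code. The names `emit_with_failure_detection`, `deliver` and `emit_failure` are hypothetical and only here for illustration:

```python
def emit_with_failure_detection(event, deliver, emit_failure):
    """Deliver an event and trigger a failure if the receiver does not succeed.

    `deliver` is a hypothetical callable that sends the event to the receiving
    service and returns a status code; `emit_failure` is a hypothetical
    callable that emits a failure for the given transaction id.
    """
    try:
        status = deliver(event)
    except Exception:
        # The receiver is unreachable or crashed while handling the event.
        status = None

    if status is None or not 200 <= status < 300:
        emit_failure(event["transaction_id"])
        return False
    return True
```

Note that this only helps as long as the emitting service itself stays up, which is the "two services never fail at the same time" assumption.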
A quick summary of the different types of events:
- Payment Service
  - `PAYMENT_INITIATED`
  - `PAYMENT_RESERVED`
  - `PAYMENT_FAILURE`
- User Service
  - `CREDITS_RESERVED`
  - `CREDITS_SUBTRACTED`
  - `CREDITS_FAILURE`
- Stock Service
  - `STOCK_RESERVED`
  - `STOCK_SUBTRACTED`
  - `STOCK_FAILURE`
Great work Nick! I agree to take the Choreography approach. However, I had some short notes/questions.
- Stock Service produces PRODUCT_OUT_OF_STOCK_EVENT;
- Both Order Service and Payment Service listen to the previous message: the Payment Service refunds the client and the Order Service sets the order state to failed.
@plammerts
You talked about reserving, such as the stock service reserving the stock for the transaction. Are we just updating the database immediately, such as subtracting stock? And in case of a failure, do we just roll back the database by updating it again, so adding the subtracted stock back?
We would be adding an extra space (table) per 'thing' that can be reserved (credits or stock) that keeps track of reservations and when they are made (such that we can automatically release them again after some time). On our routes that show the amount of stock available you would return `available = stock - reserved` for a specific item.
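Roughly what I have in mind, as an in-memory sketch rather than a real table. The names `StockStore`, `StockItem` and `ttl_seconds` are made up for this example, and the 60 second default is an arbitrary assumption:

```python
import time
from dataclasses import dataclass, field


@dataclass
class StockItem:
    stock: int
    # transaction id -> (amount, time the reservation was made)
    reservations: dict = field(default_factory=dict)


class StockStore:
    """Hypothetical in-memory stand-in for the stock and reservation tables."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.items = {}
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested

    def add_stock(self, item_id, amount):
        self.items.setdefault(item_id, StockItem(0)).stock += amount

    def _release_expired(self, item):
        now = self.clock()
        expired = [tx for tx, (_, at) in item.reservations.items()
                   if now - at >= self.ttl]
        for tx in expired:
            del item.reservations[tx]

    def available(self, item_id):
        # available = stock - reserved
        item = self.items[item_id]
        self._release_expired(item)
        reserved = sum(amount for amount, _ in item.reservations.values())
        return item.stock - reserved

    def reserve(self, item_id, tx, amount):
        if self.available(item_id) < amount:
            return False
        self.items[item_id].reservations[tx] = (amount, self.clock())
        return True

    def apply(self, item_id, tx):
        """Turn a live reservation into an actual subtraction."""
        item = self.items[item_id]
        self._release_expired(item)
        if tx not in item.reservations:
            return False
        amount, _ = item.reservations.pop(tx)
        item.stock -= amount
        return True

    def cancel(self, item_id, tx):
        """Roll back: drop the reservation, the stock itself was never touched."""
        self.items[item_id].reservations.pop(tx, None)
```

The nice part is that a rollback never has to add stock back: until `apply` is called the stock count is unchanged, so cancelling or expiring a reservation is enough.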
I became a bit confused about the message broker part, as this belongs to the orchestration approach of Saga. In the Choreography approach, the services themselves should listen to each other in a chain instead of going through a message broker.
Yeah this is where I started moving away from the pattern to solve some problems I was seeing. I guess the exact thing I describe above is a mix between the two types.
And about your problem: 'when an event is emitted but not responded to'. Can't we just use a timeout for this?
Yes that is exactly what I was thinking to do.
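Something along these lines, as a rough sketch. `ResponseWatchdog` and its method names are hypothetical, and the caller is assumed to pass in the current time and call `check` periodically:

```python
class ResponseWatchdog:
    """Tracks emitted events and reports transactions that got no response in time."""

    def __init__(self, timeout_seconds, emit_failure):
        self.timeout = timeout_seconds
        self.emit_failure = emit_failure  # hypothetical callback, takes a transaction id
        self.deadlines = {}  # transaction id -> time by which a response is due

    def event_emitted(self, tx, now):
        self.deadlines[tx] = now + self.timeout

    def response_received(self, tx):
        self.deadlines.pop(tx, None)

    def check(self, now):
        """Emit a failure for every transaction whose deadline has passed."""
        expired = [tx for tx, deadline in self.deadlines.items() if now >= deadline]
        for tx in expired:
            del self.deadlines[tx]
            self.emit_failure(tx)
        return expired
```

Each transaction is reported at most once, so a failure event is not emitted repeatedly for the same timeout.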
After our discussion today I took a look at the different articles discussing Saga:
Below I've taken some notes on how this would work for the `payment` service (which should be extendable to the `order` service).

Saga notes
Example
The payment service needs to reserve stock and credits before subtracting them and completing the order.
1. `Payment` receives the payment request.
2. `Payment` creates a payment entry with status `INITIATED` and creates an event `PAYMENT_INITIATED` with the order id and a transaction id (uuid4?) (not sure if we need the transaction id).
3. `User` receives event `PAYMENT_INITIATED` and reserves credits for the transaction, emits `CREDITS_RESERVED` event for the same transaction id.
4. `Stock` receives event `CREDITS_RESERVED` and reserves the stock for the transaction, emits the `STOCK_RESERVED` event for the same transaction id.
5. `Payment` receives event `STOCK_RESERVED` and changes the payment status to `RESERVED`. Emits `PAYMENT_RESERVED` event.
6. `User` receives `PAYMENT_RESERVED` and applies the reservation for the transaction, emits `CREDITS_SUBTRACTED` for the transaction.
7. `Stock` receives event `CREDITS_SUBTRACTED` and applies the stock reservation. Emits `STOCK_SUBTRACTED` for the transaction.
8. `Payment` receives the event `STOCK_SUBTRACTED` and updates the status of the payment to `PAID`.

At any point there are also failed responses. For example:
- `User` Failure: sends the `INSUFFICIENT_CREDITS` event, which is received by the payment service and stops the transaction.
- `Stock` Failure: sends the `INSUFFICIENT_STOCK` event, which is received by the user service (which cancels its reservation) and the payment service (which returns failure on the transaction).
- `User` Failure: Not sure how this could happen, but in that case both reservations should be removed using a `FAILURE` event for the transaction.
- `Stock` Failure: Not sure how this could happen, but in that case the user service receives the event and credits the payment back using a `FAILURE` event for the transaction.
- `Payment` Failure: Again not sure how this could happen, but in that case we return the stock and credit using a `FAILURE` event for the transaction.

The only problem I see here is when an event is emitted but not responded to (2xx status code): the transaction will halt forever. This should be solvable using some sort of at-least-once delivery logic that waits for a 200 status code. That logic could itself break if the node running it fails or is replaced, which could be avoided by using a message broker that is highly available.
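To make the chain of events from the steps and failure cases above a bit more concrete, here is a small in-memory sketch. It is only an illustration: the `Bus` class stands in for whatever transport we end up using, `wire_services` and `start_payment` are made-up names, the `FAILED` payment status is an assumption (the notes above do not name one), and the generic `FAILURE` event is left out.

```python
from collections import defaultdict


class Bus:
    """Tiny synchronous event bus (stand-in for HTTP calls or a broker)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # every emitted event name, in order

    def on(self, event, handler):
        self.handlers[event].append(handler)

    def emit(self, event, tx):
        self.log.append(event)
        for handler in list(self.handlers[event]):
            handler(tx)


def wire_services(bus, payments, released, credits_ok=True, stock_ok=True):
    """Subscribe hypothetical User, Stock and Payment handlers to the bus."""

    def set_status(status, next_event=None):
        def handler(tx):
            payments[tx] = status
            if next_event:
                bus.emit(next_event, tx)
        return handler

    # User service: reserve credits, later apply the reservation.
    bus.on("PAYMENT_INITIATED", lambda tx: bus.emit(
        "CREDITS_RESERVED" if credits_ok else "INSUFFICIENT_CREDITS", tx))
    bus.on("PAYMENT_RESERVED", lambda tx: bus.emit("CREDITS_SUBTRACTED", tx))
    # User service cancels its credit reservation when stock is insufficient.
    bus.on("INSUFFICIENT_STOCK", released.append)

    # Stock service: reserve stock, later apply the reservation.
    bus.on("CREDITS_RESERVED", lambda tx: bus.emit(
        "STOCK_RESERVED" if stock_ok else "INSUFFICIENT_STOCK", tx))
    bus.on("CREDITS_SUBTRACTED", lambda tx: bus.emit("STOCK_SUBTRACTED", tx))

    # Payment service: track the payment status ("FAILED" is an assumed name).
    bus.on("STOCK_RESERVED", set_status("RESERVED", "PAYMENT_RESERVED"))
    bus.on("STOCK_SUBTRACTED", set_status("PAID"))
    bus.on("INSUFFICIENT_CREDITS", set_status("FAILED"))
    bus.on("INSUFFICIENT_STOCK", set_status("FAILED"))


def start_payment(bus, payments, tx):
    payments[tx] = "INITIATED"
    bus.emit("PAYMENT_INITIATED", tx)
```

With `credits_ok` and `stock_ok` both true, `bus.log` ends up containing the six events of the happy path in order; with `stock_ok=False` the chain stops at `INSUFFICIENT_STOCK` and the transaction id lands in `released`.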
The only logic that is required on this message broker is to be highly available, send messages through their channels and wait for a response. If no response or an error is returned, we send a general `FAILURE` event for that transaction, which should roll back the actions on other systems. This should make it so that, unless a machine actually shuts down unexpectedly, the system stays consistent.

The original request

An additional problem I see here is that, because of this chain of messages, we will need to keep the original request from the user to the payment service open until we receive either a failure or the `STOCK_SUBTRACTED` event. If the payment service fails within this time, we will not be able to let the user know whether the payment failed or succeeded.
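A sketch of how the payment service could hold the request open, assuming an asyncio-based handler. `PendingRequests`, `FAILURE_EVENTS` and the `"TIMEOUT"` outcome are hypothetical names for this example, and the set of failure events is only a guess based on the notes above:

```python
import asyncio

# Assumed set of events that end the transaction unsuccessfully.
FAILURE_EVENTS = {"INSUFFICIENT_CREDITS", "INSUFFICIENT_STOCK", "FAILURE"}


class PendingRequests:
    """Maps open user requests to futures resolved by incoming events."""

    def __init__(self):
        self._futures = {}  # transaction id -> future holding the outcome

    async def wait_for_outcome(self, tx, timeout):
        future = asyncio.get_running_loop().create_future()
        self._futures[tx] = future
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return "TIMEOUT"
        finally:
            self._futures.pop(tx, None)

    def on_event(self, event, tx):
        future = self._futures.get(tx)
        if future is None or future.done():
            return  # nobody is waiting (anymore) for this transaction
        if event == "STOCK_SUBTRACTED":
            future.set_result("PAID")
        elif event in FAILURE_EVENTS:
            future.set_result("FAILED")


async def demo():
    pending = PendingRequests()

    paid = asyncio.ensure_future(pending.wait_for_outcome("tx1", 1.0))
    failed = asyncio.ensure_future(pending.wait_for_outcome("tx2", 1.0))
    await asyncio.sleep(0)  # let both requests register their futures

    pending.on_event("STOCK_SUBTRACTED", "tx1")
    pending.on_event("INSUFFICIENT_STOCK", "tx2")

    # No event ever arrives for tx3, so it times out.
    timed_out = await pending.wait_for_outcome("tx3", 0.01)
    return await paid, await failed, timed_out
```

This does not solve the problem described above: the futures live in the memory of one payment service instance, so if that instance dies the open request is lost even though the saga itself may still complete.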