centiservice / mats3

Mats3: Message-based Asynchronous Transactional Staged Stateless Services
https://mats3.io/

Retry initiation message sends upon "VERY BAD!" scenario. #36

Open stolsvik opened 3 years ago

stolsvik commented 3 years ago

As mentioned in (now closed) centiservice/mats#27, "Initiations - VERY BAD! Reconsider transaction demarcation in initiations", there is really no JMS transactional demarcation going on in an initiation: you have not consumed a message, and the only point is to produce one or several messages. Thus, if the actual sending of the messages fails, you could simply try to send them again. (Read centiservice/mats#27 now!)
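Since nothing was consumed, retrying an initiation's send could be a plain loop around the send call. The following is a minimal sketch, not Mats3's API: `OutgoingSender` and `InitiationRetry.sendWithRetry` are hypothetical names, and a fixed pause between a bounded number of attempts is an assumed retry policy.

```java
/** Hypothetical abstraction over "send all outgoing messages of this initiation". */
interface OutgoingSender {
    void sendAll() throws Exception;
}

final class InitiationRetry {
    private InitiationRetry() {}

    /**
     * Tries to send, retrying up to maxAttempts times with a fixed pause between attempts.
     * Returns the number of attempts used; rethrows the last failure if every attempt fails.
     */
    static int sendWithRetry(OutgoingSender sender, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.sendAll();
                return attempt; // Sent: nothing more to do.
            } catch (Exception e) {
                last = e; // E.g. broker connection dropped; try again after a pause.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

A bounded attempt count keeps the initiating thread from hanging forever when the broker stays down.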

There is a slight issue with possible double-sending: if the initial attempt actually went through, but you never received the "TCP packets" from the MQ confirming this, a retry would send duplicate messages. These must then be handled on the receiving side, which suggests that such retry logic would have to be opt-in.
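Handling duplicates on the receiving side usually means making the receiver idempotent. A minimal sketch, assuming each message carries a unique id that stays the same across retried sends; `IdempotentReceiver` and this id scheme are hypothetical, not part of Mats3. For illustration it remembers a bounded number of recently seen ids in memory; a real receiver would rather record seen ids in its database, in the same transaction as its processing.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

final class IdempotentReceiver<T> {
    private final Set<String> seenIds;
    private final Consumer<T> handler;

    IdempotentReceiver(int maxRemembered, Consumer<T> handler) {
        if (maxRemembered < 1) {
            throw new IllegalArgumentException("maxRemembered must be >= 1");
        }
        this.handler = handler;
        // Insertion-ordered map: the oldest id is evicted once capacity is exceeded.
        this.seenIds = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxRemembered;
            }
        });
    }

    /** Returns true if the message was processed, false if it was dropped as a duplicate. */
    synchronized boolean receive(String messageId, T payload) {
        if (seenIds.contains(messageId)) {
            return false;
        }
        handler.accept(payload);
        // Record the id only after successful handling, so a failed attempt can be redelivered.
        seenIds.add(messageId);
        return true;
    }
}
```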

There is also an issue with the DB commit: the very point of this situation is that the DB commit has gone through, but the messages have not been sent. They now reside only in memory, and you might lose power immediately afterwards. Thus, you would really want to notify the invoker immediately, so that they could issue compensating transactions to get back to a correct state (i.e. a state where no jobs were allocated after all). However, if you are now entering a retry cycle, you do not want to exit just yet.

There is also an issue with logging: to be prudent, you would want to output a log line documenting the problem immediately, because "all bets are off" at this point. The messages are held only in memory, and if you lose power right after, they are gone forever. You might therefore first output a "Possible VERY BAD!", and then, after retrying for e.g. 7 seconds, output either a "Cancelled VERY BAD!" if the messages got through, or an "Actual VERY BAD!" if you gave up.
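The retry-and-log lifecycle described above could look roughly like this. It is a sketch under assumptions: `VeryBadRetry`, `sendOrGiveUp` and `MessagesLostException` are hypothetical names, the log sink is a plain `Consumer<String>`, and the time budget (such as the 7 seconds above) is a parameter. The invoker is notified, via the exception, only once the budget is exhausted.

```java
import java.time.Duration;
import java.util.function.Consumer;

final class VeryBadRetry {
    private VeryBadRetry() {}

    /** Hypothetical: thrown to the invoker when the messages could not be sent in time. */
    static final class MessagesLostException extends Exception {
        MessagesLostException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical: sends all outgoing messages of the initiation. */
    interface Send {
        void run() throws Exception;
    }

    static void sendOrGiveUp(Send send, Duration budget, long pauseMillis, Consumer<String> log)
            throws MessagesLostException, InterruptedException {
        Exception last = null;
        try {
            send.run();
            return; // Normal case: sent on first try, nothing logged.
        } catch (Exception e) {
            last = e;
        }
        // The DB commit has gone through; the messages now exist only in memory. Log at once.
        log.accept("Possible VERY BAD! Send failed, retrying for up to "
                + budget.toMillis() + " ms: " + last);
        long deadline = System.nanoTime() + budget.toNanos();
        while (System.nanoTime() < deadline) {
            Thread.sleep(pauseMillis);
            try {
                send.run();
                log.accept("Cancelled VERY BAD! Messages were sent on retry.");
                return;
            } catch (Exception e) {
                last = e;
            }
        }
        log.accept("Actual VERY BAD! Gave up after " + budget.toMillis() + " ms; messages are lost.");
        // Let the invoker know, so it can issue compensating transactions.
        throw new MessagesLostException("Messages not sent, but DB changes are committed", last);
    }
}
```

This makes the trade-off from the text explicit: the longer the retry budget, the later the invoker learns that it may need to compensate.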

These problems strongly suggest that the outbox pattern (#77) is really the way to go here, as it offloads the message storage to the database, in the same commit as the database changes.
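The outbox pattern can be illustrated with a toy in-memory simulation. This is a generic sketch of the transactional outbox pattern, not the design proposed in #77: the hypothetical `ToyDb` stands in for a database where the business row and the outgoing messages are written in the same commit, and the hypothetical `OutboxRelay` forwards pending outbox rows to the broker, deleting each row only after its send succeeds.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a database: business rows and outbox rows are written together. */
final class ToyDb {
    final List<String> orders = new ArrayList<>();
    final List<String> outbox = new ArrayList<>();

    /** Simulates one DB transaction writing the business row and its outgoing messages. */
    synchronized void commit(String order, List<String> outgoingMessages) {
        orders.add(order);
        outbox.addAll(outgoingMessages);
    }
}

final class OutboxRelay {
    private OutboxRelay() {}

    /** Hypothetical broker abstraction. */
    interface Broker {
        void send(String message) throws Exception;
    }

    /** Forwards pending outbox rows in order; returns how many were sent in this run. */
    static int relay(ToyDb db, Broker broker) {
        int sent = 0;
        synchronized (db) {
            while (!db.outbox.isEmpty()) {
                String message = db.outbox.get(0);
                try {
                    broker.send(message);
                } catch (Exception e) {
                    break; // Broker unavailable: keep the row, a later run retries it.
                }
                db.outbox.remove(0); // Delete only after a successful send.
                sent++;
            }
        }
        return sent;
    }
}
```

If the relay crashes between a successful send and the deletion of the row, the message is sent again on the next run, so the receiving side still needs to handle duplicates.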

However, a rather simple retry mechanism within Mats itself would realistically alleviate most of the actual problems occurring with initiations, since dropped MQ connections are probably more often due to a rebooted MQ broker than to disastrous data center crashes.