In a DLQ situation, it would be awesome if a monitoring system could, upon the DLQ, instantly send out a "broadcast" on the Mats fabric asking for all messages that have been part of the flow, and then store these to aid in debugging and resolution.
The idea is that all Stage Processors would add their incoming messages to a central memory store in the MatsFactory - where the MatsFactory would try to keep a good set of messages for a reasonable time, until the flow "should have DLQed", e.g. 10 minutes - or up to a max of e.g. 100 MB. This would purely be best effort; if the max was reached, it would ditch messages. Large messages would pose a problem, so some "sanity" would have to be implemented, e.g. that large messages were simply ditched, or max 1 outstanding per stage, or similar. (In my understanding, this might not pose too big of a problem: Large messages are typically the result of some query, not a part of a "process this transaction" flow - where the latter are the ones that make for difficult/interesting DLQs, and the former (queries) typically both don't have complex business logic (and thus don't DLQ), and aren't really important wrt. debugging).
It would be possible to include a "this flow is finished" broadcast (when a stage doesn't have an outgoing message, and itself finishes ok), to empty out the stores on the different MatsFactories for that flowId. But this might not be worth the chatter, compared to just "best effort" and some max time limit.
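The store described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing part of the Mats API: the class name `FlowMessageStore`, its methods, and the choice to ditch the oldest messages first when the cap is reached are all hypothetical. The current time is passed in explicitly (and assumed to be non-decreasing) to keep the sketch easy to test.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical best-effort store of incoming stage messages; not part of the Mats API. */
class FlowMessageStore {
    /** One retained incoming message. */
    static final class Entry {
        final String flowId;
        final String stageId;
        final byte[] payload;
        final long storedAtMillis;

        Entry(String flowId, String stageId, byte[] payload, long storedAtMillis) {
            this.flowId = flowId;
            this.stageId = stageId;
            this.payload = payload;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long maxTotalBytes;   // e.g. 100 MB
    private final int maxMessageBytes;  // "sanity" limit for a single message
    private final long maxAgeMillis;    // e.g. 10 minutes
    private final Deque<Entry> entries = new ArrayDeque<>(); // oldest first
    private long totalBytes;

    FlowMessageStore(long maxTotalBytes, int maxMessageBytes, long maxAgeMillis) {
        this.maxTotalBytes = maxTotalBytes;
        this.maxMessageBytes = maxMessageBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Called for each incoming message. Returns false if the message was ditched. */
    synchronized boolean add(String flowId, String stageId, byte[] payload, long nowMillis) {
        evictExpired(nowMillis);
        if (payload.length > maxMessageBytes) {
            return false; // large messages are simply ditched
        }
        // Best effort: ditch the oldest messages until the new one fits (assumed policy).
        while (totalBytes + payload.length > maxTotalBytes && !entries.isEmpty()) {
            totalBytes -= entries.removeFirst().payload.length;
        }
        if (totalBytes + payload.length > maxTotalBytes) {
            return false; // only possible if maxMessageBytes exceeds maxTotalBytes
        }
        entries.addLast(new Entry(flowId, stageId, payload, nowMillis));
        totalBytes += payload.length;
        return true;
    }

    /** What this MatsFactory could answer to the DLQ "broadcast" for a flowId. */
    synchronized List<Entry> messagesFor(String flowId, long nowMillis) {
        evictExpired(nowMillis);
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.flowId.equals(flowId)) {
                result.add(e);
            }
        }
        return result;
    }

    /** Optional "this flow is finished" handling: drop everything for that flowId. */
    synchronized void flowFinished(String flowId) {
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.flowId.equals(flowId)) {
                totalBytes -= e.payload.length;
                it.remove();
            }
        }
    }

    synchronized long totalBytes() {
        return totalBytes;
    }

    private void evictExpired(long nowMillis) {
        while (!entries.isEmpty()
                && nowMillis - entries.peekFirst().storedAtMillis > maxAgeMillis) {
            totalBytes -= entries.removeFirst().payload.length;
        }
    }
}
```

With only the time limit and the size cap, the "flow is finished" broadcast becomes an optimization rather than a requirement: `flowFinished` frees memory earlier, but the store stays bounded without it.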
For debugging, one could then "step through" the entire flow, from initiation up to and including the DLQ point. (This is also the intention of KeepTrace.FULL, but that solution has pretty high overhead in that absolutely all flows keep all info about previous steps on the wire, unless you explicitly downgrade to a lower KeepTrace level. At time of writing, COMPACT is the default, which does not give the actual message contents.)
For resolution, one could then choose to restart the flow from an earlier point by simply sending an older / previous message back onto its queue, instead of reissuing the actual DLQed message.
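As a rough illustration of that resolution step, the sketch below picks the stored incoming message for a chosen earlier stage and hands it to a sender. Everything here is hypothetical: `FlowRestart`, `StoredMessage`, and the `queueSender` callback are made-up names, and it assumes that a stage's queue can be addressed by its stage id.

```java
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical helper for restarting a flow from an earlier stage; not part of the Mats API. */
class FlowRestart {
    /** A message collected from the stores after the DLQ "broadcast". */
    static final class StoredMessage {
        final String stageId;
        final byte[] payload;

        StoredMessage(String stageId, byte[] payload) {
            this.stageId = stageId;
            this.payload = payload;
        }
    }

    /**
     * Re-sends the stored incoming message of the given stage via queueSender
     * (assumed signature: stage id, raw message). Returns false if the
     * best-effort stores no longer had a message for that stage.
     */
    static boolean restartFrom(List<StoredMessage> flowHistory, String stageId,
            BiConsumer<String, byte[]> queueSender) {
        for (StoredMessage m : flowHistory) {
            if (m.stageId.equals(stageId)) {
                queueSender.accept(m.stageId, m.payload);
                return true;
            }
        }
        return false;
    }
}
```

Since the stores are best effort, the caller has to handle the `false` case, e.g. by falling back to reissuing the DLQed message itself.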