During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.
We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.
Motivation
This is a fairly common use-case. This will take away all the custom logic that user need to build themselves and let user focus on writing business logic of processing the SQS message instead.
Proposal
During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.
We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.
So basically, accepting a list of exceptions/errors and a new flag if such a message should be deleted or moved to DLQ in the api contract or the annotation/decorator. Default could be to move to DLQ is one exists for the SQS queue.
If this feature should be available in other runtimes (e.g. Python), how would this look like to ensure consistency?
For Python version, since it supports similar batch processing utility with similar UX via decorator and APIs, same capability could be added to it as well.
User Experience
How would customers use it?
For Java:
public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
@Override
@SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class})
public String handleRequest(final SQSEvent sqsEvent,
final Context context) {
return "Success";
}
private class InnerMessageHandler implements SqsMessageHandler<Object> {
@Override
public String process(SQSMessage message) {
if ("some business logic validation".equals("false")) {
throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
}
return "Success";
}
}
}
public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
@Override
@SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class}, deleteNonRetryableMessageFromQueue = true)
public String handleRequest(final SQSEvent sqsEvent,
final Context context) {
return "Success";
}
private class InnerMessageHandler implements SqsMessageHandler<Object> {
@Override
public String process(SQSMessage message) {
if ("some business logic validation".equals("false")) {
throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
}
return "Success";
}
}
}
Similar support can be made to higher level api supported by utility:
public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, Object> {
@Override
public Object handleRequest(final SQSEvent sqsEvent,
final Context context) {
return SqsUtils.batchProcessor(sqsEvent, message -> {
if ("some business logic validation".equals("false")) {
throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
}
return "Success";
},
IllegalStateException.class);
}
So in above examples, if IllegalStateException is thrown from handler while processing the message, then that message will be automatically moved to a DLQ or be deleted based on flag value of deleteNonRetryableMessageFromQueue. By default, it will attempt to move it to DLQ if one exists.
Any configuration or corner cases you'd expect?
NA
Demonstration of before and after on how the experience will be better
Refer summary above. Today all the logic of deciding to move DQL or Deleting such message has to be done by users writing alot of custom code around it.
Drawbacks
Increases complexity of the utilty and more code to maintain?
No, since we already depend on SQS client today. Its just additional functionality within utility.
Rationale and alternatives
What other designs have been considered? Why not them?
What is the impact of not doing this? Customer are forced to write and maintian such custom logic themselves.
Unresolved questions
Optional, stash area for topics that need further development e.g. TBD
Key information
Summary
During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.
We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via
SqsMessageHandler
for Java or viarecord_handler
in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.Motivation
This is a fairly common use-case. This will take away all the custom logic that user need to build themselves and let user focus on writing business logic of processing the SQS message instead.
Proposal
During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.
We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via
SqsMessageHandler
for Java or viarecord_handler
in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.So basically, accepting a list of exceptions/errors and a new flag if such a message should be deleted or moved to DLQ in the api contract or the annotation/decorator. Default could be to move to DLQ is one exists for the SQS queue.
If this feature should be available in other runtimes (e.g. Python), how would this look like to ensure consistency?
For Python version, since it supports similar batch processing utility with similar UX via decorator and APIs, same capability could be added to it as well.
User Experience
How would customers use it?
For Java:
Similar support can be made to higher level api supported by utility:
So in above examples, if
IllegalStateException
is thrown from handler while processing the message, then that message will be automatically moved to a DLQ or be deleted based on flag value ofdeleteNonRetryableMessageFromQueue
. By default, it will attempt to move it to DLQ if one exists.Any configuration or corner cases you'd expect? NA
Demonstration of before and after on how the experience will be better
Refer summary above. Today all the logic of deciding to move DQL or Deleting such message has to be done by users writing alot of custom code around it.
Drawbacks
Increases complexity of the utilty and more code to maintain?
No, since we already depend on SQS client today. Its just additional functionality within utility.
Rationale and alternatives
Unresolved questions