aws-powertools / powertools-lambda

MIT No Attribution
73 stars 4 forks source link

RFC: [SQS Batch Processing] Ability to move Message from Batch to DLQ when certain types of exception occurs #29

Open pankajagrawal16 opened 3 years ago

pankajagrawal16 commented 3 years ago

Key information

Summary

During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.

We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.

Motivation

This is a fairly common use-case. This will take away all the custom logic that user need to build themselves and let user focus on writing business logic of processing the SQS message instead.

Proposal

During batch processing of SQS messages, there can be messages in the batch which fails processing because of reasons for which user will not want them to retry it but move those to a DLQ associated with SQS queue or delete it entirely. Example might be where a message is failing business validation, and it won't make sense to let it retry until it expires, rather we can simply move such message directly to a DLQ.

We could enhance SQL Batch processing to accept a list of Exception/Errors. If those exceptions occur during message processing via SqsMessageHandler for Java or via record_handler in python, utility can take care of moving such message to DLQ directly or delete it entirely based on a config param, instead of moving them back to queue.

So basically, accepting a list of exceptions/errors and a new flag if such a message should be deleted or moved to DLQ in the api contract or the annotation/decorator. Default could be to move to DLQ is one exists for the SQS queue.

If this feature should be available in other runtimes (e.g. Python), how would this look like to ensure consistency?

For Python version, since it supports similar batch processing utility with similar UX via decorator and APIs, same capability could be added to it as well.

User Experience

How would customers use it?

For Java:

public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
    @Override
    @SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class})
    public String handleRequest(final SQSEvent sqsEvent,
                                final Context context) {
        return "Success";
    }

    private class InnerMessageHandler implements SqsMessageHandler<Object> {

        @Override
        public String process(SQSMessage message) {
            if ("some business logic validation".equals("false")) {
                throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
            }

            return "Success";
        }
    }
}
public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, String> {
    @Override
    @SqsBatch(value = InnerMessageHandler.class, nonRetryableExceptions = {IllegalStateException.class}, deleteNonRetryableMessageFromQueue = true)
    public String handleRequest(final SQSEvent sqsEvent,
                                final Context context) {
        return "Success";
    }

    private class InnerMessageHandler implements SqsMessageHandler<Object> {

        @Override
        public String process(SQSMessage message) {
            if ("some business logic validation".equals("false")) {
                throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
            }

            return "Success";
        }
    }
}

Similar support can be made to higher level api supported by utility:

public class PartialBatchPartialFailureHandler implements RequestHandler<SQSEvent, Object> {
    @Override
    public Object handleRequest(final SQSEvent sqsEvent,
                                final Context context) {

        return SqsUtils.batchProcessor(sqsEvent, message -> {
                    if ("some business logic validation".equals("false")) {
                        throw new IllegalStateException("2e1424d4-f796-459a-8184-9c92662be6da");
                    }

                    return "Success";
                },
                IllegalStateException.class);
    }

So in above examples, if IllegalStateException is thrown from handler while processing the message, then that message will be automatically moved to a DLQ or be deleted based on flag value of deleteNonRetryableMessageFromQueue. By default, it will attempt to move it to DLQ if one exists.

Any configuration or corner cases you'd expect? NA

Demonstration of before and after on how the experience will be better

Refer summary above. Today all the logic of deciding to move DQL or Deleting such message has to be done by users writing alot of custom code around it.

Drawbacks

Increases complexity of the utilty and more code to maintain?

No, since we already depend on SQS client today. Its just additional functionality within utility.

Rationale and alternatives

Unresolved questions

Optional, stash area for topics that need further development e.g. TBD

pankajagrawal16 commented 3 years ago

this is now available for Java https://awslabs.github.io/aws-lambda-powertools-java/utilities/batch/#move-non-retryable-messages-to-a-dead-letter-queue.

https://github.com/awslabs/aws-lambda-powertools-java/releases/tag/v1.7.3